2020-12-10 17:15:03

by Tom Lendacky

Subject: [PATCH v5 00/34] SEV-ES hypervisor support

From: Tom Lendacky <[email protected]>

This patch series provides support for running SEV-ES guests under KVM.

Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the
SEV support to protect the guest register state from the hypervisor. See
"AMD64 Architecture Programmer's Manual Volume 2: System Programming",
section "15.35 Encrypted State (SEV-ES)" [1].

In order to allow a hypervisor to perform functions on behalf of a guest,
there is architectural support for notifying a guest's operating system
when certain types of VMEXITs are about to occur. This allows the guest to
selectively share information with the hypervisor to satisfy the requested
function. The notification is performed using a new exception, the VMM
Communication exception (#VC). The information is shared through the
Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction.
The GHCB format and the protocol for using it is documented in "SEV-ES
Guest-Hypervisor Communication Block Standardization" [2].

Under SEV-ES, a vCPU save area (VMSA) must be encrypted. SVM is updated to
build the initial VMSA and then encrypt it before running the guest. Once
encrypted, it must not be modified by the hypervisor. Modification of the
VMSA will result in the VMRUN instruction failing with a SHUTDOWN exit
code. KVM must support the VMGEXIT exit code in order to perform the
necessary functions required of the guest. The GHCB is used to exchange
the information needed by both the hypervisor and the guest.

Register data from the GHCB is copied into the KVM register variables and
accessed as usual during handling of the exit. Upon return to the guest,
updated registers are copied back to the GHCB for the guest to act upon.
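
As a rough illustration of that exchange, here is a minimal sketch using the
GHCB accessor helpers added by this series (the function names below are
hypothetical, and the real code handles many more fields and validity checks):

  static void example_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
  {
          /* Copy only the GPRs the guest marked as valid in the GHCB */
          if (ghcb_rax_is_valid(ghcb))
                  svm->vcpu.arch.regs[VCPU_REGS_RAX] = ghcb_get_rax(ghcb);
          if (ghcb_rcx_is_valid(ghcb))
                  svm->vcpu.arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx(ghcb);
  }

  static void example_sync_to_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
  {
          /* Return the results produced by the intercept handler */
          ghcb_set_rax(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RAX]);
          ghcb_set_rcx(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RCX]);
  }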

Some of the intercepts need changes under SEV-ES. For example, CR0 writes
cannot be intercepted, so the code needs to ensure either that the intercept
is not enabled during execution or that the hypervisor does not try to read
the register as part of exit processing. Another
example is shutdown processing, where the vCPU cannot be directly reset.

Support is added to handle VMGEXIT events and implement the GHCB protocol.
This ranges from standard exit events, like a CPUID instruction
intercept, to new support for things like AP processor booting. Much of
the existing SVM intercept support can be re-used by setting the exit
code information from the VMGEXIT and calling the appropriate intercept
handlers.

Finally, launching and running an SEV-ES guest requires changes to vCPU
initialization, loading and execution.

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://developer.amd.com/wp-content/resources/56421.pdf

---

These patches are based on the KVM queue branch:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue

dc924b062488 ("KVM: SVM: check CR4 changes against vcpu->arch")

A version of the tree can also be found at:
https://github.com/AMDESE/linux/tree/sev-es-v5
This tree has one additional patch, not yet part of the queue tree, that
is required to run any SEV guest:
[PATCH] KVM: x86: adjust SEV for commit 7e8e6eed75e
https://lore.kernel.org/kvm/[email protected]/

Changes from v4:
- Updated the tracking support for CR0/CR4

Changes from v3:
- Some kernel test robot fixes.
- Some checkpatch cleanups.

Changes from v2:
- Update the freeing of the VMSA page to account for the encrypted memory
  cache coherency feature as well as the VM page flush feature.
- Update the GHCB dump function with a bit more detail.
- Don't check for RAX being present as part of a string IO operation.
- Include RSI when syncing from GHCB to support KVM hypercall arguments.
- Add GHCB usage field validation check.

Changes from v1:
- Removed the VMSA indirection support:
  - On LAUNCH_UPDATE_VMSA, sync traditional VMSA over to the new SEV-ES
    VMSA area to be encrypted.
  - On VMGEXIT VMEXIT, directly copy valid registers into vCPU arch
    register array from GHCB. On VMRUN (following a VMGEXIT), directly
    copy dirty vCPU arch registers to GHCB.
  - Removed reg_read_override()/reg_write_override() KVM ops.
- Added VMGEXIT exit-reason validation.
- Changed kvm_vcpu_arch variable vmsa_encrypted to guest_state_protected
- Updated the tracking support for EFER/CR0/CR4/CR8 to minimize changes
  to the x86.c code
- Updated __set_sregs to not set any register values (previously supported
  setting the tracked values of EFER/CR0/CR4/CR8)
- Added support for reporting SMM capability at the VM-level. This allows
  an SEV-ES guest to indicate SMM is not supported
- Updated FPU support to check for a guest FPU save area before using it.
  Updated SVM to free guest FPU for an SEV-ES guest during KVM create_vcpu
  op.
- Removed changes to the kvm_skip_emulated_instruction()
- Added VMSA validity checks before invoking LAUNCH_UPDATE_VMSA
- Minor code restructuring in areas for better readability

Cc: Paolo Bonzini <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Brijesh Singh <[email protected]>

Tom Lendacky (34):
x86/cpu: Add VM page flush MSR availability as a CPUID feature
KVM: SVM: Remove the call to sev_platform_status() during setup
KVM: SVM: Add support for SEV-ES capability in KVM
KVM: SVM: Add GHCB accessor functions for retrieving fields
KVM: SVM: Add support for the SEV-ES VMSA
KVM: x86: Mark GPRs dirty when written
KVM: SVM: Add required changes to support intercepts under SEV-ES
KVM: SVM: Prevent debugging under SEV-ES
KVM: SVM: Do not allow instruction emulation under SEV-ES
KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
KVM: SVM: Add initial support for a VMGEXIT VMEXIT
KVM: SVM: Create trace events for VMGEXIT processing
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Add support for booting APs for an SEV-ES guest
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support to launch and run an SEV-ES guest

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_host.h | 12 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/svm.h | 40 +-
arch/x86/include/uapi/asm/svm.h | 28 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/cpu/vmware.c | 12 +-
arch/x86/kvm/Kconfig | 3 +-
arch/x86/kvm/kvm_cache_regs.h | 51 +-
arch/x86/kvm/svm/sev.c | 933 +++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.c | 446 +++++++++++---
arch/x86/kvm/svm/svm.h | 166 ++++-
arch/x86/kvm/svm/vmenter.S | 50 ++
arch/x86/kvm/trace.h | 97 +++
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/x86.c | 344 +++++++++--
arch/x86/kvm/x86.h | 9 +
17 files changed, 1962 insertions(+), 238 deletions(-)


base-commit: dc924b062488a0376aae41d3e0a27dc99f852a5e
--
2.28.0


2020-12-10 17:18:36

by Tom Lendacky

Subject: [PATCH v5 21/34] KVM: SVM: Add support for CR0 write traps for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES support introduces new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR0 register using the control
register write trap so that the hypervisor understands the guest operating
mode.
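
As an illustrative example: if the guest executes "mov %rax, %cr0" with RAX
set to 0x80050033, the write completes within the encrypted guest state and
the hypervisor then sees an SVM_EXIT_CR0_WRITE_TRAP exit with EXITINFO1 set
to 0x80050033; the cr_trap() handler below records that value through
svm_set_cr0()/kvm_post_set_cr0() rather than emulating the write (the CR0
value here is purely illustrative).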

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/svm.h | 17 +++++++++++++++++
arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 33 ++++++++++++++++++++-------------
4 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26f937111226..2714ae0adeab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1476,6 +1476,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);

+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 6e3f92e17655..14b0d97b50e2 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -78,6 +78,22 @@
#define SVM_EXIT_XSETBV 0x08d
#define SVM_EXIT_RDPRU 0x08e
#define SVM_EXIT_EFER_WRITE_TRAP 0x08f
+#define SVM_EXIT_CR0_WRITE_TRAP 0x090
+#define SVM_EXIT_CR1_WRITE_TRAP 0x091
+#define SVM_EXIT_CR2_WRITE_TRAP 0x092
+#define SVM_EXIT_CR3_WRITE_TRAP 0x093
+#define SVM_EXIT_CR4_WRITE_TRAP 0x094
+#define SVM_EXIT_CR5_WRITE_TRAP 0x095
+#define SVM_EXIT_CR6_WRITE_TRAP 0x096
+#define SVM_EXIT_CR7_WRITE_TRAP 0x097
+#define SVM_EXIT_CR8_WRITE_TRAP 0x098
+#define SVM_EXIT_CR9_WRITE_TRAP 0x099
+#define SVM_EXIT_CR10_WRITE_TRAP 0x09a
+#define SVM_EXIT_CR11_WRITE_TRAP 0x09b
+#define SVM_EXIT_CR12_WRITE_TRAP 0x09c
+#define SVM_EXIT_CR13_WRITE_TRAP 0x09d
+#define SVM_EXIT_CR14_WRITE_TRAP 0x09e
+#define SVM_EXIT_CR15_WRITE_TRAP 0x09f
#define SVM_EXIT_INVPCID 0x0a2
#define SVM_EXIT_NPF 0x400
#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
@@ -186,6 +202,7 @@
{ SVM_EXIT_MWAIT, "mwait" }, \
{ SVM_EXIT_XSETBV, "xsetbv" }, \
{ SVM_EXIT_EFER_WRITE_TRAP, "write_efer_trap" }, \
+ { SVM_EXIT_CR0_WRITE_TRAP, "write_cr0_trap" }, \
{ SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3b61cc088b31..e35050eafe3a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2470,6 +2470,31 @@ static int cr_interception(struct vcpu_svm *svm)
return kvm_complete_insn_gp(&svm->vcpu, err);
}

+static int cr_trap(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ unsigned long old_value, new_value;
+ unsigned int cr;
+
+ new_value = (unsigned long)svm->vmcb->control.exit_info_1;
+
+ cr = svm->vmcb->control.exit_code - SVM_EXIT_CR0_WRITE_TRAP;
+ switch (cr) {
+ case 0:
+ old_value = kvm_read_cr0(vcpu);
+ svm_set_cr0(vcpu, new_value);
+
+ kvm_post_set_cr0(vcpu, old_value, new_value);
+ break;
+ default:
+ WARN(1, "unhandled CR%d write trap", cr);
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ return kvm_complete_insn_gp(vcpu, 0);
+}
+
static int dr_interception(struct vcpu_svm *svm)
{
int reg, dr;
@@ -3051,6 +3076,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_XSETBV] = xsetbv_interception,
[SVM_EXIT_RDPRU] = rdpru_interception,
[SVM_EXIT_EFER_WRITE_TRAP] = efer_trap,
+ [SVM_EXIT_CR0_WRITE_TRAP] = cr_trap,
[SVM_EXIT_INVPCID] = invpcid_interception,
[SVM_EXIT_NPF] = npf_interception,
[SVM_EXIT_RSM] = rsm_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fcd862f5a2b4..1b3f1f326e9c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -804,11 +804,29 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(pdptrs_changed);

+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
+{
+ unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
+
+ if ((cr0 ^ old_cr0) & X86_CR0_PG) {
+ kvm_clear_async_pf_completion_queue(vcpu);
+ kvm_async_pf_hash_reset(vcpu);
+ }
+
+ if ((cr0 ^ old_cr0) & update_bits)
+ kvm_mmu_reset_context(vcpu);
+
+ if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
+ kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
+ !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+ kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+}
+EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
+
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
{
unsigned long old_cr0 = kvm_read_cr0(vcpu);
unsigned long pdptr_bits = X86_CR0_CD | X86_CR0_NW | X86_CR0_PG;
- unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;

cr0 |= X86_CR0_ET;

@@ -847,18 +865,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)

kvm_x86_ops.set_cr0(vcpu, cr0);

- if ((cr0 ^ old_cr0) & X86_CR0_PG) {
- kvm_clear_async_pf_completion_queue(vcpu);
- kvm_async_pf_hash_reset(vcpu);
- }
-
- if ((cr0 ^ old_cr0) & update_bits)
- kvm_mmu_reset_context(vcpu);
-
- if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
- kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
- !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
- kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+ kvm_post_set_cr0(vcpu, old_cr0, cr0);

return 0;
}
--
2.28.0

2020-12-10 17:21:01

by Tom Lendacky

Subject: [PATCH v5 34/34] KVM: SVM: Provide support to launch and run an SEV-ES guest

From: Tom Lendacky <[email protected]>

An SEV-ES guest is started by invoking a new SEV initialization ioctl,
KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is
used to drive the appropriate ASID allocation, VMSA encryption, etc.

Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted
and measured. This is done using the LAUNCH_UPDATE_VMSA command after all
calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE
has been performed. In order to establish the encrypted VMSA, the current
(traditional) VMSA and the GPRs are synced to the page that will hold the
encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then
marked as having protected guest state.
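
For reference, a minimal userspace-side sketch of that ordering through the
existing KVM_MEMORY_ENCRYPT_OP ioctl (the function name and error handling
are illustrative; the LAUNCH_START, LAUNCH_UPDATE_DATA and LAUNCH_MEASURE
details are elided):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int sev_es_launch_example(int vm_fd, int sev_fd)
  {
          struct kvm_sev_cmd cmd = { .sev_fd = sev_fd };

          /* Identify the guest as SEV-ES before issuing launch commands */
          cmd.id = KVM_SEV_ES_INIT;
          if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
                  return -1;

          /* ... KVM_SEV_LAUNCH_START and KVM_SEV_LAUNCH_UPDATE_DATA calls ... */

          /* Encrypt and measure the vCPU VMSAs after all data updates */
          cmd.id = KVM_SEV_LAUNCH_UPDATE_VMSA;
          cmd.data = 0;
          if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
                  return -1;

          /* ... KVM_SEV_LAUNCH_MEASURE and KVM_SEV_LAUNCH_FINISH follow ... */
          return 0;
  }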

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 104 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 225f18dbf522..89f6fe4468c5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -203,6 +203,16 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int sev_es_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ if (!sev_es)
+ return -ENOTTY;
+
+ to_kvm_svm(kvm)->sev_info.es_active = true;
+
+ return sev_guest_init(kvm, argp);
+}
+
static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
{
struct sev_data_activate *data;
@@ -502,6 +512,94 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int sev_es_sync_vmsa(struct vcpu_svm *svm)
+{
+ struct vmcb_save_area *save = &svm->vmcb->save;
+
+ /* Check some debug related fields before encrypting the VMSA */
+ if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
+ return -EINVAL;
+
+ /* Sync registers */
+ save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
+ save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
+ save->rcx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
+ save->rdx = svm->vcpu.arch.regs[VCPU_REGS_RDX];
+ save->rsp = svm->vcpu.arch.regs[VCPU_REGS_RSP];
+ save->rbp = svm->vcpu.arch.regs[VCPU_REGS_RBP];
+ save->rsi = svm->vcpu.arch.regs[VCPU_REGS_RSI];
+ save->rdi = svm->vcpu.arch.regs[VCPU_REGS_RDI];
+ save->r8 = svm->vcpu.arch.regs[VCPU_REGS_R8];
+ save->r9 = svm->vcpu.arch.regs[VCPU_REGS_R9];
+ save->r10 = svm->vcpu.arch.regs[VCPU_REGS_R10];
+ save->r11 = svm->vcpu.arch.regs[VCPU_REGS_R11];
+ save->r12 = svm->vcpu.arch.regs[VCPU_REGS_R12];
+ save->r13 = svm->vcpu.arch.regs[VCPU_REGS_R13];
+ save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
+ save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
+ save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];
+
+ /* Sync some non-GPR registers before encrypting */
+ save->xcr0 = svm->vcpu.arch.xcr0;
+ save->pkru = svm->vcpu.arch.pkru;
+ save->xss = svm->vcpu.arch.ia32_xss;
+
+ /*
+ * SEV-ES will use a VMSA that is pointed to by the VMCB, not
+ * the traditional VMSA that is part of the VMCB. Copy the
+ * traditional VMSA as it has been built so far (in prep
+ * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
+ */
+ memcpy(svm->vmsa, save, sizeof(*save));
+
+ return 0;
+}
+
+static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_launch_update_vmsa *vmsa;
+ int i, ret;
+
+ if (!sev_es_guest(kvm))
+ return -ENOTTY;
+
+ vmsa = kzalloc(sizeof(*vmsa), GFP_KERNEL);
+ if (!vmsa)
+ return -ENOMEM;
+
+ for (i = 0; i < kvm->created_vcpus; i++) {
+ struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ goto e_free;
+
+ /*
+ * The LAUNCH_UPDATE_VMSA command will perform in-place
+ * encryption of the VMSA memory content (i.e it will write
+ * the same memory region with the guest's key), so invalidate
+ * it first.
+ */
+ clflush_cache_range(svm->vmsa, PAGE_SIZE);
+
+ vmsa->handle = sev->handle;
+ vmsa->address = __sme_pa(svm->vmsa);
+ vmsa->len = PAGE_SIZE;
+ ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, vmsa,
+ &argp->error);
+ if (ret)
+ goto e_free;
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+e_free:
+ kfree(vmsa);
+ return ret;
+}
+
static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
void __user *measure = (void __user *)(uintptr_t)argp->data;
@@ -959,12 +1057,18 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_INIT:
r = sev_guest_init(kvm, &sev_cmd);
break;
+ case KVM_SEV_ES_INIT:
+ r = sev_es_guest_init(kvm, &sev_cmd);
+ break;
case KVM_SEV_LAUNCH_START:
r = sev_launch_start(kvm, &sev_cmd);
break;
case KVM_SEV_LAUNCH_UPDATE_DATA:
r = sev_launch_update_data(kvm, &sev_cmd);
break;
+ case KVM_SEV_LAUNCH_UPDATE_VMSA:
+ r = sev_launch_update_vmsa(kvm, &sev_cmd);
+ break;
case KVM_SEV_LAUNCH_MEASURE:
r = sev_launch_measure(kvm, &sev_cmd);
break;
--
2.28.0

2020-12-10 17:21:07

by Tom Lendacky

Subject: [PATCH v5 26/34] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest

From: Tom Lendacky <[email protected]>

The guest FPU state is automatically restored on VMRUN and saved on VMEXIT
by the hardware, so there is no reason to do this in KVM. Eliminate the
allocation of the guest_fpu save area and key off its absence to skip operations
related to the guest FPU state.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm/svm.c | 8 +++++
arch/x86/kvm/x86.c | 56 +++++++++++++++++++++++++++------
3 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cecd0eca66c7..048b08437c33 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1476,6 +1476,8 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);

+void kvm_free_guest_fpu(struct kvm_vcpu *vcpu);
+
void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3e6d79593b8d..8d22ae25a0f8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1318,6 +1318,14 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!vmsa_page)
goto error_free_vmcb_page;
+
+ /*
+ * SEV-ES guests maintain an encrypted version of their FPU
+ * state which is restored and saved on VMRUN and VMEXIT.
+ * Free the fpu structure to prevent KVM from attempting to
+ * access the FPU state.
+ */
+ kvm_free_guest_fpu(vcpu);
}

err = avic_init_vcpu(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 53fe34fd1a7f..ddd614a76744 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4515,6 +4515,9 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ if (!vcpu->arch.guest_fpu)
+ return;
+
if (boot_cpu_has(X86_FEATURE_XSAVE)) {
memset(guest_xsave, 0, sizeof(struct kvm_xsave));
fill_xsave((u8 *) guest_xsave->region, vcpu);
@@ -4532,9 +4535,14 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
- u64 xstate_bv =
- *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];
- u32 mxcsr = *(u32 *)&guest_xsave->region[XSAVE_MXCSR_OFFSET / sizeof(u32)];
+ u64 xstate_bv;
+ u32 mxcsr;
+
+ if (!vcpu->arch.guest_fpu)
+ return 0;
+
+ xstate_bv = *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];
+ mxcsr = *(u32 *)&guest_xsave->region[XSAVE_MXCSR_OFFSET / sizeof(u32)];

if (boot_cpu_has(X86_FEATURE_XSAVE)) {
/*
@@ -9252,9 +9260,14 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)

kvm_save_current_fpu(vcpu->arch.user_fpu);

- /* PKRU is separately restored in kvm_x86_ops.run. */
- __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
- ~XFEATURE_MASK_PKRU);
+ /*
+ * Guests with protected state can't have it set by the hypervisor,
+ * so skip trying to set it.
+ */
+ if (vcpu->arch.guest_fpu)
+ /* PKRU is separately restored in kvm_x86_ops.run. */
+ __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
+ ~XFEATURE_MASK_PKRU);

fpregs_mark_activate();
fpregs_unlock();
@@ -9267,7 +9280,12 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
fpregs_lock();

- kvm_save_current_fpu(vcpu->arch.guest_fpu);
+ /*
+ * Guests with protected state can't have it read by the hypervisor,
+ * so skip trying to save it.
+ */
+ if (vcpu->arch.guest_fpu)
+ kvm_save_current_fpu(vcpu->arch.guest_fpu);

copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);

@@ -9777,6 +9795,9 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
struct fxregs_state *fxsave;

+ if (!vcpu->arch.guest_fpu)
+ return 0;
+
vcpu_load(vcpu);

fxsave = &vcpu->arch.guest_fpu->state.fxsave;
@@ -9797,6 +9818,9 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
struct fxregs_state *fxsave;

+ if (!vcpu->arch.guest_fpu)
+ return 0;
+
vcpu_load(vcpu);

fxsave = &vcpu->arch.guest_fpu->state.fxsave;
@@ -9855,6 +9879,9 @@ static int sync_regs(struct kvm_vcpu *vcpu)

static void fx_init(struct kvm_vcpu *vcpu)
{
+ if (!vcpu->arch.guest_fpu)
+ return;
+
fpstate_init(&vcpu->arch.guest_fpu->state);
if (boot_cpu_has(X86_FEATURE_XSAVES))
vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
@@ -9868,6 +9895,15 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}

+void kvm_free_guest_fpu(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->arch.guest_fpu) {
+ kmem_cache_free(x86_fpu_cache, vcpu->arch.guest_fpu);
+ vcpu->arch.guest_fpu = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(kvm_free_guest_fpu);
+
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
@@ -9963,7 +9999,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
return 0;

free_guest_fpu:
- kmem_cache_free(x86_fpu_cache, vcpu->arch.guest_fpu);
+ kvm_free_guest_fpu(vcpu);
free_user_fpu:
kmem_cache_free(x86_fpu_cache, vcpu->arch.user_fpu);
free_emulate_ctxt:
@@ -10017,7 +10053,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt);
free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
kmem_cache_free(x86_fpu_cache, vcpu->arch.user_fpu);
- kmem_cache_free(x86_fpu_cache, vcpu->arch.guest_fpu);
+ kvm_free_guest_fpu(vcpu);

kvm_hv_vcpu_uninit(vcpu);
kvm_pmu_destroy(vcpu);
@@ -10065,7 +10101,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_async_pf_hash_reset(vcpu);
vcpu->arch.apf.halted = false;

- if (kvm_mpx_supported()) {
+ if (vcpu->arch.guest_fpu && kvm_mpx_supported()) {
void *mpx_state_buffer;

/*
--
2.28.0

2020-12-10 17:29:45

by Tom Lendacky

Subject: [PATCH v5 31/34] KVM: SVM: Provide support for SEV-ES vCPU creation/loading

From: Tom Lendacky <[email protected]>

An SEV-ES vCPU has additional VMCB initialization requirements for
vCPU creation and vCPU load/put. This includes:

General VMCB initialization changes:
- Set a VMCB control bit to enable SEV-ES support on the vCPU.
- Set the VMCB encrypted VM save area address.
- CRx registers are part of the encrypted register state and cannot be
updated. Remove the CRx register read and write intercepts and replace
them with CRx register write traps to track the CRx register values.
- Certain MSR values are part of the encrypted register state and cannot
be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
- Remove the #GP intercept (no support for "enable_vmware_backdoor").
- Remove the XSETBV intercept since the hypervisor cannot modify XCR0.

General vCPU creation changes:
- Set the initial GHCB gpa value as per the GHCB specification.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/svm.h | 15 +++++++++-
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 20 ++++++++++++--
arch/x86/kvm/svm/svm.h | 6 +++-
4 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index caa8628f5fba..a57331de59e2 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -98,6 +98,16 @@ enum {
INTERCEPT_MWAIT_COND,
INTERCEPT_XSETBV,
INTERCEPT_RDPRU,
+ TRAP_EFER_WRITE,
+ TRAP_CR0_WRITE,
+ TRAP_CR1_WRITE,
+ TRAP_CR2_WRITE,
+ TRAP_CR3_WRITE,
+ TRAP_CR4_WRITE,
+ TRAP_CR5_WRITE,
+ TRAP_CR6_WRITE,
+ TRAP_CR7_WRITE,
+ TRAP_CR8_WRITE,
/* Byte offset 014h (word 5) */
INTERCEPT_INVLPGB = 160,
INTERCEPT_INVLPGB_ILLEGAL,
@@ -144,6 +154,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u8 reserved_6[8]; /* Offset 0xe8 */
u64 avic_logical_id; /* Offset 0xf0 */
u64 avic_physical_id; /* Offset 0xf8 */
+ u8 reserved_7[8];
+ u64 vmsa_pa; /* Used for an SEV-ES guest */
};


@@ -198,6 +210,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {

#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
+#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)

struct vmcb_seg {
u16 selector;
@@ -295,7 +308,7 @@ struct ghcb {


#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
-#define EXPECTED_VMCB_CONTROL_AREA_SIZE 256
+#define EXPECTED_VMCB_CONTROL_AREA_SIZE 272
#define EXPECTED_GHCB_SIZE PAGE_SIZE

static inline void __unused_size_checks(void)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bb6f069464cf..e34d3a6dba80 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1846,3 +1846,59 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
svm->ap_hlt_loop = false;
}
+
+void sev_es_init_vmcb(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
+ svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+
+ /*
+ * An SEV-ES guest requires a VMSA area that is separate from the
+ * VMCB page. Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
+
+ /* Can't intercept CR register access, HV can't modify CR registers */
+ svm_clr_intercept(svm, INTERCEPT_CR0_READ);
+ svm_clr_intercept(svm, INTERCEPT_CR4_READ);
+ svm_clr_intercept(svm, INTERCEPT_CR8_READ);
+ svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
+ svm_clr_intercept(svm, INTERCEPT_CR4_WRITE);
+ svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
+
+ svm_clr_intercept(svm, INTERCEPT_SELECTIVE_CR0);
+
+ /* Track EFER/CR register changes */
+ svm_set_intercept(svm, TRAP_EFER_WRITE);
+ svm_set_intercept(svm, TRAP_CR0_WRITE);
+ svm_set_intercept(svm, TRAP_CR4_WRITE);
+ svm_set_intercept(svm, TRAP_CR8_WRITE);
+
+ /* No support for enable_vmware_backdoor */
+ clr_exception_intercept(svm, GP_VECTOR);
+
+ /* Can't intercept XSETBV, HV can't modify XCR0 directly */
+ svm_clr_intercept(svm, INTERCEPT_XSETBV);
+
+ /* Clear intercepts on selected MSRs */
+ set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+}
+
+void sev_es_create_vcpu(struct vcpu_svm *svm)
+{
+ /*
+ * Set the GHCB MSR value as per the GHCB specification when creating
+ * a vCPU for an SEV-ES guest.
+ */
+ set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ GHCB_VERSION_MIN,
+ sev_enc_bit));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d8217ba6791f..46dd28cd1ea6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -91,7 +91,7 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);

static const struct svm_direct_access_msrs {
u32 index; /* Index of the MSR */
- bool always; /* True if intercept is always on */
+ bool always; /* True if intercept is initially cleared */
} direct_access_msrs[MAX_DIRECT_ACCESS_MSRS] = {
{ .index = MSR_STAR, .always = true },
{ .index = MSR_IA32_SYSENTER_CS, .always = true },
@@ -109,6 +109,9 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
{ .index = MSR_IA32_LASTINTTOIP, .always = false },
+ { .index = MSR_EFER, .always = false },
+ { .index = MSR_IA32_CR_PAT, .always = false },
+ { .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
{ .index = MSR_INVALID, .always = false },
};

@@ -677,8 +680,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
msrpm[offset] = tmp;
}

-static void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
- int read, int write)
+void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
+ int read, int write)
{
set_shadow_msr_intercept(vcpu, msr, read, write);
set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
@@ -1264,6 +1267,11 @@ static void init_vmcb(struct vcpu_svm *svm)
if (sev_guest(svm->vcpu.kvm)) {
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
clr_exception_intercept(svm, UD_VECTOR);
+
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ /* Perform SEV-ES specific VMCB updates */
+ sev_es_init_vmcb(svm);
+ }
}

vmcb_mark_all_dirty(svm->vmcb);
@@ -1357,6 +1365,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
svm_init_osvw(vcpu);
vcpu->arch.microcode_version = 0x01000065;

+ if (sev_es_guest(svm->vcpu.kvm))
+ /* Perform SEV-ES specific VMCB creation updates */
+ sev_es_create_vcpu(svm);
+
return 0;

error_free_vmsa_page:
@@ -1452,6 +1464,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
loadsegment(gs, svm->host.gs);
#endif
#endif
+
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
}
@@ -3155,6 +3168,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "avic_backing_page:", control->avic_backing_page);
pr_err("%-20s%016llx\n", "avic_logical_id:", control->avic_logical_id);
pr_err("%-20s%016llx\n", "avic_physical_id:", control->avic_physical_id);
+ pr_err("%-20s%016llx\n", "vmsa_pa:", control->vmsa_pa);
pr_err("VMCB State Save Area:\n");
pr_err("%-5s s: %04x a: %04x l: %08x b: %016llx\n",
"es:",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 313cfb733f7e..1cf959cfcbc8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -34,7 +34,7 @@ static const u32 host_save_user_msrs[] = {

#define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)

-#define MAX_DIRECT_ACCESS_MSRS 15
+#define MAX_DIRECT_ACCESS_MSRS 18
#define MSRPM_OFFSETS 16
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
@@ -419,6 +419,8 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu);
bool svm_interrupt_blocked(struct kvm_vcpu *vcpu);
void svm_set_gif(struct vcpu_svm *svm, bool value);
int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code);
+void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
+ int read, int write);

/* nested.c */

@@ -579,5 +581,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu);
int sev_handle_vmgexit(struct vcpu_svm *svm);
int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+void sev_es_init_vmcb(struct vcpu_svm *svm);
+void sev_es_create_vcpu(struct vcpu_svm *svm);

#endif
--
2.28.0

2020-12-10 17:33:21

by Tom Lendacky

Subject: [PATCH v5 28/34] KVM: SVM: Add NMI support for an SEV-ES guest

From: Tom Lendacky <[email protected]>

The GHCB specification defines how NMIs are to be handled for an SEV-ES
guest. To detect the completion of an NMI the hypervisor must not
intercept the IRET instruction (because a #VC while running the NMI will
issue an IRET) and, instead, must receive an NMI Complete exit event from
the guest.

Update the KVM support for detecting the completion of NMIs in the guest
to follow the GHCB specification. When an SEV-ES guest is active, the
IRET instruction will no longer be intercepted. Now, when the NMI Complete
exit event is received, the iret_interception() function will be called
to simulate the completion of the NMI.
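
Concretely, the resulting sequence is: KVM injects an NMI without enabling
the IRET intercept, the guest NMI handler runs (any #VC inside it may IRET
without KVM's involvement), and when the handler finishes the guest issues a
VMGEXIT with SVM_VMGEXIT_NMI_COMPLETE, which KVM routes through
svm_invoke_exit_handler(svm, SVM_EXIT_IRET) to iret_interception() to reopen
the NMI window.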

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 4 ++++
arch/x86/kvm/svm/svm.c | 20 +++++++++++++-------
2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b47285384b1f..486c5609fa25 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1451,6 +1451,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
@@ -1774,6 +1775,9 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
control->exit_info_2,
svm->ghcb_sa);
break;
+ case SVM_VMGEXIT_NMI_COMPLETE:
+ ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
+ break;
case SVM_VMGEXIT_AP_HLT_LOOP:
svm->ap_hlt_loop = true;
ret = kvm_emulate_halt(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2dbc20701ef5..16746bc6a1fa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2339,9 +2339,11 @@ static int cpuid_interception(struct vcpu_svm *svm)
static int iret_interception(struct vcpu_svm *svm)
{
++svm->vcpu.stat.nmi_window_exits;
- svm_clr_intercept(svm, INTERCEPT_IRET);
svm->vcpu.arch.hflags |= HF_IRET_MASK;
- svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+ if (!sev_es_guest(svm->vcpu.kvm)) {
+ svm_clr_intercept(svm, INTERCEPT_IRET);
+ svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+ }
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
return 1;
}
@@ -3358,7 +3360,8 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)

svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
vcpu->arch.hflags |= HF_NMI_MASK;
- svm_set_intercept(svm, INTERCEPT_IRET);
+ if (!sev_es_guest(svm->vcpu.kvm))
+ svm_set_intercept(svm, INTERCEPT_IRET);
++vcpu->stat.nmi_injections;
}

@@ -3442,10 +3445,12 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)

if (masked) {
svm->vcpu.arch.hflags |= HF_NMI_MASK;
- svm_set_intercept(svm, INTERCEPT_IRET);
+ if (!sev_es_guest(svm->vcpu.kvm))
+ svm_set_intercept(svm, INTERCEPT_IRET);
} else {
svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
- svm_clr_intercept(svm, INTERCEPT_IRET);
+ if (!sev_es_guest(svm->vcpu.kvm))
+ svm_clr_intercept(svm, INTERCEPT_IRET);
}
}

@@ -3623,8 +3628,9 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
* If we've made progress since setting HF_IRET_MASK, we've
* executed an IRET and can allow NMI injection.
*/
- if ((svm->vcpu.arch.hflags & HF_IRET_MASK)
- && kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip) {
+ if ((svm->vcpu.arch.hflags & HF_IRET_MASK) &&
+ (sev_es_guest(svm->vcpu.kvm) ||
+ kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip)) {
svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
}
--
2.28.0

2020-12-10 17:33:24

by Tom Lendacky

Subject: [PATCH v5 30/34] KVM: SVM: Update ASID allocation to support SEV-ES guests

From: Tom Lendacky <[email protected]>

SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID
allocation routine to return an ASID in the respective range.
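
As a worked example (numbers illustrative only): if CPUID reports
min_sev_asid = 100 and max_sev_asid = 509, then SEV-ES guests are assigned
ASIDs 1 through 99 while SEV guests are assigned ASIDs 100 through 509,
matching the min_asid/max_asid split computed in sev_asid_new() below.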

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4797a6768eaf..bb6f069464cf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -63,19 +63,19 @@ static int sev_flush_asids(void)
}

/* Must be called with the sev_bitmap_lock held */
-static bool __sev_recycle_asids(void)
+static bool __sev_recycle_asids(int min_asid, int max_asid)
{
int pos;

/* Check if there are any ASIDs to reclaim before performing a flush */
- pos = find_next_bit(sev_reclaim_asid_bitmap,
- max_sev_asid, min_sev_asid - 1);
- if (pos >= max_sev_asid)
+ pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
+ if (pos >= max_asid)
return false;

if (sev_flush_asids())
return false;

+ /* The flush process will flush all reclaimable SEV and SEV-ES ASIDs */
bitmap_xor(sev_asid_bitmap, sev_asid_bitmap, sev_reclaim_asid_bitmap,
max_sev_asid);
bitmap_zero(sev_reclaim_asid_bitmap, max_sev_asid);
@@ -83,20 +83,23 @@ static bool __sev_recycle_asids(void)
return true;
}

-static int sev_asid_new(void)
+static int sev_asid_new(struct kvm_sev_info *sev)
{
+ int pos, min_asid, max_asid;
bool retry = true;
- int pos;

mutex_lock(&sev_bitmap_lock);

/*
- * SEV-enabled guest must use asid from min_sev_asid to max_sev_asid.
+ * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid.
+ * SEV-ES-enabled guests can use asid from 1 to min_sev_asid - 1.
*/
+ min_asid = sev->es_active ? 0 : min_sev_asid - 1;
+ max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid;
again:
- pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_sev_asid - 1);
- if (pos >= max_sev_asid) {
- if (retry && __sev_recycle_asids()) {
+ pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_asid);
+ if (pos >= max_asid) {
+ if (retry && __sev_recycle_asids(min_asid, max_asid)) {
retry = false;
goto again;
}
@@ -178,7 +181,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (unlikely(sev->active))
return ret;

- asid = sev_asid_new();
+ asid = sev_asid_new(sev);
if (asid < 0)
return ret;

--
2.28.0

2020-12-10 17:43:37

by Tom Lendacky

Subject: [PATCH v5 29/34] KVM: SVM: Set the encryption mask for the SVM host save area

From: Tom Lendacky <[email protected]>

The SVM host save area is used to restore some host state on VMEXIT of an
SEV-ES guest. After allocating the save area, clear it and add the
encryption mask to the SVM host save area physical address that is
programmed into the VM_HSAVE_PA MSR.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 1 -
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/svm/svm.h | 2 ++
3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 486c5609fa25..4797a6768eaf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -32,7 +32,6 @@ unsigned int max_sev_asid;
static unsigned int min_sev_asid;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;
-#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)

struct enc_region {
struct list_head list;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 16746bc6a1fa..d8217ba6791f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -498,7 +498,7 @@ static int svm_hardware_enable(void)

wrmsrl(MSR_EFER, efer | EFER_SVME);

- wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd->save_area) << PAGE_SHIFT);
+ wrmsrl(MSR_VM_HSAVE_PA, __sme_page_pa(sd->save_area));

if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
@@ -566,6 +566,7 @@ static int svm_cpu_init(int cpu)
sd->save_area = alloc_page(GFP_KERNEL);
if (!sd->save_area)
goto free_cpu_data;
+ clear_page(page_address(sd->save_area));

if (svm_sev_enabled()) {
sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d570d5a6a2c..313cfb733f7e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -21,6 +21,8 @@

#include <asm/svm.h>

+#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
+
static const u32 host_save_user_msrs[] = {
#ifdef CONFIG_X86_64
MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
--
2.28.0

2020-12-10 17:45:47

by Tom Lendacky

Subject: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

From: Tom Lendacky <[email protected]>

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then VMRUN is issued to
begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.

Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.

The GHCB specification also requires the hypervisor to save the address of
an AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm/sev.c | 50 +++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 7 +++++
arch/x86/kvm/svm/svm.h | 3 ++
arch/x86/kvm/x86.c | 9 ++++++
5 files changed, 71 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 048b08437c33..60a3b9d33407 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1286,6 +1286,8 @@ struct kvm_x86_ops {

void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
+
+ void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7531de760b5..b47285384b1f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,8 @@
#include <linux/processor.h>
#include <linux/trace_events.h>

+#include <asm/trapnr.h>
+
#include "x86.h"
#include "svm.h"
#include "cpuid.h"
@@ -1449,6 +1451,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_HLT_LOOP:
+ case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
default:
@@ -1770,6 +1774,35 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
control->exit_info_2,
svm->ghcb_sa);
break;
+ case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->ap_hlt_loop = true;
+ ret = kvm_emulate_halt(&svm->vcpu);
+ break;
+ case SVM_VMGEXIT_AP_JUMP_TABLE: {
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+ switch (control->exit_info_1) {
+ case 0:
+ /* Set AP jump table address */
+ sev->ap_jump_table = control->exit_info_2;
+ break;
+ case 1:
+ /* Get AP jump table address */
+ ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+ break;
+ default:
+ pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
+ control->exit_info_1);
+ ghcb_set_sw_exit_info_1(ghcb, 1);
+ ghcb_set_sw_exit_info_2(ghcb,
+ X86_TRAP_UD |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ }
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(&svm->vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -1790,3 +1823,20 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
return kvm_sev_es_string_io(&svm->vcpu, size, port,
svm->ghcb_sa, svm->ghcb_sa_len, in);
}
+
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ /* First SIPI: Use the values as initially set by the VMM */
+ if (!svm->ap_hlt_loop)
+ return;
+
+ /*
+ * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
+ * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
+ * non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+ svm->ap_hlt_loop = false;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8d22ae25a0f8..2dbc20701ef5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4400,6 +4400,11 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
(vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
}

+static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+ sev_vcpu_deliver_sipi_vector(vcpu, vector);
+}
+
static void svm_vm_destroy(struct kvm *kvm)
{
avic_vm_destroy(kvm);
@@ -4541,6 +4546,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.apic_init_signal_blocked = svm_apic_init_signal_blocked,

.msr_filter_changed = svm_msr_filter_changed,
+
+ .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
};

static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b3f03dede6ac..5d570d5a6a2c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -68,6 +68,7 @@ struct kvm_sev_info {
int fd; /* SEV device fd */
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list; /* List of registered regions */
+ u64 ap_jump_table; /* SEV-ES AP Jump Table address */
};

struct kvm_svm {
@@ -174,6 +175,7 @@ struct vcpu_svm {
struct vmcb_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
+ bool ap_hlt_loop;

/* SEV-ES scratch area support */
void *ghcb_sa;
@@ -574,5 +576,6 @@ void sev_hardware_teardown(void);
void sev_free_vcpu(struct kvm_vcpu *vcpu);
int sev_handle_vmgexit(struct vcpu_svm *svm);
int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);

#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ddd614a76744..4fd216b61a89 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10144,6 +10144,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
{
struct kvm_segment cs;

+ /*
+ * Guests with protected state can't have their state altered by KVM,
+ * call the vcpu_deliver_sipi_vector() x86 op for processing.
+ */
+ if (vcpu->arch.guest_state_protected) {
+ kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, vector);
+ return;
+ }
+
kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
cs.selector = vector << 8;
cs.base = vector << 12;
--
2.28.0

2020-12-10 17:51:38

by Tom Lendacky

Subject: [PATCH v5 19/34] KVM: SVM: Support string IO operations for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For an SEV-ES guest, string-based port IO is performed to a shared
(un-encrypted) page so that both the hypervisor and guest can read or
write to it and each see the contents.

For string-based port IO operations, invoke SEV-ES specific routines that
can complete the operation using common KVM port IO support.

[ set but not used variable ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/sev.c | 18 +++++++++--
arch/x86/kvm/svm/svm.c | 11 +++++--
arch/x86/kvm/svm/svm.h | 1 +
arch/x86/kvm/x86.c | 54 +++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 3 ++
6 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8cf6b0493d49..26f937111226 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -614,6 +614,7 @@ struct kvm_vcpu_arch {

struct kvm_pio_request pio;
void *pio_data;
+ void *guest_ins_data;

u8 event_exit_inst_len;

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 63f20be4bc69..a7531de760b5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1406,9 +1406,14 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_EXIT_INVD:
break;
case SVM_EXIT_IOIO:
- if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK))
- if (!ghcb_rax_is_valid(ghcb))
+ if (ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_STR_MASK) {
+ if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
+ } else {
+ if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK))
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ }
break;
case SVM_EXIT_MSR:
if (!ghcb_rcx_is_valid(ghcb))
@@ -1776,3 +1781,12 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)

return ret;
}
+
+int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
+{
+ if (!setup_vmgexit_scratch(svm, in, svm->vmcb->control.exit_info_2))
+ return -EINVAL;
+
+ return kvm_sev_es_string_io(&svm->vcpu, size, port,
+ svm->ghcb_sa, svm->ghcb_sa_len, in);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ad1ec6ad558e..32502c4b091d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2058,11 +2058,16 @@ static int io_interception(struct vcpu_svm *svm)
++svm->vcpu.stat.io_exits;
string = (io_info & SVM_IOIO_STR_MASK) != 0;
in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
- if (string)
- return kvm_emulate_instruction(vcpu, 0);
-
port = io_info >> 16;
size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
+
+ if (string) {
+ if (sev_es_guest(vcpu->kvm))
+ return sev_es_string_io(svm, size, port, in);
+ else
+ return kvm_emulate_instruction(vcpu, 0);
+ }
+
svm->next_rip = svm->vmcb->control.exit_info_2;

return kvm_fast_pio(&svm->vcpu, size, port, in);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9019ad6a8138..b3f03dede6ac 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -573,5 +573,6 @@ void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);
void sev_free_vcpu(struct kvm_vcpu *vcpu);
int sev_handle_vmgexit(struct vcpu_svm *svm);
+int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);

#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 78e8c8b36f9b..fcd862f5a2b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10790,6 +10790,10 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)

unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
{
+ /* Can't read the RIP when guest state is protected, just return 0 */
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
if (is_64_bit_mode(vcpu))
return kvm_rip_read(vcpu);
return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
@@ -11422,6 +11426,56 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
}
EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read);

+static int complete_sev_es_emulated_ins(struct kvm_vcpu *vcpu)
+{
+ memcpy(vcpu->arch.guest_ins_data, vcpu->arch.pio_data,
+ vcpu->arch.pio.count * vcpu->arch.pio.size);
+ vcpu->arch.pio.count = 0;
+
+ return 1;
+}
+
+static int kvm_sev_es_outs(struct kvm_vcpu *vcpu, unsigned int size,
+ unsigned int port, void *data, unsigned int count)
+{
+ int ret;
+
+ ret = emulator_pio_out_emulated(vcpu->arch.emulate_ctxt, size, port,
+ data, count);
+ if (ret)
+ return ret;
+
+ vcpu->arch.pio.count = 0;
+
+ return 0;
+}
+
+static int kvm_sev_es_ins(struct kvm_vcpu *vcpu, unsigned int size,
+ unsigned int port, void *data, unsigned int count)
+{
+ int ret;
+
+ ret = emulator_pio_in_emulated(vcpu->arch.emulate_ctxt, size, port,
+ data, count);
+ if (ret) {
+ vcpu->arch.pio.count = 0;
+ } else {
+ vcpu->arch.guest_ins_data = data;
+ vcpu->arch.complete_userspace_io = complete_sev_es_emulated_ins;
+ }
+
+ return 0;
+}
+
+int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
+ unsigned int port, void *data, unsigned int count,
+ int in)
+{
+ return in ? kvm_sev_es_ins(vcpu, size, port, data, count)
+ : kvm_sev_es_outs(vcpu, size, port, data, count);
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 804369fe45e3..0e8fe766a4c5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -411,5 +411,8 @@ int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
void *dst);
int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
void *dst);
+int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
+ unsigned int port, void *data, unsigned int count,
+ int in);

#endif
--
2.28.0

2020-12-10 17:52:35

by Tom Lendacky

Subject: [PATCH v5 17/34] KVM: SVM: Create trace events for VMGEXIT MSR protocol processing

From: Tom Lendacky <[email protected]>

Add trace events for entry to and exit from VMGEXIT MSR protocol
processing. The vCPU id is common to both trace events. The MSR
protocol processing is guided by the GHCB GPA in the VMCB, so the GHCB
GPA will represent the input and output values for the entry and exit
events, respectively. Additionally, the exit event will contain the
return code for the event.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 6 ++++++
arch/x86/kvm/trace.h | 44 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
3 files changed, 52 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c2cc38e7400b..2e2548fa369b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1530,6 +1530,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)

ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;

+ trace_kvm_vmgexit_msr_protocol_enter(svm->vcpu.vcpu_id,
+ control->ghcb_gpa);
+
switch (ghcb_info) {
case GHCB_MSR_SEV_INFO_REQ:
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
@@ -1591,6 +1594,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
ret = -EINVAL;
}

+ trace_kvm_vmgexit_msr_protocol_exit(svm->vcpu.vcpu_id,
+ control->ghcb_gpa, ret);
+
return ret;
}

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 7da931a511c9..2de30c20bc26 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1631,6 +1631,50 @@ TRACE_EVENT(kvm_vmgexit_exit,
__entry->info1, __entry->info2)
);

+/*
+ * Tracepoint for the start of VMGEXIT MSR protocol processing
+ */
+TRACE_EVENT(kvm_vmgexit_msr_protocol_enter,
+ TP_PROTO(unsigned int vcpu_id, u64 ghcb_gpa),
+ TP_ARGS(vcpu_id, ghcb_gpa),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(u64, ghcb_gpa)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->ghcb_gpa = ghcb_gpa;
+ ),
+
+ TP_printk("vcpu %u, ghcb_gpa %016llx",
+ __entry->vcpu_id, __entry->ghcb_gpa)
+);
+
+/*
+ * Tracepoint for the end of VMGEXIT MSR protocol processing
+ */
+TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
+ TP_PROTO(unsigned int vcpu_id, u64 ghcb_gpa, int result),
+ TP_ARGS(vcpu_id, ghcb_gpa, result),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(u64, ghcb_gpa)
+ __field(int, result)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->ghcb_gpa = ghcb_gpa;
+ __entry->result = result;
+ ),
+
+ TP_printk("vcpu %u, ghcb_gpa %016llx, result %d",
+ __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d89736066b39..ba26b62e0262 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11323,3 +11323,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
--
2.28.0

2020-12-10 17:52:59

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 11/34] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file

From: Tom Lendacky <[email protected]>

This is a pre-patch to consolidate some exit handling code into callable
functions. Follow-on patches for SEV-ES exit handling will then be able
to use them from the sev.c file.
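
As an illustration (not part of this patch), a follow-on VMGEXIT handler in
sev.c can then reuse the consolidated helper roughly as follows, once the
exit code has been taken from the GHCB:

    /* Sketch of follow-on usage: run the existing SVM intercept handler
     * for the exit code supplied by the guest.
     */
    return svm_invoke_exit_handler(svm, exit_code);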

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 64 +++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3b02620ba9a9..ce7bcb9cf90c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3151,6 +3151,43 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"excp_to:", save->last_excp_to);
}

+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+{
+ if (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
+ svm_exit_handlers[exit_code])
+ return 0;
+
+ vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
+ dump_vmcb(vcpu);
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
+ vcpu->run->internal.ndata = 2;
+ vcpu->run->internal.data[0] = exit_code;
+ vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+
+ return -EINVAL;
+}
+
+static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
+{
+ if (svm_handle_invalid_exit(&svm->vcpu, exit_code))
+ return 0;
+
+#ifdef CONFIG_RETPOLINE
+ if (exit_code == SVM_EXIT_MSR)
+ return msr_interception(svm);
+ else if (exit_code == SVM_EXIT_VINTR)
+ return interrupt_window_interception(svm);
+ else if (exit_code == SVM_EXIT_INTR)
+ return intr_interception(svm);
+ else if (exit_code == SVM_EXIT_HLT)
+ return halt_interception(svm);
+ else if (exit_code == SVM_EXIT_NPF)
+ return npf_interception(svm);
+#endif
+ return svm_exit_handlers[exit_code](svm);
+}
+
static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
u32 *intr_info, u32 *error_code)
{
@@ -3217,32 +3254,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;

- if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
- || !svm_exit_handlers[exit_code]) {
- vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%x\n", exit_code);
- dump_vmcb(vcpu);
- vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
- vcpu->run->internal.suberror =
- KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
- vcpu->run->internal.ndata = 2;
- vcpu->run->internal.data[0] = exit_code;
- vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
- return 0;
- }
-
-#ifdef CONFIG_RETPOLINE
- if (exit_code == SVM_EXIT_MSR)
- return msr_interception(svm);
- else if (exit_code == SVM_EXIT_VINTR)
- return interrupt_window_interception(svm);
- else if (exit_code == SVM_EXIT_INTR)
- return intr_interception(svm);
- else if (exit_code == SVM_EXIT_HLT)
- return halt_interception(svm);
- else if (exit_code == SVM_EXIT_NPF)
- return npf_interception(svm);
-#endif
- return svm_exit_handlers[exit_code](svm);
+ return svm_invoke_exit_handler(svm, exit_code);
}

static void reload_tss(struct kvm_vcpu *vcpu)
--
2.28.0

2020-12-10 18:06:47

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 07/34] KVM: SVM: Add required changes to support intercepts under SEV-ES

From: Tom Lendacky <[email protected]>

When a guest is running under SEV-ES, the hypervisor cannot access the
guest register state. There are numerous places in the KVM code that
access registers which are no longer allowed to be accessed (e.g. RIP,
CR0, etc.). Add checks to prevent these register accesses and add
intercept update support at various points within the KVM code.

Also, when handling a VMGEXIT, exceptions are passed back through the
GHCB. Since the RDMSR/WRMSR intercepts may inject a #GP on error,
update the SVM intercepts to handle this for SEV-ES guests.
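
For reference, a minimal sketch of how such an error is reported back to the
guest through the GHCB (this mirrors the rdmsr/wrmsr changes below):

    /* Report a #GP to the guest via the GHCB exit info fields. */
    ghcb_set_sw_exit_info_1(svm->ghcb, 1);
    ghcb_set_sw_exit_info_2(svm->ghcb,
                            X86_TRAP_GP |
                            SVM_EVTINJ_TYPE_EXEPT |
                            SVM_EVTINJ_VALID);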

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/svm.h | 3 +-
arch/x86/kvm/svm/svm.c | 111 +++++++++++++++++++++++++++++++++----
arch/x86/kvm/x86.c | 6 +-
3 files changed, 107 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1edf24f51b53..bce28482d63d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -178,7 +178,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)

-#define SVM_INTERRUPT_SHADOW_MASK 1
+#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
+#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)

#define SVM_IOIO_STR_SHIFT 2
#define SVM_IOIO_REP_SHIFT 3
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd4c9884e5a8..857d0d3f2752 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -36,6 +36,7 @@
#include <asm/mce.h>
#include <asm/spec-ctrl.h>
#include <asm/cpu_device_id.h>
+#include <asm/traps.h>

#include <asm/virtext.h>
#include "trace.h"
@@ -340,6 +341,13 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ /*
+ * SEV-ES does not expose the next RIP. The RIP update is controlled by
+ * the type of exit and the #VC handler in the guest.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ goto done;
+
if (nrips && svm->vmcb->control.next_rip != 0) {
WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
svm->next_rip = svm->vmcb->control.next_rip;
@@ -351,6 +359,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
} else {
kvm_rip_write(vcpu, svm->next_rip);
}
+
+done:
svm_set_interrupt_shadow(vcpu, 0);

return 1;
@@ -1652,9 +1662,18 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)

static void update_cr0_intercept(struct vcpu_svm *svm)
{
- ulong gcr0 = svm->vcpu.arch.cr0;
- u64 *hcr0 = &svm->vmcb->save.cr0;
+ ulong gcr0;
+ u64 *hcr0;
+
+ /*
+ * SEV-ES guests must always keep the CR intercepts cleared. CR
+ * tracking is done using the CR write traps.
+ */
+ if (sev_es_guest(svm->vcpu.kvm))
+ return;

+ gcr0 = svm->vcpu.arch.cr0;
+ hcr0 = &svm->vmcb->save.cr0;
*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
| (gcr0 & SVM_CR0_SELECTIVE_MASK);

@@ -1674,7 +1693,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
struct vcpu_svm *svm = to_svm(vcpu);

#ifdef CONFIG_X86_64
- if (vcpu->arch.efer & EFER_LME) {
+ if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
vcpu->arch.efer |= EFER_LMA;
svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
@@ -2608,7 +2627,29 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

static int rdmsr_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_rdmsr(&svm->vcpu);
+ u32 ecx;
+ u64 data;
+
+ if (!sev_es_guest(svm->vcpu.kvm))
+ return kvm_emulate_rdmsr(&svm->vcpu);
+
+ ecx = kvm_rcx_read(&svm->vcpu);
+ if (kvm_get_msr(&svm->vcpu, ecx, &data)) {
+ trace_kvm_msr_read_ex(ecx);
+ ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+ ghcb_set_sw_exit_info_2(svm->ghcb,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ return 1;
+ }
+
+ trace_kvm_msr_read(ecx, data);
+
+ kvm_rax_write(&svm->vcpu, data & -1u);
+ kvm_rdx_write(&svm->vcpu, (data >> 32) & -1u);
+
+ return kvm_skip_emulated_instruction(&svm->vcpu);
}

static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
@@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)

static int wrmsr_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_wrmsr(&svm->vcpu);
+ u32 ecx;
+ u64 data;
+
+ if (!sev_es_guest(svm->vcpu.kvm))
+ return kvm_emulate_wrmsr(&svm->vcpu);
+
+ ecx = kvm_rcx_read(&svm->vcpu);
+ data = kvm_read_edx_eax(&svm->vcpu);
+ if (kvm_set_msr(&svm->vcpu, ecx, data)) {
+ trace_kvm_msr_write_ex(ecx, data);
+ ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+ ghcb_set_sw_exit_info_2(svm->ghcb,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ return 1;
+ }
+
+ trace_kvm_msr_write(ecx, data);
+
+ return kvm_skip_emulated_instruction(&svm->vcpu);
}

static int msr_interception(struct vcpu_svm *svm)
@@ -2827,7 +2888,14 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
static int pause_interception(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- bool in_kernel = (svm_get_cpl(vcpu) == 0);
+ bool in_kernel;
+
+ /*
+ * CPL is not made available for an SEV-ES guest, so just set in_kernel
+ * to true.
+ */
+ in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
+ : (svm_get_cpl(vcpu) == 0);

if (!kvm_pause_in_guest(vcpu->kvm))
grow_ple_window(vcpu);
@@ -3090,10 +3158,13 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)

trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);

- if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
- vcpu->arch.cr0 = svm->vmcb->save.cr0;
- if (npt_enabled)
- vcpu->arch.cr3 = svm->vmcb->save.cr3;
+ /* SEV-ES guests must use the CR write traps to track CR registers. */
+ if (!sev_es_guest(vcpu->kvm)) {
+ if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
+ vcpu->arch.cr0 = svm->vmcb->save.cr0;
+ if (npt_enabled)
+ vcpu->arch.cr3 = svm->vmcb->save.cr3;
+ }

if (is_guest_mode(vcpu)) {
int vmexit;
@@ -3205,6 +3276,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ /*
+ * SEV-ES guests must always keep the CR intercepts cleared. CR
+ * tracking is done using the CR write traps.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ return;
+
if (nested_svm_virtualize_tpr(vcpu))
return;

@@ -3273,6 +3351,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;

+ /*
+ * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
+ * bit to determine the state of the IF flag.
+ */
+ if (sev_es_guest(svm->vcpu.kvm))
+ return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);
+
if (!gif_set(svm))
return true;

@@ -3458,6 +3543,12 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
svm->vcpu.arch.nmi_injected = true;
break;
case SVM_EXITINTINFO_TYPE_EXEPT:
+ /*
+ * Never re-inject a #VC exception.
+ */
+ if (vector == X86_TRAP_VC)
+ break;
+
/*
* In case of software exceptions, do not reinject the vector,
* but re-execute the instruction instead. Rewind RIP first
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3fdc16cfd6f..b6809a2851d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4018,7 +4018,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
int idx;

- if (vcpu->preempted)
+ if (vcpu->preempted && !vcpu->arch.guest_state_protected)
vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);

/*
@@ -8161,7 +8161,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
{
struct kvm_run *kvm_run = vcpu->run;

- kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
+ kvm_run->if_flag = (vcpu->arch.guest_state_protected)
+ ? kvm_arch_interrupt_allowed(vcpu)
+ : (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
--
2.28.0

2020-12-10 18:07:37

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 06/34] KVM: x86: Mark GPRs dirty when written

From: Tom Lendacky <[email protected]>

When performing VMGEXIT processing for an SEV-ES guest, register values
will be synced between KVM and the GHCB. Prepare for detecting when a GPR
has been updated (marked dirty) in order to determine whether to sync the
register to the GHCB.
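
For example, a later patch in this series can then sync only the GPRs that
KVM actually modified back to the GHCB, along the lines of (sketch only):

    /* Copy RAX to the GHCB only if KVM wrote it while handling the exit. */
    if (kvm_register_is_dirty(vcpu, VCPU_REGS_RAX))
        ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);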

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/kvm_cache_regs.h | 51 ++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index a889563ad02d..f15bc16de07c 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -9,6 +9,31 @@
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)

+static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
+static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
@@ -18,6 +43,7 @@ static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
unsigned long val) \
{ \
vcpu->arch.regs[VCPU_REGS_##uname] = val; \
+ kvm_register_mark_dirty(vcpu, VCPU_REGS_##uname); \
}
BUILD_KVM_GPR_ACCESSORS(rax, RAX)
BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
@@ -37,31 +63,6 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif

-static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
-}
-
-static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
-}
-
-static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
-}
-
-static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
-}
-
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
--
2.28.0

2020-12-10 18:08:06

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 04/34] KVM: SVM: Add GHCB accessor functions for retrieving fields

From: Tom Lendacky <[email protected]>

Update the GHCB accessor functions to add functions for retrieving GHCB
fields by name. Update existing code to use the new accessor functions.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/svm.h | 10 ++++++++++
arch/x86/kernel/cpu/vmware.c | 12 ++++++------
2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 71d630bb5e08..1edf24f51b53 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -379,6 +379,16 @@ struct vmcb {
(unsigned long *)&ghcb->save.valid_bitmap); \
} \
\
+ static inline u64 ghcb_get_##field(struct ghcb *ghcb) \
+ { \
+ return ghcb->save.field; \
+ } \
+ \
+ static inline u64 ghcb_get_##field##_if_valid(struct ghcb *ghcb) \
+ { \
+ return ghcb_##field##_is_valid(ghcb) ? ghcb->save.field : 0; \
+ } \
+ \
static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value) \
{ \
__set_bit(GHCB_BITMAP_IDX(field), \
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..c6ede3b3d302 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -501,12 +501,12 @@ static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
ghcb_rbp_is_valid(ghcb)))
return false;

- regs->bx = ghcb->save.rbx;
- regs->cx = ghcb->save.rcx;
- regs->dx = ghcb->save.rdx;
- regs->si = ghcb->save.rsi;
- regs->di = ghcb->save.rdi;
- regs->bp = ghcb->save.rbp;
+ regs->bx = ghcb_get_rbx(ghcb);
+ regs->cx = ghcb_get_rcx(ghcb);
+ regs->dx = ghcb_get_rdx(ghcb);
+ regs->si = ghcb_get_rsi(ghcb);
+ regs->di = ghcb_get_rdi(ghcb);
+ regs->bp = ghcb_get_rbp(ghcb);

return true;
}
--
2.28.0

2020-12-10 18:08:06

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 02/34] KVM: SVM: Remove the call to sev_platform_status() during setup

From: Tom Lendacky <[email protected]>

When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), even though this isn't really an error condition.

Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 22 +---------------------
1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c0b14106258a..a4ba5476bf42 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1127,9 +1127,6 @@ void sev_vm_destroy(struct kvm *kvm)

int __init sev_hardware_setup(void)
{
- struct sev_user_data_status *status;
- int rc;
-
/* Maximum number of encrypted guests supported simultaneously */
max_sev_asid = cpuid_ecx(0x8000001F);

@@ -1148,26 +1145,9 @@ int __init sev_hardware_setup(void)
if (!sev_reclaim_asid_bitmap)
return 1;

- status = kmalloc(sizeof(*status), GFP_KERNEL);
- if (!status)
- return 1;
-
- /*
- * Check SEV platform status.
- *
- * PLATFORM_STATUS can be called in any state, if we failed to query
- * the PLATFORM status then either PSP firmware does not support SEV
- * feature or SEV firmware is dead.
- */
- rc = sev_platform_status(status, NULL);
- if (rc)
- goto err;
-
pr_info("SEV supported\n");

-err:
- kfree(status);
- return rc;
+ return 0;
}

void sev_hardware_teardown(void)
--
2.28.0

2020-12-10 18:09:42

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 01/34] x86/cpu: Add VM page flush MSR availability as a CPUID feature

From: Tom Lendacky <[email protected]>

On systems that do not have hardware enforced cache coherency between
encrypted and unencrypted mappings of the same physical page, the
hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
contents of an SEV guest page. When a small number of pages are being
flushed, this can be used in place of issuing a WBINVD across all CPUs.

CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is
available. Add a CPUID feature to indicate it is supported and define the
MSR.
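
A minimal sketch of the intended usage (the actual call site is added later
in this series; the operand format is assumed here to be the page-aligned
host virtual address of the guest page combined with the guest ASID in the
low bits):

    /* Flush the cache contents of one guest page for the given ASID. */
    wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, (unsigned long)va | sev->asid);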

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..54df367b3180 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -237,6 +237,7 @@
#define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
#define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_VM_PAGE_FLUSH ( 8*32+21) /* "" VM Page Flush MSR is supported */

/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..abfc9b0fbd8d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -470,6 +470,7 @@
#define MSR_AMD64_ICIBSEXTDCTL 0xc001103c
#define MSR_AMD64_IBSOPDATA4 0xc001103d
#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_VM_PAGE_FLUSH 0xc001011e
#define MSR_AMD64_SEV_ES_GHCB 0xc0010130
#define MSR_AMD64_SEV 0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT 0
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 866c9a9bcdee..236924930bf0 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_SEV, CPUID_EAX, 1, 0x8000001f, 0 },
{ X86_FEATURE_SEV_ES, CPUID_EAX, 3, 0x8000001f, 0 },
{ X86_FEATURE_SME_COHERENT, CPUID_EAX, 10, 0x8000001f, 0 },
+ { X86_FEATURE_VM_PAGE_FLUSH, CPUID_EAX, 2, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
};

--
2.28.0

2020-12-10 18:10:41

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 03/34] KVM: SVM: Add support for SEV-ES capability in KVM

From: Tom Lendacky <[email protected]>

Add support to KVM for determining if a system is capable of supporting
SEV-ES as well as determining if a guest is an SEV-ES guest.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/Kconfig | 3 ++-
arch/x86/kvm/svm/sev.c | 47 ++++++++++++++++++++++++++++++++++--------
arch/x86/kvm/svm/svm.c | 20 +++++++++---------
arch/x86/kvm/svm/svm.h | 17 ++++++++++++++-
4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f92dfd8ef10d..7ac592664c52 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -100,7 +100,8 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
help
- Provides support for launching Encrypted VMs on AMD processors.
+ Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
+ with Encrypted State (SEV-ES) on AMD processors.

config KVM_MMU_AUDIT
bool "Audit KVM MMU"
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4ba5476bf42..9bf5e9dadff5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -932,7 +932,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
struct kvm_sev_cmd sev_cmd;
int r;

- if (!svm_sev_enabled())
+ if (!svm_sev_enabled() || !sev)
return -ENOTTY;

if (!argp)
@@ -1125,29 +1125,58 @@ void sev_vm_destroy(struct kvm *kvm)
sev_asid_free(sev->asid);
}

-int __init sev_hardware_setup(void)
+void __init sev_hardware_setup(void)
{
+ unsigned int eax, ebx, ecx, edx;
+ bool sev_es_supported = false;
+ bool sev_supported = false;
+
+ /* Does the CPU support SEV? */
+ if (!boot_cpu_has(X86_FEATURE_SEV))
+ goto out;
+
+ /* Retrieve SEV CPUID information */
+ cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+
/* Maximum number of encrypted guests supported simultaneously */
- max_sev_asid = cpuid_ecx(0x8000001F);
+ max_sev_asid = ecx;

if (!svm_sev_enabled())
- return 1;
+ goto out;

/* Minimum ASID value that should be used for SEV guest */
- min_sev_asid = cpuid_edx(0x8000001F);
+ min_sev_asid = edx;

/* Initialize SEV ASID bitmaps */
sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_asid_bitmap)
- return 1;
+ goto out;

sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_reclaim_asid_bitmap)
- return 1;
+ goto out;

- pr_info("SEV supported\n");
+ pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);
+ sev_supported = true;

- return 0;
+ /* SEV-ES support requested? */
+ if (!sev_es)
+ goto out;
+
+ /* Does the CPU support SEV-ES? */
+ if (!boot_cpu_has(X86_FEATURE_SEV_ES))
+ goto out;
+
+ /* Has the system been allocated ASIDs for SEV-ES? */
+ if (min_sev_asid == 1)
+ goto out;
+
+ pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
+ sev_es_supported = true;
+
+out:
+ sev = sev_supported;
+ sev_es = sev_es_supported;
}

void sev_hardware_teardown(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6dc337b9c231..a1ea30c98629 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -187,9 +187,13 @@ static int vgif = true;
module_param(vgif, int, 0444);

/* enable/disable SEV support */
-static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
module_param(sev, int, 0444);

+/* enable/disable SEV-ES support */
+int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_es, int, 0444);
+
static bool __read_mostly dump_invalid_vmcb = 0;
module_param(dump_invalid_vmcb, bool, 0644);

@@ -959,15 +963,11 @@ static __init int svm_hardware_setup(void)
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}

- if (sev) {
- if (boot_cpu_has(X86_FEATURE_SEV) &&
- IS_ENABLED(CONFIG_KVM_AMD_SEV)) {
- r = sev_hardware_setup();
- if (r)
- sev = false;
- } else {
- sev = false;
- }
+ if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
+ sev_hardware_setup();
+ } else {
+ sev = false;
+ sev_es = false;
}

svm_adjust_mmio_mask();
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fdff76eb6ceb..56d950df82e5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -61,6 +61,7 @@ enum {

struct kvm_sev_info {
bool active; /* SEV enabled guest */
+ bool es_active; /* SEV-ES enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -352,6 +353,9 @@ static inline bool gif_set(struct vcpu_svm *svm)
#define MSR_CR3_LONG_MBZ_MASK 0xfff0000000000000U
#define MSR_INVALID 0xffffffffU

+extern int sev;
+extern int sev_es;
+
u32 svm_msrpm_offset(u32 msr);
u32 *svm_vcpu_alloc_msrpm(void);
void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
@@ -484,6 +488,17 @@ static inline bool sev_guest(struct kvm *kvm)
#endif
}

+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_guest(kvm) && sev->es_active;
+#else
+ return false;
+#endif
+}
+
static inline bool svm_sev_enabled(void)
{
return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
@@ -496,7 +511,7 @@ int svm_register_enc_region(struct kvm *kvm,
int svm_unregister_enc_region(struct kvm *kvm,
struct kvm_enc_region *range);
void pre_sev_run(struct vcpu_svm *svm, int cpu);
-int __init sev_hardware_setup(void);
+void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);

#endif
--
2.28.0

2020-12-11 08:37:49

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 14/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002

From: Tom Lendacky <[email protected]>

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x002 is a request for the hypervisor to set the GHCB MSR value
to the SEV INFO response, as defined by the specification, via the VMCB
GHCB GPA field.
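
As an illustration of the response layout defined below (the encryption bit
position of 51 is only an example value):

    /* SEV INFO response: max/min GHCB version 1, C-bit position 51. */
    u64 info = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX, GHCB_VERSION_MIN, 51);
    /* info == (1ULL << 48) | (1ULL << 32) | (51ULL << 24) | 0x001 */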

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 26 +++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 17 +++++++++++++++++
2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index da473c6b725e..58861515d3e3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
#include "cpuid.h"
#include "trace.h"

+static u8 sev_enc_bit;
static int sev_flush_asids(void);
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -1142,6 +1143,9 @@ void __init sev_hardware_setup(void)
/* Retrieve SEV CPUID information */
cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);

+ /* Set encryption bit location for SEV-ES guests */
+ sev_enc_bit = ebx & 0x3f;
+
/* Maximum number of encrypted guests supported simultaneously */
max_sev_asid = ecx;

@@ -1500,9 +1504,29 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
}

+static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
+{
+ svm->vmcb->control.ghcb_gpa = value;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
- return -EINVAL;
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ u64 ghcb_info;
+
+ ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;
+
+ switch (ghcb_info) {
+ case GHCB_MSR_SEV_INFO_REQ:
+ set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ GHCB_VERSION_MIN,
+ sev_enc_bit));
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 1;
}

int sev_handle_vmgexit(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 89bcb26977e5..546f8d05e81e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -514,9 +514,26 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);

/* sev.c */

+#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MIN 1ULL
+
#define GHCB_MSR_INFO_POS 0
#define GHCB_MSR_INFO_MASK (BIT_ULL(12) - 1)

+#define GHCB_MSR_SEV_INFO_RESP 0x001
+#define GHCB_MSR_SEV_INFO_REQ 0x002
+#define GHCB_MSR_VER_MAX_POS 48
+#define GHCB_MSR_VER_MAX_MASK 0xffff
+#define GHCB_MSR_VER_MIN_POS 32
+#define GHCB_MSR_VER_MIN_MASK 0xffff
+#define GHCB_MSR_CBIT_POS 24
+#define GHCB_MSR_CBIT_MASK 0xff
+#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
+ ((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) | \
+ (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) | \
+ (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) | \
+ GHCB_MSR_SEV_INFO_RESP)
+
extern unsigned int max_sev_asid;

static inline bool svm_sev_enabled(void)
--
2.28.0

2020-12-11 08:38:43

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 20/34] KVM: SVM: Add support for EFER write traps for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For SEV-ES guests, the interception of EFER write access is not
recommended. EFER interception occurs prior to EFER being modified and
the hypervisor is unable to modify EFER itself because the register is
located in the encrypted register state.

SEV-ES support introduces a new EFER write trap. This trap provides
intercept support of an EFER write after it has been modified. The new
EFER value is provided in the VMCB EXITINFO1 field, allowing the
hypervisor to track the setting of the guest EFER.

Add support to track the guest EFER value using the EFER write trap so
that the hypervisor understands the guest operating mode.
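
The trap semantics can be summarized with a short sketch (the handler added
below is the actual change):

    /*
     * On SVM_EXIT_EFER_WRITE_TRAP the guest write has already taken
     * effect; the new EFER value is reported in EXITINFO1.
     */
    u64 new_efer = svm->vmcb->control.exit_info_1;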

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/uapi/asm/svm.h | 2 ++
arch/x86/kvm/svm/svm.c | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 09f723945425..6e3f92e17655 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -77,6 +77,7 @@
#define SVM_EXIT_MWAIT_COND 0x08c
#define SVM_EXIT_XSETBV 0x08d
#define SVM_EXIT_RDPRU 0x08e
+#define SVM_EXIT_EFER_WRITE_TRAP 0x08f
#define SVM_EXIT_INVPCID 0x0a2
#define SVM_EXIT_NPF 0x400
#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
@@ -184,6 +185,7 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT, "mwait" }, \
{ SVM_EXIT_XSETBV, "xsetbv" }, \
+ { SVM_EXIT_EFER_WRITE_TRAP, "write_efer_trap" }, \
{ SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 32502c4b091d..3b61cc088b31 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2523,6 +2523,25 @@ static int cr8_write_interception(struct vcpu_svm *svm)
return 0;
}

+static int efer_trap(struct vcpu_svm *svm)
+{
+ struct msr_data msr_info;
+ int ret;
+
+ /*
+ * Clear the EFER_SVME bit from EFER. The SVM code always sets this
+ * bit in svm_set_efer(), but __kvm_valid_efer() checks it against
+ * whether the guest has X86_FEATURE_SVM - this avoids a failure if
+ * the guest doesn't have X86_FEATURE_SVM.
+ */
+ msr_info.host_initiated = false;
+ msr_info.index = MSR_EFER;
+ msr_info.data = svm->vmcb->control.exit_info_1 & ~EFER_SVME;
+ ret = kvm_set_msr_common(&svm->vcpu, &msr_info);
+
+ return kvm_complete_insn_gp(&svm->vcpu, ret);
+}
+
static int svm_get_msr_feature(struct kvm_msr_entry *msr)
{
msr->data = 0;
@@ -3031,6 +3050,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_MWAIT] = mwait_interception,
[SVM_EXIT_XSETBV] = xsetbv_interception,
[SVM_EXIT_RDPRU] = rdpru_interception,
+ [SVM_EXIT_EFER_WRITE_TRAP] = efer_trap,
[SVM_EXIT_INVPCID] = invpcid_interception,
[SVM_EXIT_NPF] = npf_interception,
[SVM_EXIT_RSM] = rsm_interception,
--
2.28.0

2020-12-11 08:40:59

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 32/34] KVM: SVM: Provide support for SEV-ES vCPU loading

From: Tom Lendacky <[email protected]>

An SEV-ES vCPU has additional VMCB vCPU load/put requirements. SEV-ES
hardware will restore certain registers on VMEXIT, but not save them on
VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
following changes:

General vCPU load changes:
- During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
area as these registers will be restored on VMEXIT.

General vCPU put changes:
- Do not attempt to restore registers that SEV-ES hardware has already
restored on VMEXIT.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/svm.h | 10 ++++---
arch/x86/kvm/svm/sev.c | 54 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 36 ++++++++++++++++---------
arch/x86/kvm/svm/svm.h | 22 +++++++++++-----
arch/x86/kvm/x86.c | 3 ++-
arch/x86/kvm/x86.h | 1 +
6 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index a57331de59e2..1c561945b426 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -234,7 +234,8 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
- u8 reserved_3[112];
+ u8 reserved_3[104];
+ u64 xss; /* Valid for SEV-ES only */
u64 cr4;
u64 cr3;
u64 cr0;
@@ -265,9 +266,12 @@ struct vmcb_save_area {

/*
* The following part of the save area is valid only for
- * SEV-ES guests when referenced through the GHCB.
+ * SEV-ES guests when referenced through the GHCB or for
+ * saving to the host save area.
*/
- u8 reserved_7[104];
+ u8 reserved_7[80];
+ u32 pkru;
+ u8 reserved_7a[20];
u64 reserved_8; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e34d3a6dba80..225f18dbf522 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,12 +18,15 @@
#include <linux/trace_events.h>

#include <asm/trapnr.h>
+#include <asm/fpu/internal.h>

#include "x86.h"
#include "svm.h"
#include "cpuid.h"
#include "trace.h"

+#define __ex(x) __kvm_handle_fault_on_reboot(x)
+
static u8 sev_enc_bit;
static int sev_flush_asids(void);
static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1902,3 +1905,54 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
GHCB_VERSION_MIN,
sev_enc_bit));
}
+
+void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu)
+{
+ struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
+ struct vmcb_save_area *hostsa;
+ unsigned int i;
+
+ /*
+ * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
+ * of which one step is to perform a VMLOAD. Since hardware does not
+ * perform a VMSAVE on VMRUN, the host savearea must be updated.
+ */
+ asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
+
+ /*
+ * Certain MSRs are restored on VMEXIT, only save ones that aren't
+ * restored.
+ */
+ for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
+ if (host_save_user_msrs[i].sev_es_restored)
+ continue;
+
+ rdmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
+ }
+
+ /* XCR0 is restored on VMEXIT, save the current host value */
+ hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+ hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+
+ /* PKRU is restored on VMEXIT, save the current host value */
+ hostsa->pkru = read_pkru();
+
+ /* MSR_IA32_XSS is restored on VMEXIT, save the current host value */
+ hostsa->xss = host_xss;
+}
+
+void sev_es_vcpu_put(struct vcpu_svm *svm)
+{
+ unsigned int i;
+
+ /*
+ * Certain MSRs are restored on VMEXIT and were saved with vmsave in
+ * sev_es_vcpu_load() above. Only restore ones that weren't.
+ */
+ for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
+ if (host_save_user_msrs[i].sev_es_restored)
+ continue;
+
+ wrmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 46dd28cd1ea6..8fcee4cf4a62 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1418,15 +1418,20 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmcb_mark_all_dirty(svm->vmcb);
}

+ if (sev_es_guest(svm->vcpu.kvm)) {
+ sev_es_vcpu_load(svm, cpu);
+ } else {
#ifdef CONFIG_X86_64
- rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
+ rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
#endif
- savesegment(fs, svm->host.fs);
- savesegment(gs, svm->host.gs);
- svm->host.ldt = kvm_read_ldt();
+ savesegment(fs, svm->host.fs);
+ savesegment(gs, svm->host.gs);
+ svm->host.ldt = kvm_read_ldt();

- for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
- rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+ for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
+ rdmsrl(host_save_user_msrs[i].index,
+ svm->host_user_msrs[i]);
+ }

if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
@@ -1454,19 +1459,24 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
avic_vcpu_put(vcpu);

++vcpu->stat.host_state_reload;
- kvm_load_ldt(svm->host.ldt);
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ sev_es_vcpu_put(svm);
+ } else {
+ kvm_load_ldt(svm->host.ldt);
#ifdef CONFIG_X86_64
- loadsegment(fs, svm->host.fs);
- wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
- load_gs_index(svm->host.gs);
+ loadsegment(fs, svm->host.fs);
+ wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
+ load_gs_index(svm->host.gs);
#else
#ifdef CONFIG_X86_32_LAZY_GS
- loadsegment(gs, svm->host.gs);
+ loadsegment(gs, svm->host.gs);
#endif
#endif

- for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
- wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+ for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
+ wrmsrl(host_save_user_msrs[i].index,
+ svm->host_user_msrs[i]);
+ }
}

static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1cf959cfcbc8..657a4fc0e41f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -23,15 +23,23 @@

#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)

-static const u32 host_save_user_msrs[] = {
+static const struct svm_host_save_msrs {
+ u32 index; /* Index of the MSR */
+ bool sev_es_restored; /* True if MSR is restored on SEV-ES VMEXIT */
+} host_save_user_msrs[] = {
#ifdef CONFIG_X86_64
- MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
- MSR_FS_BASE,
+ { .index = MSR_STAR, .sev_es_restored = true },
+ { .index = MSR_LSTAR, .sev_es_restored = true },
+ { .index = MSR_CSTAR, .sev_es_restored = true },
+ { .index = MSR_SYSCALL_MASK, .sev_es_restored = true },
+ { .index = MSR_KERNEL_GS_BASE, .sev_es_restored = true },
+ { .index = MSR_FS_BASE, .sev_es_restored = true },
#endif
- MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
- MSR_TSC_AUX,
+ { .index = MSR_IA32_SYSENTER_CS, .sev_es_restored = true },
+ { .index = MSR_IA32_SYSENTER_ESP, .sev_es_restored = true },
+ { .index = MSR_IA32_SYSENTER_EIP, .sev_es_restored = true },
+ { .index = MSR_TSC_AUX, .sev_es_restored = false },
};
-
#define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)

#define MAX_DIRECT_ACCESS_MSRS 18
@@ -583,5 +591,7 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_init_vmcb(struct vcpu_svm *svm);
void sev_es_create_vcpu(struct vcpu_svm *svm);
+void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
+void sev_es_vcpu_put(struct vcpu_svm *svm);

#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fd216b61a89..47cb63a2d079 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -197,7 +197,8 @@ EXPORT_SYMBOL_GPL(host_efer);
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);

-static u64 __read_mostly host_xss;
+u64 __read_mostly host_xss;
+EXPORT_SYMBOL_GPL(host_xss);
u64 __read_mostly supported_xss;
EXPORT_SYMBOL_GPL(supported_xss);

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 0e8fe766a4c5..c5d737a0a828 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -278,6 +278,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);

extern u64 host_xcr0;
extern u64 supported_xcr0;
+extern u64 host_xss;
extern u64 supported_xss;

static inline bool kvm_mpx_supported(void)
--
2.28.0

2020-12-11 15:05:45

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 25/34] KVM: SVM: Do not report support for SMM for an SEV-ES guest

From: Tom Lendacky <[email protected]>

SEV-ES guests do not currently support SMM. Update the has_emulated_msr()
kvm_x86_ops function to take a struct kvm parameter so that the capability
can be reported at a VM level.

Since this op is also called during KVM initialization and before a struct
kvm instance is available, comments will be added to each implementation
of has_emulated_msr() to indicate the kvm parameter can be null.
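
The two calling patterns then look like this (sketch of the call sites
changed below):

    /* Per-VM query: SMM support can be reported per guest. */
    r = kvm_x86_ops.has_emulated_msr(kvm, MSR_IA32_SMBASE);

    /* Init-time query: no VM exists yet, so pass NULL. */
    if (!kvm_x86_ops.has_emulated_msr(NULL, emulated_msrs_all[i]))
        continue;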

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 11 ++++++++++-
arch/x86/kvm/vmx/vmx.c | 6 +++++-
arch/x86/kvm/x86.c | 4 ++--
4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 256869c9f37b..cecd0eca66c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1092,7 +1092,7 @@ struct kvm_x86_ops {
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
bool (*cpu_has_accelerated_tpr)(void);
- bool (*has_emulated_msr)(u32 index);
+ bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);

unsigned int vm_size;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3fb1703f32f8..3e6d79593b8d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3934,12 +3934,21 @@ static bool svm_cpu_has_accelerated_tpr(void)
return false;
}

-static bool svm_has_emulated_msr(u32 index)
+/*
+ * The kvm parameter can be NULL (module initialization, or invocation before
+ * VM creation). Be sure to check the kvm parameter before using it.
+ */
+static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
{
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
return false;
+ case MSR_IA32_SMBASE:
+ /* SEV-ES guests do not support SMM, so report false */
+ if (kvm && sev_es_guest(kvm))
+ return false;
+ break;
default:
break;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c3441e7e5a87..a1ff4d7a310b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6399,7 +6399,11 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
handle_exception_nmi_irqoff(vmx);
}

-static bool vmx_has_emulated_msr(u32 index)
+/*
+ * The kvm parameter can be NULL (module initialization, or invocation before
+ * VM creation). Be sure to check the kvm parameter before using it.
+ */
+static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
{
switch (index) {
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8665e7609040..53fe34fd1a7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3795,7 +3795,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
* fringe case that is not enabled except via specific settings
* of the module parameters.
*/
- r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE);
+ r = kvm_x86_ops.has_emulated_msr(kvm, MSR_IA32_SMBASE);
break;
case KVM_CAP_VAPIC:
r = !kvm_x86_ops.cpu_has_accelerated_tpr();
@@ -5794,7 +5794,7 @@ static void kvm_init_msr_list(void)
}

for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
- if (!kvm_x86_ops.has_emulated_msr(emulated_msrs_all[i]))
+ if (!kvm_x86_ops.has_emulated_msr(NULL, emulated_msrs_all[i]))
continue;

emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
--
2.28.0

2020-12-11 15:05:53

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 24/34] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES

From: Tom Lendacky <[email protected]>

Since many of the registers used by an SEV-ES guest are encrypted and
cannot be read or written, adjust __get_sregs() / __set_sregs() to take
into account whether the VMSA/guest state is encrypted.

For __get_sregs(), return the actual value that is in use by the guest
for all registers being tracked using the write trap support.

For __set_sregs(), skip setting of all guest register values.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/x86.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c46da0d0f7f2..8665e7609040 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9446,6 +9446,9 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
{
struct desc_ptr dt;

+ if (vcpu->arch.guest_state_protected)
+ goto skip_protected_regs;
+
kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
@@ -9463,9 +9466,11 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
sregs->gdt.limit = dt.size;
sregs->gdt.base = dt.address;

- sregs->cr0 = kvm_read_cr0(vcpu);
sregs->cr2 = vcpu->arch.cr2;
sregs->cr3 = kvm_read_cr3(vcpu);
+
+skip_protected_regs:
+ sregs->cr0 = kvm_read_cr0(vcpu);
sregs->cr4 = kvm_read_cr4(vcpu);
sregs->cr8 = kvm_get_cr8(vcpu);
sregs->efer = vcpu->arch.efer;
@@ -9602,6 +9607,9 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
if (kvm_set_apic_base(vcpu, &apic_base_msr))
goto out;

+ if (vcpu->arch.guest_state_protected)
+ goto skip_protected_regs;
+
dt.size = sregs->idt.limit;
dt.address = sregs->idt.base;
kvm_x86_ops.set_idt(vcpu, &dt);
@@ -9636,14 +9644,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
if (mmu_reset_needed)
kvm_mmu_reset_context(vcpu);

- max_bits = KVM_NR_INTERRUPTS;
- pending_vec = find_first_bit(
- (const unsigned long *)sregs->interrupt_bitmap, max_bits);
- if (pending_vec < max_bits) {
- kvm_queue_interrupt(vcpu, pending_vec, false);
- pr_debug("Set back pending irq %d\n", pending_vec);
- }
-
kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
@@ -9662,6 +9662,15 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
!is_protmode(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;

+skip_protected_regs:
+ max_bits = KVM_NR_INTERRUPTS;
+ pending_vec = find_first_bit(
+ (const unsigned long *)sregs->interrupt_bitmap, max_bits);
+ if (pending_vec < max_bits) {
+ kvm_queue_interrupt(vcpu, pending_vec, false);
+ pr_debug("Set back pending irq %d\n", pending_vec);
+ }
+
kvm_make_request(KVM_REQ_EVENT, vcpu);

ret = 0;
--
2.28.0

2020-12-11 15:05:56

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 10/34] KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES

From: Tom Lendacky <[email protected]>

When a SHUTDOWN VMEXIT is encountered, normally the VMCB is re-initialized
so that the guest can be re-launched. But when a guest is running as an
SEV-ES guest, the VMSA cannot be re-initialized because it has been
encrypted. For now, just return -EINVAL to prevent a possible attempt at
a guest reset.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 81572899b7ea..3b02620ba9a9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2030,6 +2030,13 @@ static int shutdown_interception(struct vcpu_svm *svm)
{
struct kvm_run *kvm_run = svm->vcpu.run;

+ /*
+ * The VM save area has already been encrypted so it
+ * cannot be reinitialized - just terminate.
+ */
+ if (sev_es_guest(svm->vcpu.kvm))
+ return -EINVAL;
+
/*
* VMCB is undefined after a SHUTDOWN intercept
* so reinitialize it.
--
2.28.0

2020-12-11 15:06:06

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 18/34] KVM: SVM: Support MMIO for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
so that both the hypervisor and guest can read or write to it and each
see the contents.

The GHCB specification provides software-defined VMGEXIT exit codes to
indicate a request for an MMIO read or an MMIO write. Add support to
recognize the MMIO requests and invoke SEV-ES specific routines that
can complete the MMIO operation. These routines use common KVM support
to complete the MMIO operation.
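
For reference, a sketch of the hypervisor-side view of an MMIO write request
(the fields below are set by the guest per the GHCB specification and
consumed as shown in the patch):

    /*
     * exit_code   = SVM_VMGEXIT_MMIO_WRITE
     * exit_info_1 = MMIO target GPA
     * exit_info_2 = number of bytes to write
     * sw_scratch  = GPA of the buffer holding the data
     */
    ret = kvm_sev_es_mmio_write(&svm->vcpu, control->exit_info_1,
                                control->exit_info_2, svm->ghcb_sa);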

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 124 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 6 ++
arch/x86/kvm/x86.c | 123 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 5 ++
4 files changed, 258 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2e2548fa369b..63f20be4bc69 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1262,6 +1262,9 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
if (vcpu->arch.guest_state_protected)
sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
__free_page(virt_to_page(svm->vmsa));
+
+ if (svm->ghcb_sa_free)
+ kfree(svm->ghcb_sa);
}

static void dump_ghcb(struct vcpu_svm *svm)
@@ -1436,6 +1439,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
!ghcb_rcx_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_MMIO_READ:
+ case SVM_VMGEXIT_MMIO_WRITE:
+ if (!ghcb_sw_scratch_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
default:
@@ -1470,6 +1478,24 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
if (!svm->ghcb)
return;

+ if (svm->ghcb_sa_free) {
+ /*
+ * The scratch area lives outside the GHCB, so there is a
+ * buffer that, depending on the operation performed, may
+ * need to be synced, then freed.
+ */
+ if (svm->ghcb_sa_sync) {
+ kvm_write_guest(svm->vcpu.kvm,
+ ghcb_get_sw_scratch(svm->ghcb),
+ svm->ghcb_sa, svm->ghcb_sa_len);
+ svm->ghcb_sa_sync = false;
+ }
+
+ kfree(svm->ghcb_sa);
+ svm->ghcb_sa = NULL;
+ svm->ghcb_sa_free = false;
+ }
+
trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);

sev_es_sync_to_ghcb(svm);
@@ -1504,6 +1530,86 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
}

+#define GHCB_SCRATCH_AREA_LIMIT (16ULL * PAGE_SIZE)
+static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
+{
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ struct ghcb *ghcb = svm->ghcb;
+ u64 ghcb_scratch_beg, ghcb_scratch_end;
+ u64 scratch_gpa_beg, scratch_gpa_end;
+ void *scratch_va;
+
+ scratch_gpa_beg = ghcb_get_sw_scratch(ghcb);
+ if (!scratch_gpa_beg) {
+ pr_err("vmgexit: scratch gpa not provided\n");
+ return false;
+ }
+
+ scratch_gpa_end = scratch_gpa_beg + len;
+ if (scratch_gpa_end < scratch_gpa_beg) {
+ pr_err("vmgexit: scratch length (%#llx) not valid for scratch address (%#llx)\n",
+ len, scratch_gpa_beg);
+ return false;
+ }
+
+ if ((scratch_gpa_beg & PAGE_MASK) == control->ghcb_gpa) {
+ /* Scratch area begins within GHCB */
+ ghcb_scratch_beg = control->ghcb_gpa +
+ offsetof(struct ghcb, shared_buffer);
+ ghcb_scratch_end = control->ghcb_gpa +
+ offsetof(struct ghcb, reserved_1);
+
+ /*
+ * If the scratch area begins within the GHCB, it must be
+ * completely contained in the GHCB shared buffer area.
+ */
+ if (scratch_gpa_beg < ghcb_scratch_beg ||
+ scratch_gpa_end > ghcb_scratch_end) {
+ pr_err("vmgexit: scratch area is outside of GHCB shared buffer area (%#llx - %#llx)\n",
+ scratch_gpa_beg, scratch_gpa_end);
+ return false;
+ }
+
+ scratch_va = (void *)svm->ghcb;
+ scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
+ } else {
+ /*
+ * The guest memory must be read into a kernel buffer, so
+ * limit the size
+ */
+ if (len > GHCB_SCRATCH_AREA_LIMIT) {
+ pr_err("vmgexit: scratch area exceeds KVM limits (%#llx requested, %#llx limit)\n",
+ len, GHCB_SCRATCH_AREA_LIMIT);
+ return false;
+ }
+ scratch_va = kzalloc(len, GFP_KERNEL);
+ if (!scratch_va)
+ return false;
+
+ if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
+ /* Unable to copy scratch area from guest */
+ pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+
+ kfree(scratch_va);
+ return false;
+ }
+
+ /*
+ * The scratch area is outside the GHCB. The operation will
+ * dictate whether the buffer needs to be synced before running
+ * the vCPU next time (i.e. a read was requested so the data
+ * must be written back to the guest memory).
+ */
+ svm->ghcb_sa_sync = sync;
+ svm->ghcb_sa_free = true;
+ }
+
+ svm->ghcb_sa = scratch_va;
+ svm->ghcb_sa_len = len;
+
+ return true;
+}
+
static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
unsigned int pos)
{
@@ -1641,6 +1747,24 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)

ret = -EINVAL;
switch (exit_code) {
+ case SVM_VMGEXIT_MMIO_READ:
+ if (!setup_vmgexit_scratch(svm, true, control->exit_info_2))
+ break;
+
+ ret = kvm_sev_es_mmio_read(&svm->vcpu,
+ control->exit_info_1,
+ control->exit_info_2,
+ svm->ghcb_sa);
+ break;
+ case SVM_VMGEXIT_MMIO_WRITE:
+ if (!setup_vmgexit_scratch(svm, false, control->exit_info_2))
+ break;
+
+ ret = kvm_sev_es_mmio_write(&svm->vcpu,
+ control->exit_info_1,
+ control->exit_info_2,
+ svm->ghcb_sa);
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(&svm->vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fc69bc2e0cad..9019ad6a8138 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -174,6 +174,12 @@ struct vcpu_svm {
struct vmcb_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
+
+ /* SEV-ES scratch area support */
+ void *ghcb_sa;
+ u64 ghcb_sa_len;
+ bool ghcb_sa_sync;
+ bool ghcb_sa_free;
};

struct svm_cpu_data {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ba26b62e0262..78e8c8b36f9b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11299,6 +11299,129 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
}
EXPORT_SYMBOL_GPL(kvm_handle_invpcid);

+static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
+{
+ struct kvm_run *run = vcpu->run;
+ struct kvm_mmio_fragment *frag;
+ unsigned int len;
+
+ BUG_ON(!vcpu->mmio_needed);
+
+ /* Complete previous fragment */
+ frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
+ len = min(8u, frag->len);
+ if (!vcpu->mmio_is_write)
+ memcpy(frag->data, run->mmio.data, len);
+
+ if (frag->len <= 8) {
+ /* Switch to the next fragment. */
+ frag++;
+ vcpu->mmio_cur_fragment++;
+ } else {
+ /* Go forward to the next mmio piece. */
+ frag->data += len;
+ frag->gpa += len;
+ frag->len -= len;
+ }
+
+ if (vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments) {
+ vcpu->mmio_needed = 0;
+
+ // VMG change, at this point, we're always done
+ // RIP has already been advanced
+ return 1;
+ }
+
+ // More MMIO is needed
+ run->mmio.phys_addr = frag->gpa;
+ run->mmio.len = min(8u, frag->len);
+ run->mmio.is_write = vcpu->mmio_is_write;
+ if (run->mmio.is_write)
+ memcpy(run->mmio.data, frag->data, min(8u, frag->len));
+ run->exit_reason = KVM_EXIT_MMIO;
+
+ vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+ return 0;
+}
+
+int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
+ void *data)
+{
+ int handled;
+ struct kvm_mmio_fragment *frag;
+
+ if (!data)
+ return -EINVAL;
+
+ handled = write_emultor.read_write_mmio(vcpu, gpa, bytes, data);
+ if (handled == bytes)
+ return 1;
+
+ bytes -= handled;
+ gpa += handled;
+ data += handled;
+
+ /* TODO: Check whether the number of fragments needs to be incremented */
+ frag = vcpu->mmio_fragments;
+ vcpu->mmio_nr_fragments = 1;
+ frag->len = bytes;
+ frag->gpa = gpa;
+ frag->data = data;
+
+ vcpu->mmio_needed = 1;
+ vcpu->mmio_cur_fragment = 0;
+
+ vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.len = min(8u, frag->len);
+ vcpu->run->mmio.is_write = 1;
+ memcpy(vcpu->run->mmio.data, frag->data, min(8u, frag->len));
+ vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+ vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_write);
+
+int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
+ void *data)
+{
+ int handled;
+ struct kvm_mmio_fragment *frag;
+
+ if (!data)
+ return -EINVAL;
+
+ handled = read_emultor.read_write_mmio(vcpu, gpa, bytes, data);
+ if (handled == bytes)
+ return 1;
+
+ bytes -= handled;
+ gpa += handled;
+ data += handled;
+
+ /* TODO: Check whether the number of fragments needs to be incremented */
+ frag = vcpu->mmio_fragments;
+ vcpu->mmio_nr_fragments = 1;
+ frag->len = bytes;
+ frag->gpa = gpa;
+ frag->data = data;
+
+ vcpu->mmio_needed = 1;
+ vcpu->mmio_cur_fragment = 0;
+
+ vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.len = min(8u, frag->len);
+ vcpu->run->mmio.is_write = 0;
+ vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+ vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 764c967a1993..804369fe45e3 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -407,4 +407,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
__reserved_bits; \
})

+int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
+ void *dst);
+int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
+ void *dst);
+
#endif
--
2.28.0

2020-12-11 15:06:07

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

From: Tom Lendacky <[email protected]>

SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
VMGEXIT includes mapping the GHCB based on the guest GPA, which is
obtained from a new VMCB field, and then validating the required inputs
for the VMGEXIT exit reason.

Since many of the VMGEXIT exit reasons correspond to existing VMEXIT
reasons, the information from the GHCB is copied into the VMCB control
exit code areas and KVM register areas. The standard exit handlers are
invoked, similar to standard VMEXIT processing. Before restarting the
vCPU, the GHCB is updated with any registers that have been updated by
the hypervisor.
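
In outline, the non-MSR-protocol path added here looks like the following (a
condensed sketch of the code below, with the error paths and the sw_exit_info
reset omitted):

ghcb_gpa = control->ghcb_gpa;
if (ghcb_gpa & GHCB_MSR_INFO_MASK)              /* GHCB MSR protocol request */
        return sev_handle_vmgexit_msr_protocol(svm);

kvm_vcpu_map(&svm->vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map);
svm->ghcb = svm->ghcb_map.hva;

if (sev_es_validate_vmgexit(svm))               /* required GHCB fields valid? */
        return -EINVAL;

sev_es_sync_from_ghcb(svm);                     /* GHCB -> VMCB control and KVM regs */
ret = svm_invoke_exit_handler(svm, ghcb_get_sw_exit_code(svm->ghcb));

The updated register values are copied back to the GHCB, and the GHCB mapping
released, in pre_sev_es_run() before the vCPU next enters the guest.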

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/svm.h | 2 +-
arch/x86/include/uapi/asm/svm.h | 7 +
arch/x86/kvm/svm/sev.c | 272 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 8 +-
arch/x86/kvm/svm/svm.h | 8 +
5 files changed, 294 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index bce28482d63d..caa8628f5fba 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -130,7 +130,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 exit_int_info_err;
u64 nested_ctl;
u64 avic_vapic_bar;
- u8 reserved_4[8];
+ u64 ghcb_gpa;
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index f1d8307454e0..09f723945425 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -81,6 +81,7 @@
#define SVM_EXIT_NPF 0x400
#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
+#define SVM_EXIT_VMGEXIT 0x403

/* SEV-ES software-defined VMGEXIT events */
#define SVM_VMGEXIT_MMIO_READ 0x80000001
@@ -187,6 +188,12 @@
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \
+ { SVM_EXIT_VMGEXIT, "vmgexit" }, \
+ { SVM_VMGEXIT_MMIO_READ, "vmgexit_mmio_read" }, \
+ { SVM_VMGEXIT_MMIO_WRITE, "vmgexit_mmio_write" }, \
+ { SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
+ { SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
+ { SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fb4a411f7550..54e6894b26d2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,7 @@

#include "x86.h"
#include "svm.h"
+#include "cpuid.h"

static int sev_flush_asids(void);
static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1257,11 +1258,226 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
__free_page(virt_to_page(svm->vmsa));
}

+static void dump_ghcb(struct vcpu_svm *svm)
+{
+ struct ghcb *ghcb = svm->ghcb;
+ unsigned int nbits;
+
+ /* Re-use the dump_invalid_vmcb module parameter */
+ if (!dump_invalid_vmcb) {
+ pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
+ return;
+ }
+
+ nbits = sizeof(ghcb->save.valid_bitmap) * 8;
+
+ pr_err("GHCB (GPA=%016llx):\n", svm->vmcb->control.ghcb_gpa);
+ pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
+ ghcb->save.sw_exit_code, ghcb_sw_exit_code_is_valid(ghcb));
+ pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
+ ghcb->save.sw_exit_info_1, ghcb_sw_exit_info_1_is_valid(ghcb));
+ pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
+ ghcb->save.sw_exit_info_2, ghcb_sw_exit_info_2_is_valid(ghcb));
+ pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
+ ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
+ pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
+}
+
+static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct ghcb *ghcb = svm->ghcb;
+
+ /*
+ * The GHCB protocol so far allows for the following data
+ * to be returned:
+ * GPRs RAX, RBX, RCX, RDX
+ *
+ * Copy their values to the GHCB if they are dirty.
+ */
+ if (kvm_register_is_dirty(vcpu, VCPU_REGS_RAX))
+ ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
+ if (kvm_register_is_dirty(vcpu, VCPU_REGS_RBX))
+ ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
+ if (kvm_register_is_dirty(vcpu, VCPU_REGS_RCX))
+ ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
+ if (kvm_register_is_dirty(vcpu, VCPU_REGS_RDX))
+ ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+}
+
+static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
+{
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct ghcb *ghcb = svm->ghcb;
+ u64 exit_code;
+
+ /*
+ * The GHCB protocol so far allows for the following data
+ * to be supplied:
+ * GPRs RAX, RBX, RCX, RDX
+ * XCR0
+ * CPL
+ *
+ * VMMCALL allows the guest to provide extra registers. KVM also
+ * expects RSI for hypercalls, so include that, too.
+ *
+ * Copy their values to the appropriate location if supplied.
+ */
+ memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
+
+ vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb);
+ vcpu->arch.regs[VCPU_REGS_RBX] = ghcb_get_rbx_if_valid(ghcb);
+ vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx_if_valid(ghcb);
+ vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb);
+ vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb);
+
+ svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb);
+
+ if (ghcb_xcr0_is_valid(ghcb)) {
+ vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+ kvm_update_cpuid_runtime(vcpu);
+ }
+
+ /* Copy the GHCB exit information into the VMCB fields */
+ exit_code = ghcb_get_sw_exit_code(ghcb);
+ control->exit_code = lower_32_bits(exit_code);
+ control->exit_code_hi = upper_32_bits(exit_code);
+ control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
+ control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
+
+ /* Clear the valid entries fields */
+ memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
+}
+
+static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu;
+ struct ghcb *ghcb;
+ u64 exit_code = 0;
+
+ ghcb = svm->ghcb;
+
+ /* Only GHCB Usage code 0 is supported */
+ if (ghcb->ghcb_usage)
+ goto vmgexit_err;
+
+ /*
+ * Retrieve the exit code now even though it may not be marked valid
+ * as it could help with debugging.
+ */
+ exit_code = ghcb_get_sw_exit_code(ghcb);
+
+ if (!ghcb_sw_exit_code_is_valid(ghcb) ||
+ !ghcb_sw_exit_info_1_is_valid(ghcb) ||
+ !ghcb_sw_exit_info_2_is_valid(ghcb))
+ goto vmgexit_err;
+
+ switch (ghcb_get_sw_exit_code(ghcb)) {
+ case SVM_EXIT_READ_DR7:
+ break;
+ case SVM_EXIT_WRITE_DR7:
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_RDTSC:
+ break;
+ case SVM_EXIT_RDPMC:
+ if (!ghcb_rcx_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_CPUID:
+ if (!ghcb_rax_is_valid(ghcb) ||
+ !ghcb_rcx_is_valid(ghcb))
+ goto vmgexit_err;
+ if (ghcb_get_rax(ghcb) == 0xd)
+ if (!ghcb_xcr0_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_INVD:
+ break;
+ case SVM_EXIT_IOIO:
+ if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK))
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_MSR:
+ if (!ghcb_rcx_is_valid(ghcb))
+ goto vmgexit_err;
+ if (ghcb_get_sw_exit_info_1(ghcb)) {
+ if (!ghcb_rax_is_valid(ghcb) ||
+ !ghcb_rdx_is_valid(ghcb))
+ goto vmgexit_err;
+ }
+ break;
+ case SVM_EXIT_VMMCALL:
+ if (!ghcb_rax_is_valid(ghcb) ||
+ !ghcb_cpl_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_RDTSCP:
+ break;
+ case SVM_EXIT_WBINVD:
+ break;
+ case SVM_EXIT_MONITOR:
+ if (!ghcb_rax_is_valid(ghcb) ||
+ !ghcb_rcx_is_valid(ghcb) ||
+ !ghcb_rdx_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_EXIT_MWAIT:
+ if (!ghcb_rax_is_valid(ghcb) ||
+ !ghcb_rcx_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
+ case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ break;
+ default:
+ goto vmgexit_err;
+ }
+
+ return 0;
+
+vmgexit_err:
+ vcpu = &svm->vcpu;
+
+ if (ghcb->ghcb_usage) {
+ vcpu_unimpl(vcpu, "vmgexit: ghcb usage %#x is not valid\n",
+ ghcb->ghcb_usage);
+ } else {
+ vcpu_unimpl(vcpu, "vmgexit: exit reason %#llx is not valid\n",
+ exit_code);
+ dump_ghcb(svm);
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
+ vcpu->run->internal.ndata = 2;
+ vcpu->run->internal.data[0] = exit_code;
+ vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+
+ return -EINVAL;
+}
+
+static void pre_sev_es_run(struct vcpu_svm *svm)
+{
+ if (!svm->ghcb)
+ return;
+
+ sev_es_sync_to_ghcb(svm);
+
+ kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
+ svm->ghcb = NULL;
+}
+
void pre_sev_run(struct vcpu_svm *svm, int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int asid = sev_get_asid(svm->vcpu.kvm);

+ /* Perform any SEV-ES pre-run actions */
+ pre_sev_es_run(svm);
+
/* Assign the asid allocated with this SEV guest */
svm->vmcb->control.asid = asid;

@@ -1279,3 +1495,59 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
}
+
+static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
+{
+ return -EINVAL;
+}
+
+int sev_handle_vmgexit(struct vcpu_svm *svm)
+{
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ u64 ghcb_gpa, exit_code;
+ struct ghcb *ghcb;
+ int ret;
+
+ /* Validate the GHCB */
+ ghcb_gpa = control->ghcb_gpa;
+ if (ghcb_gpa & GHCB_MSR_INFO_MASK)
+ return sev_handle_vmgexit_msr_protocol(svm);
+
+ if (!ghcb_gpa) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB gpa is not set\n");
+ return -EINVAL;
+ }
+
+ if (kvm_vcpu_map(&svm->vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map)) {
+ /* Unable to map GHCB from guest */
+ vcpu_unimpl(&svm->vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
+ ghcb_gpa);
+ return -EINVAL;
+ }
+
+ svm->ghcb = svm->ghcb_map.hva;
+ ghcb = svm->ghcb_map.hva;
+
+ exit_code = ghcb_get_sw_exit_code(ghcb);
+
+ ret = sev_es_validate_vmgexit(svm);
+ if (ret)
+ return ret;
+
+ sev_es_sync_from_ghcb(svm);
+ ghcb_set_sw_exit_info_1(ghcb, 0);
+ ghcb_set_sw_exit_info_2(ghcb, 0);
+
+ ret = -EINVAL;
+ switch (exit_code) {
+ case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ vcpu_unimpl(&svm->vcpu,
+ "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
+ control->exit_info_1, control->exit_info_2);
+ break;
+ default:
+ ret = svm_invoke_exit_handler(svm, exit_code);
+ }
+
+ return ret;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce7bcb9cf90c..ad1ec6ad558e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -195,7 +195,7 @@ module_param(sev, int, 0444);
int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
module_param(sev_es, int, 0444);

-static bool __read_mostly dump_invalid_vmcb = 0;
+bool __read_mostly dump_invalid_vmcb;
module_param(dump_invalid_vmcb, bool, 0644);

static u8 rsm_ins_bytes[] = "\x0f\xaa";
@@ -3031,6 +3031,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_RSM] = rsm_interception,
[SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception,
[SVM_EXIT_AVIC_UNACCELERATED_ACCESS] = avic_unaccelerated_access_interception,
+ [SVM_EXIT_VMGEXIT] = sev_handle_vmgexit,
};

static void dump_vmcb(struct kvm_vcpu *vcpu)
@@ -3072,6 +3073,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
+ pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
pr_err("%-20s%lld\n", "virt_ext:", control->virt_ext);
@@ -3168,7 +3170,7 @@ static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
return -EINVAL;
}

-static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
+int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
{
if (svm_handle_invalid_exit(&svm->vcpu, exit_code))
return 0;
@@ -3184,6 +3186,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
return halt_interception(svm);
else if (exit_code == SVM_EXIT_NPF)
return npf_interception(svm);
+ else if (exit_code == SVM_EXIT_VMGEXIT)
+ return sev_handle_vmgexit(svm);
#endif
return svm_exit_handlers[exit_code](svm);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index abfe53d6b3dc..89bcb26977e5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -17,6 +17,7 @@

#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
+#include <linux/bits.h>

#include <asm/svm.h>

@@ -172,6 +173,7 @@ struct vcpu_svm {
/* SEV-ES support */
struct vmcb_save_area *vmsa;
struct ghcb *ghcb;
+ struct kvm_host_map ghcb_map;
};

struct svm_cpu_data {
@@ -390,6 +392,7 @@ static inline bool gif_set(struct vcpu_svm *svm)

extern int sev;
extern int sev_es;
+extern bool dump_invalid_vmcb;

u32 svm_msrpm_offset(u32 msr);
u32 *svm_vcpu_alloc_msrpm(void);
@@ -405,6 +408,7 @@ bool svm_smi_blocked(struct kvm_vcpu *vcpu);
bool svm_nmi_blocked(struct kvm_vcpu *vcpu);
bool svm_interrupt_blocked(struct kvm_vcpu *vcpu);
void svm_set_gif(struct vcpu_svm *svm, bool value);
+int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code);

/* nested.c */

@@ -510,6 +514,9 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);

/* sev.c */

+#define GHCB_MSR_INFO_POS 0
+#define GHCB_MSR_INFO_MASK (BIT_ULL(12) - 1)
+
extern unsigned int max_sev_asid;

static inline bool svm_sev_enabled(void)
@@ -527,5 +534,6 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu);
void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);
void sev_free_vcpu(struct kvm_vcpu *vcpu);
+int sev_handle_vmgexit(struct vcpu_svm *svm);

#endif
--
2.28.0

2020-12-11 15:10:58

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 16/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100

From: Tom Lendacky <[email protected]>

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x100 is a request for termination of the guest. The guest has
encountered some situation for which it has requested to be terminated.
The GHCB MSR value contains the reason for the request.
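
For illustration (this helper is hypothetical and not part of the patch), a
termination request encoded per the GHCB_MSR_TERM_* definitions added below
looks roughly like:

/* Hypothetical guest-side helper: compose a termination-request MSR value */
static u64 ghcb_msr_term_req(u64 reason_set, u64 reason_code)
{
        return GHCB_MSR_TERM_REQ |
               ((reason_set & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) |
               ((reason_code & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS);
}

so, e.g., ghcb_msr_term_req(0, 1) yields 0x10100; the handler added below just
logs the two fields and then falls through to the -EINVAL return.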

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 13 +++++++++++++
arch/x86/kvm/svm/svm.h | 6 ++++++
2 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 53bf3ff1d9cc..c2cc38e7400b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1574,6 +1574,19 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_TERM_REQ: {
+ u64 reason_set, reason_code;
+
+ reason_set = get_ghcb_msr_bits(svm,
+ GHCB_MSR_TERM_REASON_SET_MASK,
+ GHCB_MSR_TERM_REASON_SET_POS);
+ reason_code = get_ghcb_msr_bits(svm,
+ GHCB_MSR_TERM_REASON_MASK,
+ GHCB_MSR_TERM_REASON_POS);
+ pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
+ reason_set, reason_code);
+ fallthrough;
+ }
default:
ret = -EINVAL;
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9dd8429f2b27..fc69bc2e0cad 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -543,6 +543,12 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
#define GHCB_MSR_CPUID_REG_POS 30
#define GHCB_MSR_CPUID_REG_MASK 0x3

+#define GHCB_MSR_TERM_REQ 0x100
+#define GHCB_MSR_TERM_REASON_SET_POS 12
+#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
+#define GHCB_MSR_TERM_REASON_POS 16
+#define GHCB_MSR_TERM_REASON_MASK 0xff
+
extern unsigned int max_sev_asid;

static inline bool svm_sev_enabled(void)
--
2.28.0

2020-12-11 20:29:27

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 09/34] KVM: SVM: Do not allow instruction emulation under SEV-ES

From: Tom Lendacky <[email protected]>

When a guest is running as an SEV-ES guest, it is not possible to emulate
instructions. Add support to prevent instruction emulation.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 513cf667dff4..81572899b7ea 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4211,6 +4211,12 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int i
bool smep, smap, is_user;
unsigned long cr4;

+ /*
+ * When the guest is an SEV-ES guest, emulation is not possible.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ return false;
+
/*
* Detect and workaround Errata 1096 Fam_17h_00_0Fh.
*
--
2.28.0

2020-12-11 20:29:27

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 05/34] KVM: SVM: Add support for the SEV-ES VMSA

From: Tom Lendacky <[email protected]>

Allocate a page during vCPU creation to be used as the encrypted VM save
area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
structure that indicates whether the guest state is protected.

When freeing a VMSA page that has been encrypted, the cache contents must
be flushed using the MSR_AMD64_VM_PAGE_FLUSH before freeing the page.
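
The core of that flush is one MSR write per page, passing the page's virtual
address with the guest ASID in the low bits (a condensed sketch; the helper
added below also handles the SME_COHERENT shortcut and the WBINVD fallback):

/* Condensed sketch of the per-page flush done in sev_flush_guest_memory() */
wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, ((u64)va & PAGE_MASK) | sev->asid);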

[ i386 build warnings ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/kvm/svm/sev.c | 67 +++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 24 +++++++++++-
arch/x86/kvm/svm/svm.h | 5 +++
4 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..8cf6b0493d49 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,9 @@ struct kvm_vcpu_arch {
*/
bool enforce;
} pv_cpuid;
+
+ /* Protected Guests */
+ bool guest_state_protected;
};

struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9bf5e9dadff5..fb4a411f7550 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
#include <linux/psp-sev.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
+#include <linux/processor.h>

#include "x86.h"
#include "svm.h"
@@ -1190,6 +1191,72 @@ void sev_hardware_teardown(void)
sev_flush_asids();
}

+/*
+ * Pages used by hardware to hold guest encrypted state must be flushed before
+ * returning them to the system.
+ */
+static void sev_flush_guest_memory(struct vcpu_svm *svm, void *va,
+ unsigned long len)
+{
+ /*
+ * If hardware enforced cache coherency for encrypted mappings of the
+ * same physical page is supported, nothing to do.
+ */
+ if (boot_cpu_has(X86_FEATURE_SME_COHERENT))
+ return;
+
+ /*
+ * If the VM Page Flush MSR is supported, use it to flush the page
+ * (using the page virtual address and the guest ASID).
+ */
+ if (boot_cpu_has(X86_FEATURE_VM_PAGE_FLUSH)) {
+ struct kvm_sev_info *sev;
+ unsigned long va_start;
+ u64 start, stop;
+
+ /* Align start and stop to page boundaries. */
+ va_start = (unsigned long)va;
+ start = (u64)va_start & PAGE_MASK;
+ stop = PAGE_ALIGN((u64)va_start + len);
+
+ if (start < stop) {
+ sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+ while (start < stop) {
+ wrmsrl(MSR_AMD64_VM_PAGE_FLUSH,
+ start | sev->asid);
+
+ start += PAGE_SIZE;
+ }
+
+ return;
+ }
+
+ WARN(1, "Address overflow, using WBINVD\n");
+ }
+
+ /*
+ * Hardware should always have one of the above features,
+ * but if not, use WBINVD and issue a warning.
+ */
+ WARN_ONCE(1, "Using WBINVD to flush guest memory\n");
+ wbinvd_on_all_cpus();
+}
+
+void sev_free_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm;
+
+ if (!sev_es_guest(vcpu->kvm))
+ return;
+
+ svm = to_svm(vcpu);
+
+ if (vcpu->arch.guest_state_protected)
+ sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+ __free_page(virt_to_page(svm->vmsa));
+}
+
void pre_sev_run(struct vcpu_svm *svm, int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a1ea30c98629..cd4c9884e5a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1289,6 +1289,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm;
struct page *vmcb_page;
+ struct page *vmsa_page = NULL;
int err;

BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1299,9 +1300,19 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
if (!vmcb_page)
goto out;

+ if (sev_es_guest(svm->vcpu.kvm)) {
+ /*
+ * SEV-ES guests require a separate VMSA page used to contain
+ * the encrypted register state of the guest.
+ */
+ vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!vmsa_page)
+ goto error_free_vmcb_page;
+ }
+
err = avic_init_vcpu(svm);
if (err)
- goto error_free_vmcb_page;
+ goto error_free_vmsa_page;

/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
@@ -1311,12 +1322,16 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)

svm->msrpm = svm_vcpu_alloc_msrpm();
if (!svm->msrpm)
- goto error_free_vmcb_page;
+ goto error_free_vmsa_page;

svm_vcpu_init_msrpm(vcpu, svm->msrpm);

svm->vmcb = page_address(vmcb_page);
svm->vmcb_pa = __sme_set(page_to_pfn(vmcb_page) << PAGE_SHIFT);
+
+ if (vmsa_page)
+ svm->vmsa = page_address(vmsa_page);
+
svm->asid_generation = 0;
init_vmcb(svm);

@@ -1325,6 +1340,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)

return 0;

+error_free_vmsa_page:
+ if (vmsa_page)
+ __free_page(vmsa_page);
error_free_vmcb_page:
__free_page(vmcb_page);
out:
@@ -1352,6 +1370,8 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)

svm_free_nested(svm);

+ sev_free_vcpu(vcpu);
+
__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 56d950df82e5..80a359f3cf20 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -168,6 +168,10 @@ struct vcpu_svm {
DECLARE_BITMAP(read, MAX_DIRECT_ACCESS_MSRS);
DECLARE_BITMAP(write, MAX_DIRECT_ACCESS_MSRS);
} shadow_msr_intercept;
+
+ /* SEV-ES support */
+ struct vmcb_save_area *vmsa;
+ struct ghcb *ghcb;
};

struct svm_cpu_data {
@@ -513,5 +517,6 @@ int svm_unregister_enc_region(struct kvm *kvm,
void pre_sev_run(struct vcpu_svm *svm, int cpu);
void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);
+void sev_free_vcpu(struct kvm_vcpu *vcpu);

#endif
--
2.28.0

2020-12-11 20:29:59

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 23/34] KVM: SVM: Add support for CR8 write traps for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR8 register using the control
register write trap so that the hypervisor understands the guest operating
mode.
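
The only wrinkle relative to the CR0/CR4 trap cases is that kvm_set_cr8() can
fail, so cr_trap() now propagates its return value (condensed from the change
below):

ret = kvm_set_cr8(&svm->vcpu, new_value);       /* new_value was read from exit_info_1 */
return kvm_complete_insn_gp(vcpu, ret);         /* a non-zero ret queues #GP for the guest */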

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kvm/svm/svm.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index c4152689ea93..554f75fe013c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -204,6 +204,7 @@
{ SVM_EXIT_EFER_WRITE_TRAP, "write_efer_trap" }, \
{ SVM_EXIT_CR0_WRITE_TRAP, "write_cr0_trap" }, \
{ SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \
+ { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \
{ SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e15e9e15defd..3fb1703f32f8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2475,6 +2475,7 @@ static int cr_trap(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;
unsigned long old_value, new_value;
unsigned int cr;
+ int ret = 0;

new_value = (unsigned long)svm->vmcb->control.exit_info_1;

@@ -2492,13 +2493,16 @@ static int cr_trap(struct vcpu_svm *svm)

kvm_post_set_cr4(vcpu, old_value, new_value);
break;
+ case 8:
+ ret = kvm_set_cr8(&svm->vcpu, new_value);
+ break;
default:
WARN(1, "unhandled CR%d write trap", cr);
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}

- return kvm_complete_insn_gp(vcpu, 0);
+ return kvm_complete_insn_gp(vcpu, ret);
}

static int dr_interception(struct vcpu_svm *svm)
@@ -3084,6 +3088,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_EFER_WRITE_TRAP] = efer_trap,
[SVM_EXIT_CR0_WRITE_TRAP] = cr_trap,
[SVM_EXIT_CR4_WRITE_TRAP] = cr_trap,
+ [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap,
[SVM_EXIT_INVPCID] = invpcid_interception,
[SVM_EXIT_NPF] = npf_interception,
[SVM_EXIT_RSM] = rsm_interception,
--
2.28.0

2020-12-11 20:30:31

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 22/34] KVM: SVM: Add support for CR4 write traps for an SEV-ES guest

From: Tom Lendacky <[email protected]>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR4 register using the control
register write trap so that the hypervisor understands the guest operating
mode.
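
Tracking the trapped write still has to trigger the usual side effects of a
CR4 change, which is why the MMU-reset check is factored out into
kvm_post_set_cr4() below so the trap path can reuse it. Illustrative calls
(the values are made up):

/* Illustrative only: which CR4 changes force an MMU context reset per kvm_post_set_cr4() */
kvm_post_set_cr4(vcpu, old_cr4, old_cr4 ^ X86_CR4_SMEP);   /* SMEP flipped -> kvm_mmu_reset_context() */
kvm_post_set_cr4(vcpu, old_cr4, old_cr4 ^ X86_CR4_TSD);    /* TSD flipped  -> no MMU reset needed */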

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kvm/svm/svm.c | 7 +++++++
arch/x86/kvm/x86.c | 16 ++++++++++++----
4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2714ae0adeab..256869c9f37b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1477,6 +1477,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);

void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 14b0d97b50e2..c4152689ea93 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -203,6 +203,7 @@
{ SVM_EXIT_XSETBV, "xsetbv" }, \
{ SVM_EXIT_EFER_WRITE_TRAP, "write_efer_trap" }, \
{ SVM_EXIT_CR0_WRITE_TRAP, "write_cr0_trap" }, \
+ { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \
{ SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e35050eafe3a..e15e9e15defd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2486,6 +2486,12 @@ static int cr_trap(struct vcpu_svm *svm)

kvm_post_set_cr0(vcpu, old_value, new_value);
break;
+ case 4:
+ old_value = kvm_read_cr4(vcpu);
+ svm_set_cr4(vcpu, new_value);
+
+ kvm_post_set_cr4(vcpu, old_value, new_value);
+ break;
default:
WARN(1, "unhandled CR%d write trap", cr);
kvm_queue_exception(vcpu, UD_VECTOR);
@@ -3077,6 +3083,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_RDPRU] = rdpru_interception,
[SVM_EXIT_EFER_WRITE_TRAP] = efer_trap,
[SVM_EXIT_CR0_WRITE_TRAP] = cr_trap,
+ [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap,
[SVM_EXIT_INVPCID] = invpcid_interception,
[SVM_EXIT_NPF] = npf_interception,
[SVM_EXIT_RSM] = rsm_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b3f1f326e9c..c46da0d0f7f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -983,12 +983,22 @@ bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
}
EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);

+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
+{
+ unsigned long mmu_role_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
+ X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
+
+ if (((cr4 ^ old_cr4) & mmu_role_bits) ||
+ (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+ kvm_mmu_reset_context(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
+
int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
X86_CR4_SMEP;
- unsigned long mmu_role_bits = pdptr_bits | X86_CR4_SMAP | X86_CR4_PKE;

if (!kvm_is_valid_cr4(vcpu, cr4))
return 1;
@@ -1015,9 +1025,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)

kvm_x86_ops.set_cr4(vcpu, cr4);

- if (((cr4 ^ old_cr4) & mmu_role_bits) ||
- (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
- kvm_mmu_reset_context(vcpu);
+ kvm_post_set_cr4(vcpu, old_cr4, cr4);

return 0;
}
--
2.28.0

2020-12-11 20:33:11

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 33/34] KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests

From: Tom Lendacky <[email protected]>

The run sequence is different for an SEV-ES guest compared to a legacy or
even an SEV guest. The guest vCPU register state of an SEV-ES guest will
be restored on VMRUN and saved on VMEXIT. There is no need to restore the
guest registers directly and through VMLOAD before VMRUN and no need to
save the guest registers directly and through VMSAVE on VMEXIT.

Update the svm_vcpu_run() function to skip register state saving and
restoring and provide an alternative function for running an SEV-ES guest
in vmenter.S

Additionally, certain host state is restored across an SEV-ES VMRUN. As a
result, certain register state does not need to be restored on VMEXIT
(e.g. FS, GS, etc.), so only restore it when the guest is not an SEV-ES
guest.
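
At the call site this reduces to a guest-type dispatch (condensed from the
svm_vcpu_enter_exit() hunk below); the SEV-ES path only needs the VMCB
address, since the register state lives in the encrypted VMSA rather than in
vcpu->arch.regs:

if (sev_es_guest(svm->vcpu.kvm))
        __svm_sev_es_vcpu_run(svm->vmcb_pa);
else
        __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);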

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 25 ++++++++++++-------
arch/x86/kvm/svm/svm.h | 5 ++++
arch/x86/kvm/svm/vmenter.S | 50 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 6 +++++
4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8fcee4cf4a62..e5a4e9032732 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3756,16 +3756,20 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
guest_enter_irqoff();
lockdep_hardirqs_on(CALLER_ADDR0);

- __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ __svm_sev_es_vcpu_run(svm->vmcb_pa);
+ } else {
+ __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);

#ifdef CONFIG_X86_64
- native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+ native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
#else
- loadsegment(fs, svm->host.fs);
+ loadsegment(fs, svm->host.fs);
#ifndef CONFIG_X86_32_LAZY_GS
- loadsegment(gs, svm->host.gs);
+ loadsegment(gs, svm->host.gs);
#endif
#endif
+ }

/*
* VMEXIT disables interrupts (host state), but tracing and lockdep
@@ -3863,14 +3867,17 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);

- reload_tss(vcpu);
+ if (!sev_es_guest(svm->vcpu.kvm))
+ reload_tss(vcpu);

x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);

- vcpu->arch.cr2 = svm->vmcb->save.cr2;
- vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
- vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
- vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+ if (!sev_es_guest(svm->vcpu.kvm)) {
+ vcpu->arch.cr2 = svm->vmcb->save.cr2;
+ vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
+ vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
+ vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+ }

if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_interrupt(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 657a4fc0e41f..868d30d7b6bf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -594,4 +594,9 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
void sev_es_vcpu_put(struct vcpu_svm *svm);

+/* vmenter.S */
+
+void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
+void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
+
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 1ec1ac40e328..6feb8c08f45a 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -168,3 +168,53 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
ret
SYM_FUNC_END(__svm_vcpu_run)
+
+/**
+ * __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
+ * @vmcb_pa: unsigned long
+ */
+SYM_FUNC_START(__svm_sev_es_vcpu_run)
+ push %_ASM_BP
+#ifdef CONFIG_X86_64
+ push %r15
+ push %r14
+ push %r13
+ push %r12
+#else
+ push %edi
+ push %esi
+#endif
+ push %_ASM_BX
+
+ /* Enter guest mode */
+ mov %_ASM_ARG1, %_ASM_AX
+ sti
+
+1: vmrun %_ASM_AX
+ jmp 3f
+2: cmpb $0, kvm_rebooting
+ jne 3f
+ ud2
+ _ASM_EXTABLE(1b, 2b)
+
+3: cli
+
+#ifdef CONFIG_RETPOLINE
+ /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
+ pop %_ASM_BX
+
+#ifdef CONFIG_X86_64
+ pop %r12
+ pop %r13
+ pop %r14
+ pop %r15
+#else
+ pop %esi
+ pop %edi
+#endif
+ pop %_ASM_BP
+ ret
+SYM_FUNC_END(__svm_sev_es_vcpu_run)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47cb63a2d079..7cbdca29e39e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -880,6 +880,9 @@ EXPORT_SYMBOL_GPL(kvm_lmsw);

void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
{
+ if (vcpu->arch.guest_state_protected)
+ return;
+
if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {

if (vcpu->arch.xcr0 != host_xcr0)
@@ -900,6 +903,9 @@ EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);

void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
{
+ if (vcpu->arch.guest_state_protected)
+ return;
+
if (static_cpu_has(X86_FEATURE_PKU) &&
(kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
(vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
--
2.28.0

2020-12-11 21:03:27

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 13/34] KVM: SVM: Create trace events for VMGEXIT processing

From: Tom Lendacky <[email protected]>

Add trace events for entry to and exit from VMGEXIT processing. The vCPU
id and the exit reason will be common for the trace events. The exit info
fields will represent the input and output values for the entry and exit
events, respectively.
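
Once enabled like any other kvm tracepoint, each VMGEXIT produces a matching
pair of records whose layout follows the TP_printk() strings below, roughly
(the field values here are made up):

  kvm_vmgexit_enter: vcpu 0, exit_reason 7b, exit_info1 3d4, exit_info2 0
  kvm_vmgexit_exit:  vcpu 0, exit_reason 7b, exit_info1 0, exit_info2 0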

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 6 +++++
arch/x86/kvm/trace.h | 53 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
3 files changed, 61 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 54e6894b26d2..da473c6b725e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -15,10 +15,12 @@
#include <linux/pagemap.h>
#include <linux/swap.h>
#include <linux/processor.h>
+#include <linux/trace_events.h>

#include "x86.h"
#include "svm.h"
#include "cpuid.h"
+#include "trace.h"

static int sev_flush_asids(void);
static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1464,6 +1466,8 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
if (!svm->ghcb)
return;

+ trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
+
sev_es_sync_to_ghcb(svm);

kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
@@ -1528,6 +1532,8 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
svm->ghcb = svm->ghcb_map.hva;
ghcb = svm->ghcb_map.hva;

+ trace_kvm_vmgexit_enter(svm->vcpu.vcpu_id, ghcb);
+
exit_code = ghcb_get_sw_exit_code(ghcb);

ret = sev_es_validate_vmgexit(svm);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index aef960f90f26..7da931a511c9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1578,6 +1578,59 @@ TRACE_EVENT(kvm_hv_syndbg_get_msr,
__entry->vcpu_id, __entry->vp_index, __entry->msr,
__entry->data)
);
+
+/*
+ * Tracepoint for the start of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_enter,
+ TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+ TP_ARGS(vcpu_id, ghcb),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(u64, exit_reason)
+ __field(u64, info1)
+ __field(u64, info2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->exit_reason = ghcb->save.sw_exit_code;
+ __entry->info1 = ghcb->save.sw_exit_info_1;
+ __entry->info2 = ghcb->save.sw_exit_info_2;
+ ),
+
+ TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx",
+ __entry->vcpu_id, __entry->exit_reason,
+ __entry->info1, __entry->info2)
+);
+
+/*
+ * Tracepoint for the end of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_exit,
+ TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+ TP_ARGS(vcpu_id, ghcb),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(u64, exit_reason)
+ __field(u64, info1)
+ __field(u64, info2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->exit_reason = ghcb->save.sw_exit_code;
+ __entry->info1 = ghcb->save.sw_exit_info_1;
+ __entry->info2 = ghcb->save.sw_exit_info_2;
+ ),
+
+ TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx",
+ __entry->vcpu_id, __entry->exit_reason,
+ __entry->info1, __entry->info2)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de0e35083df5..d89736066b39 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11321,3 +11321,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
--
2.28.0

2020-12-11 21:03:55

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 08/34] KVM: SVM: Prevent debugging under SEV-ES

From: Tom Lendacky <[email protected]>

Since the guest register state of an SEV-ES guest is encrypted, debugging
is not supported. Update the code to prevent guest debugging when the
guest has protected state.

Additionally, an SEV-ES guest must only and always intercept DR7 reads and
writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for
this.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 9 +++++
arch/x86/kvm/svm/svm.h | 81 +++++++++++++++++++++++-------------------
arch/x86/kvm/x86.c | 3 ++
3 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857d0d3f2752..513cf667dff4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1806,6 +1806,9 @@ static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
{
struct vmcb *vmcb = svm->vmcb;

+ if (svm->vcpu.arch.guest_state_protected)
+ return;
+
if (unlikely(value != vmcb->save.dr6)) {
vmcb->save.dr6 = value;
vmcb_mark_dirty(vmcb, VMCB_DR);
@@ -1816,6 +1819,9 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ if (vcpu->arch.guest_state_protected)
+ return;
+
get_debugreg(vcpu->arch.db[0], 0);
get_debugreg(vcpu->arch.db[1], 1);
get_debugreg(vcpu->arch.db[2], 2);
@@ -1834,6 +1840,9 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ if (vcpu->arch.guest_state_protected)
+ return;
+
svm->vmcb->save.dr7 = value;
vmcb_mark_dirty(svm->vmcb, VMCB_DR);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 80a359f3cf20..abfe53d6b3dc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,28 @@ static inline struct kvm_svm *to_kvm_svm(struct kvm *kvm)
return container_of(kvm, struct kvm_svm, kvm);
}

+static inline bool sev_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev->active;
+#else
+ return false;
+#endif
+}
+
+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_guest(kvm) && sev->es_active;
+#else
+ return false;
+#endif
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
@@ -250,21 +272,24 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
{
struct vmcb *vmcb = get_host_vmcb(svm);

- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+ if (!sev_es_guest(svm->vcpu.kvm)) {
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
+ }
+
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);

recalc_intercepts(svm);
@@ -276,6 +301,12 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)

vmcb->control.intercepts[INTERCEPT_DR] = 0;

+ /* DR7 access must remain intercepted for an SEV-ES guest */
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+ }
+
recalc_intercepts(svm);
}

@@ -481,28 +512,6 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);

extern unsigned int max_sev_asid;

-static inline bool sev_guest(struct kvm *kvm)
-{
-#ifdef CONFIG_KVM_AMD_SEV
- struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-
- return sev->active;
-#else
- return false;
-#endif
-}
-
-static inline bool sev_es_guest(struct kvm *kvm)
-{
-#ifdef CONFIG_KVM_AMD_SEV
- struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-
- return sev_guest(kvm) && sev->es_active;
-#else
- return false;
-#endif
-}
-
static inline bool svm_sev_enabled(void)
{
return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6809a2851d2..de0e35083df5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9671,6 +9671,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
unsigned long rflags;
int i, r;

+ if (vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
vcpu_load(vcpu);

if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
--
2.28.0

2020-12-11 21:04:25

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5 15/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004

From: Tom Lendacky <[email protected]>

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x004 is a request for CPUID information. Only a single CPUID
result register can be sent per invocation, so the protocol defines the
register that is requested. The GHCB MSR value is set to the CPUID
register value as per the specification via the VMCB GHCB GPA field.
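
For illustration (these helpers are hypothetical, not part of the patch), the
request and response layouts follow the GHCB_MSR_CPUID_* definitions added
below:

/* Hypothetical helpers showing the CPUID request/response bit layout */
static u64 ghcb_msr_cpuid_req(u64 fn, u64 reg)
{
        return GHCB_MSR_CPUID_REQ |
               ((reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) |   /* 0=RAX 1=RBX 2=RCX 3=RDX */
               ((fn & GHCB_MSR_CPUID_FUNC_MASK) << GHCB_MSR_CPUID_FUNC_POS);
}

static u64 ghcb_msr_cpuid_resp(u64 value)
{
        return GHCB_MSR_CPUID_RESP |
               ((value & GHCB_MSR_CPUID_VALUE_MASK) << GHCB_MSR_CPUID_VALUE_POS);
}

For example, ghcb_msr_cpuid_req(0x8000001f, 1) requests the EBX result of
CPUID leaf 0x8000001F; the hypervisor runs the normal CPUID intercept and
answers with the selected register value in the upper bits of the GHCB MSR.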

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.h | 9 +++++++
2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 58861515d3e3..53bf3ff1d9cc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1504,6 +1504,18 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
}

+static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
+ unsigned int pos)
+{
+ svm->vmcb->control.ghcb_gpa &= ~(mask << pos);
+ svm->vmcb->control.ghcb_gpa |= (value & mask) << pos;
+}
+
+static u64 get_ghcb_msr_bits(struct vcpu_svm *svm, u64 mask, unsigned int pos)
+{
+ return (svm->vmcb->control.ghcb_gpa >> pos) & mask;
+}
+
static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
{
svm->vmcb->control.ghcb_gpa = value;
@@ -1512,7 +1524,9 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
u64 ghcb_info;
+ int ret = 1;

ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;

@@ -1522,11 +1536,49 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_VERSION_MIN,
sev_enc_bit));
break;
+ case GHCB_MSR_CPUID_REQ: {
+ u64 cpuid_fn, cpuid_reg, cpuid_value;
+
+ cpuid_fn = get_ghcb_msr_bits(svm,
+ GHCB_MSR_CPUID_FUNC_MASK,
+ GHCB_MSR_CPUID_FUNC_POS);
+
+ /* Initialize the registers needed by the CPUID intercept */
+ vcpu->arch.regs[VCPU_REGS_RAX] = cpuid_fn;
+ vcpu->arch.regs[VCPU_REGS_RCX] = 0;
+
+ ret = svm_invoke_exit_handler(svm, SVM_EXIT_CPUID);
+ if (!ret) {
+ ret = -EINVAL;
+ break;
+ }
+
+ cpuid_reg = get_ghcb_msr_bits(svm,
+ GHCB_MSR_CPUID_REG_MASK,
+ GHCB_MSR_CPUID_REG_POS);
+ if (cpuid_reg == 0)
+ cpuid_value = vcpu->arch.regs[VCPU_REGS_RAX];
+ else if (cpuid_reg == 1)
+ cpuid_value = vcpu->arch.regs[VCPU_REGS_RBX];
+ else if (cpuid_reg == 2)
+ cpuid_value = vcpu->arch.regs[VCPU_REGS_RCX];
+ else
+ cpuid_value = vcpu->arch.regs[VCPU_REGS_RDX];
+
+ set_ghcb_msr_bits(svm, cpuid_value,
+ GHCB_MSR_CPUID_VALUE_MASK,
+ GHCB_MSR_CPUID_VALUE_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_CPUID_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
default:
- return -EINVAL;
+ ret = -EINVAL;
}

- return 1;
+ return ret;
}

int sev_handle_vmgexit(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 546f8d05e81e..9dd8429f2b27 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -534,6 +534,15 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
(((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) | \
GHCB_MSR_SEV_INFO_RESP)

+#define GHCB_MSR_CPUID_REQ 0x004
+#define GHCB_MSR_CPUID_RESP 0x005
+#define GHCB_MSR_CPUID_FUNC_POS 32
+#define GHCB_MSR_CPUID_FUNC_MASK 0xffffffff
+#define GHCB_MSR_CPUID_VALUE_POS 32
+#define GHCB_MSR_CPUID_VALUE_MASK 0xffffffff
+#define GHCB_MSR_CPUID_REG_POS 30
+#define GHCB_MSR_CPUID_REG_MASK 0x3
+
extern unsigned int max_sev_asid;

static inline bool svm_sev_enabled(void)
--
2.28.0

2020-12-14 14:49:19

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 02/34] KVM: SVM: Remove the call to sev_platform_status() during setup

On 10/12/20 18:09, Tom Lendacky wrote:
> From: Tom Lendacky <[email protected]>
>
> When both KVM support and the CCP driver are built into the kernel instead
> of as modules, KVM initialization can happen before CCP initialization. As
> a result, sev_platform_status() will return a failure when it is called
> from sev_hardware_setup(), when this isn't really an error condition.
>
> Since sev_platform_status() doesn't need to be called at this time anyway,
> remove the invocation from sev_hardware_setup().
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 22 +---------------------
> 1 file changed, 1 insertion(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c0b14106258a..a4ba5476bf42 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1127,9 +1127,6 @@ void sev_vm_destroy(struct kvm *kvm)
>
> int __init sev_hardware_setup(void)
> {
> - struct sev_user_data_status *status;
> - int rc;
> -
> /* Maximum number of encrypted guests supported simultaneously */
> max_sev_asid = cpuid_ecx(0x8000001F);
>
> @@ -1148,26 +1145,9 @@ int __init sev_hardware_setup(void)
> if (!sev_reclaim_asid_bitmap)
> return 1;
>
> - status = kmalloc(sizeof(*status), GFP_KERNEL);
> - if (!status)
> - return 1;
> -
> - /*
> - * Check SEV platform status.
> - *
> - * PLATFORM_STATUS can be called in any state, if we failed to query
> - * the PLATFORM status then either PSP firmware does not support SEV
> - * feature or SEV firmware is dead.
> - */
> - rc = sev_platform_status(status, NULL);
> - if (rc)
> - goto err;
> -
> pr_info("SEV supported\n");
>
> -err:
> - kfree(status);
> - return rc;
> + return 0;
> }
>
> void sev_hardware_teardown(void)
>

Queued with Cc: stable.

Note that sev_platform_status now can become static within
drivers/crypto/ccp/sev-dev.c.

Paolo

2020-12-14 15:41:02

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 08/34] KVM: SVM: Prevent debugging under SEV-ES

On 10/12/20 18:09, Tom Lendacky wrote:
> +static inline bool sev_guest(struct kvm *kvm)
> +{
> +#ifdef CONFIG_KVM_AMD_SEV
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> + return sev->active;
> +#else
> + return false;
> +#endif
> +}
> +
> +static inline bool sev_es_guest(struct kvm *kvm)
> +{
> +#ifdef CONFIG_KVM_AMD_SEV
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> + return sev_guest(kvm) && sev->es_active;
> +#else
> + return false;
> +#endif
> +}
> +

This code movement could have been done before.

Paolo

2020-12-14 15:42:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 07/34] KVM: SVM: Add required changes to support intercepts under SEV-ES

On 10/12/20 18:09, Tom Lendacky wrote:
> @@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>
> static int wrmsr_interception(struct vcpu_svm *svm)
> {
> - return kvm_emulate_wrmsr(&svm->vcpu);
> + u32 ecx;
> + u64 data;
> +
> + if (!sev_es_guest(svm->vcpu.kvm))
> + return kvm_emulate_wrmsr(&svm->vcpu);
> +
> + ecx = kvm_rcx_read(&svm->vcpu);
> + data = kvm_read_edx_eax(&svm->vcpu);
> + if (kvm_set_msr(&svm->vcpu, ecx, data)) {
> + trace_kvm_msr_write_ex(ecx, data);
> + ghcb_set_sw_exit_info_1(svm->ghcb, 1);
> + ghcb_set_sw_exit_info_2(svm->ghcb,
> + X86_TRAP_GP |
> + SVM_EVTINJ_TYPE_EXEPT |
> + SVM_EVTINJ_VALID);
> + return 1;
> + }
> +
> + trace_kvm_msr_write(ecx, data);
> +
> + return kvm_skip_emulated_instruction(&svm->vcpu);
> }
>
> static int msr_interception(struct vcpu_svm *svm)

This code duplication is ugly, and does not work with userspace MSR
filters either.

But we can instead trap the completion of the MSR read/write to use
ghcb_set_sw_exit_info_1 instead of kvm_inject_gp, with a callback like

static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (!sev_es_guest(svm->vcpu.kvm) || !err)
return kvm_complete_insn_gp(&svm->vcpu, err);

ghcb_set_sw_exit_info_1(svm->ghcb, 1);
ghcb_set_sw_exit_info_2(svm->ghcb,
X86_TRAP_GP |
SVM_EVTINJ_TYPE_EXEPT |
SVM_EVTINJ_VALID);
return 1;
}


...
.complete_emulated_msr = svm_complete_emulated_msr,

> @@ -2827,7 +2888,14 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
> static int pause_interception(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> - bool in_kernel = (svm_get_cpl(vcpu) == 0);
> + bool in_kernel;
> +
> + /*
> + * CPL is not made available for an SEV-ES guest, so just set in_kernel
> + * to true.
> + */
> + in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
> + : (svm_get_cpl(vcpu) == 0);
>
> if (!kvm_pause_in_guest(vcpu->kvm))
> grow_ple_window(vcpu);

See below.

> @@ -3273,6 +3351,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb;
>
> + /*
> + * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
> + * bit to determine the state of the IF flag.
> + */
> + if (sev_es_guest(svm->vcpu.kvm))
> + return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);

This seems wrong, you have to take into account
SVM_INTERRUPT_SHADOW_MASK as well. Also, even though GIF is not really
used by SEV-ES guests, I think it's nicer to put this check afterwards.

That is:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4372e45c8f06..2dd9c9698480 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3247,7 +3247,14 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
if (!gif_set(svm))
return true;

- if (is_guest_mode(vcpu)) {
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ /*
+ * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
+ * bit to determine the state of the IF flag.
+ */
+ if (!(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK))
+ return true;
+ } else if (is_guest_mode(vcpu)) {
/* As long as interrupts are being delivered... */
if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)



> if (!gif_set(svm))
> return true;
>
> @@ -3458,6 +3543,12 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
> svm->vcpu.arch.nmi_injected = true;
> break;
> case SVM_EXITINTINFO_TYPE_EXEPT:
> + /*
> + * Never re-inject a #VC exception.
> + */
> + if (vector == X86_TRAP_VC)
> + break;
> +
> /*
> * In case of software exceptions, do not reinject the vector,
> * but re-execute the instruction instead. Rewind RIP first
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a3fdc16cfd6f..b6809a2851d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4018,7 +4018,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> int idx;
>
> - if (vcpu->preempted)
> + if (vcpu->preempted && !vcpu->arch.guest_state_protected)
> vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);

This has to be true, otherwise no directed yield will be done at all:

if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
!kvm_arch_vcpu_in_kernel(vcpu))
continue;

Or more easily, just use in_kernel == false in pause_interception, like

+ /*
+ * CPL is not made available for an SEV-ES guest, therefore
+ * vcpu->arch.preempted_in_kernel can never be true. Just
+ * set in_kernel to false as well.
+ */
+ in_kernel = !sev_es_guest(svm->vcpu.kvm) && svm_get_cpl(vcpu) == 0;

>
> /*
> @@ -8161,7 +8161,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
> {
> struct kvm_run *kvm_run = vcpu->run;
>
> - kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
> + kvm_run->if_flag = (vcpu->arch.guest_state_protected)
> + ? kvm_arch_interrupt_allowed(vcpu)
> + : (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;

Here indeed you only want the interrupt allowed bit, not the interrupt
window. But we can just be bold and always set it to true.

- for userspace irqchip, kvm_run->ready_for_interrupt_injection is set
just below and it will always be false if kvm_arch_interrupt_allowed is
false

- for in-kernel APIC, if_flag is documented to be invalid (though it
actually is valid). For split irqchip, they can just use
kvm_run->ready_for_interrupt_injection; for entirely in-kernel interrupt
handling, userspace does not need if_flag at all.
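
As a sketch, the bolder variant in post_kvm_run_save() would simply be
(assuming the guest_state_protected flag from the patch; placement is
illustrative):

	kvm_run->if_flag = vcpu->arch.guest_state_protected
		? true
		: (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;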

Paolo

> kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
> kvm_run->cr8 = kvm_get_cr8(vcpu);
> kvm_run->apic_base = kvm_get_apic_base(vcpu);
>


2020-12-14 15:45:31

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 08/34] KVM: SVM: Prevent debugging under SEV-ES

On 10/12/20 18:09, Tom Lendacky wrote:
> Additionally, an SEV-ES guest must only and always intercept DR7 reads and
> writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for
> this.

I cannot see it, where is this documented?

Paolo

2020-12-14 15:49:41

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

On 10/12/20 18:09, Tom Lendacky wrote:
> @@ -3184,6 +3186,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
> return halt_interception(svm);
> else if (exit_code == SVM_EXIT_NPF)
> return npf_interception(svm);
> + else if (exit_code == SVM_EXIT_VMGEXIT)
> + return sev_handle_vmgexit(svm);

Are these common enough to warrant putting them in this short list?

Paolo

> #endif
> return svm_exit_handlers[exit_code](svm);
> }

2020-12-14 15:54:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 16/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100

On 10/12/20 18:09, Tom Lendacky wrote:
> + pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
> + reason_set, reason_code);
> + fallthrough;
> + }

It would be nice to send these to userspace instead as a follow-up.
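
One possible shape for that follow-up, purely as a sketch (the choice of
KVM_SYSTEM_EVENT_SHUTDOWN here is an assumption, not an agreed interface):

	/* Surface the termination request to userspace instead of only
	 * logging it; userspace can then decide how to tear the guest down.
	 */
	svm->vcpu.run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
	svm->vcpu.run->system_event.type = KVM_SYSTEM_EVENT_SHUTDOWN;
	ret = 0;	/* exit to userspace */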

Paolo

2020-12-14 16:08:17

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

On 10/12/20 18:10, Tom Lendacky wrote:
> From: Tom Lendacky <[email protected]>
>
> Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
> where the guest vCPU register state is updated and then the vCPU is VMRUN
> to begin execution of the AP. For an SEV-ES guest, this won't work because
> the guest register state is encrypted.
>
> Following the GHCB specification, the hypervisor must not alter the guest
> register state, so KVM must track an AP/vCPU boot. Should the guest want
> to park the AP, it must use the AP Reset Hold exit event in place of, for
> example, a HLT loop.
>
> First AP boot (first INIT-SIPI-SIPI sequence):
> Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
> support. It is up to the guest to transfer control of the AP to the
> proper location.
>
> Subsequent AP boot:
> KVM will expect to receive an AP Reset Hold exit event indicating that
> the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
> awaken it. When the AP Reset Hold exit event is received, KVM will place
> the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
> sequence, KVM will make the vCPU runnable. It is again up to the guest
> to then transfer control of the AP to the proper location.
>
> The GHCB specification also requires the hypervisor to save the address of
> an AP Jump Table so that, for example, vCPUs that have been parked by UEFI
> can be started by the OS. Provide support for the AP Jump Table set/get
> exit code.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/svm/sev.c | 50 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 7 +++++
> arch/x86/kvm/svm/svm.h | 3 ++
> arch/x86/kvm/x86.c | 9 ++++++
> 5 files changed, 71 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 048b08437c33..60a3b9d33407 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1286,6 +1286,8 @@ struct kvm_x86_ops {
>
> void (*migrate_timers)(struct kvm_vcpu *vcpu);
> void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
> +
> + void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> };
>
> struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a7531de760b5..b47285384b1f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -17,6 +17,8 @@
> #include <linux/processor.h>
> #include <linux/trace_events.h>
>
> +#include <asm/trapnr.h>
> +
> #include "x86.h"
> #include "svm.h"
> #include "cpuid.h"
> @@ -1449,6 +1451,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> if (!ghcb_sw_scratch_is_valid(ghcb))
> goto vmgexit_err;
> break;
> + case SVM_VMGEXIT_AP_HLT_LOOP:
> + case SVM_VMGEXIT_AP_JUMP_TABLE:
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> break;
> default:
> @@ -1770,6 +1774,35 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
> control->exit_info_2,
> svm->ghcb_sa);
> break;
> + case SVM_VMGEXIT_AP_HLT_LOOP:
> + svm->ap_hlt_loop = true;

This value needs to be communicated to userspace. Let's get this right
from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
reuse KVM_MP_STATE_STOPPED but for x86 #define it as
KVM_MP_STATE_AP_HOLD_RECEIVED?).
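
One way to spell that reuse, as a sketch (the name here is only the
suggestion above, nothing final):

/* x86 alias for the generic KVM_MP_STATE_STOPPED value. */
#define KVM_MP_STATE_AP_HOLD_RECEIVED	KVM_MP_STATE_STOPPED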

> @@ -68,6 +68,7 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + u64 ap_jump_table; /* SEV-ES AP Jump Table address */

Do you have any plans for migration of this value? How does the guest
ensure that the hypervisor does not screw with it?

Paolo

> + ret = kvm_emulate_halt(&svm->vcpu);
> + break;
> + case SVM_VMGEXIT_AP_JUMP_TABLE: {
> + struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> +
> + switch (control->exit_info_1) {
> + case 0:
> + /* Set AP jump table address */
> + sev->ap_jump_table = control->exit_info_2;
> + break;
> + case 1:
> + /* Get AP jump table address */
> + ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
> + break;
> + default:
> + pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
> + control->exit_info_1);
> + ghcb_set_sw_exit_info_1(ghcb, 1);
> + ghcb_set_sw_exit_info_2(ghcb,
> + X86_TRAP_UD |
> + SVM_EVTINJ_TYPE_EXEPT |
> + SVM_EVTINJ_VALID);
> + }
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(&svm->vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> @@ -1790,3 +1823,20 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
> return kvm_sev_es_string_io(&svm->vcpu, size, port,
> svm->ghcb_sa, svm->ghcb_sa_len, in);
> }
> +
> +void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + /* First SIPI: Use the values as initially set by the VMM */
> + if (!svm->ap_hlt_loop)
> + return;
> +
> + /*
> + * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
> + * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
> + * non-zero value.
> + */
> + ghcb_set_sw_exit_info_2(svm->ghcb, 1);
> + svm->ap_hlt_loop = false;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8d22ae25a0f8..2dbc20701ef5 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4400,6 +4400,11 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> (vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
> }
>
> +static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> +{
> + sev_vcpu_deliver_sipi_vector(vcpu, vector);
> +}
> +
> static void svm_vm_destroy(struct kvm *kvm)
> {
> avic_vm_destroy(kvm);
> @@ -4541,6 +4546,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>
> .msr_filter_changed = svm_msr_filter_changed,
> +
> + .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> };
>
> static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index b3f03dede6ac..5d570d5a6a2c 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -68,6 +68,7 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + u64 ap_jump_table; /* SEV-ES AP Jump Table address */
> };
>
> struct kvm_svm {
> @@ -174,6 +175,7 @@ struct vcpu_svm {
> struct vmcb_save_area *vmsa;
> struct ghcb *ghcb;
> struct kvm_host_map ghcb_map;
> + bool ap_hlt_loop;
>
> /* SEV-ES scratch area support */
> void *ghcb_sa;
> @@ -574,5 +576,6 @@ void sev_hardware_teardown(void);
> void sev_free_vcpu(struct kvm_vcpu *vcpu);
> int sev_handle_vmgexit(struct vcpu_svm *svm);
> int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
> +void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>
> #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ddd614a76744..4fd216b61a89 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10144,6 +10144,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> {
> struct kvm_segment cs;
>
> + /*
> + * Guests with protected state can't have their state altered by KVM,
> + * call the vcpu_deliver_sipi_vector() x86 op for processing.
> + */
> + if (vcpu->arch.guest_state_protected) {
> + kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, vector);
> + return;
> + }
> +
> kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
> cs.selector = vector << 8;
> cs.base = vector << 12;
>

2020-12-14 16:12:52

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

On 10/12/20 18:10, Tom Lendacky wrote:
> @@ -10144,6 +10144,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> {
> struct kvm_segment cs;
>
> + /*
> + * Guests with protected state can't have their state altered by KVM,
> + * call the vcpu_deliver_sipi_vector() x86 op for processing.
> + */
> + if (vcpu->arch.guest_state_protected) {
> + kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, vector);
> + return;
> + }
> +

Also, I don't mind that you just call
kvm_x86_ops.vcpu_deliver_sipi_vector from lapic.c, and make VMX just do

.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,

(SVM would do it if !guest_state_protected). This matches more or less
how I redid the MSR part.

Paolo

2020-12-14 18:52:58

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 00/34] SEV-ES hypervisor support

On 10/12/20 18:09, Tom Lendacky wrote:
> From: Tom Lendacky <[email protected]>
>
> This patch series provides support for running SEV-ES guests under KVM.
>
> Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the
> SEV support to protect the guest register state from the hypervisor. See
> "AMD64 Architecture Programmer's Manual Volume 2: System Programming",
> section "15.35 Encrypted State (SEV-ES)" [1].
>
> In order to allow a hypervisor to perform functions on behalf of a guest,
> there is architectural support for notifying a guest's operating system
> when certain types of VMEXITs are about to occur. This allows the guest to
> selectively share information with the hypervisor to satisfy the requested
> function. The notification is performed using a new exception, the VMM
> Communication exception (#VC). The information is shared through the
> Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction.
> The GHCB format and the protocol for using it is documented in "SEV-ES
> Guest-Hypervisor Communication Block Standardization" [2].
>
> Under SEV-ES, a vCPU save area (VMSA) must be encrypted. SVM is updated to
> build the initial VMSA and then encrypt it before running the guest. Once
> encrypted, it must not be modified by the hypervisor. Modification of the
> VMSA will result in the VMRUN instruction failing with a SHUTDOWN exit
> code. KVM must support the VMGEXIT exit code in order to perform the
> necessary functions required of the guest. The GHCB is used to exchange
> the information needed by both the hypervisor and the guest.
>
> Register data from the GHCB is copied into the KVM register variables and
> accessed as usual during handling of the exit. Upon return to the guest,
> updated registers are copied back to the GHCB for the guest to act upon.
>
> There are changes to some of the intercepts that are needed under SEV-ES.
> For example, CR0 writes cannot be intercepted, so the code needs to ensure
> that the intercept is not enabled during execution or that the hypervisor
> does not try to read the register as part of exit processing. Another
> example is shutdown processing, where the vCPU cannot be directly reset.
>
> Support is added to handle VMGEXIT events and implement the GHCB protocol.
> This includes supporting standard exit events, like a CPUID instruction
> intercept, to new support, for things like AP processor booting. Much of
> the existing SVM intercept support can be re-used by setting the exit
> code information from the VMGEXIT and calling the appropriate intercept
> handlers.
>
> Finally, to launch and run an SEV-ES guest requires changes to the vCPU
> initialization, loading and execution.
>
> [1] https://www.amd.com/system/files/TechDocs/24593.pdf
> [2] https://developer.amd.com/wp-content/resources/56421.pdf
>
> ---
>
> These patches are based on the KVM queue branch:
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
>
> dc924b062488 ("KVM: SVM: check CR4 changes against vcpu->arch")
>
> A version of the tree can also be found at:
> https://github.com/AMDESE/linux/tree/sev-es-v5
> This tree has one addition patch that is not yet part of the queue
> tree that is required to run any SEV guest:
> [PATCH] KVM: x86: adjust SEV for commit 7e8e6eed75e
> https://lore.kernel.org/kvm/[email protected]/
>
> Changes from v4:
> - Updated the tracking support for CR0/CR4
>
> Changes from v3:
> - Some krobot fixes.
> - Some checkpatch cleanups.
>
> Changes from v2:
> - Update the freeing of the VMSA page to account for the encrypted memory
> cache coherency feature as well as the VM page flush feature.
> - Update the GHCB dump function with a bit more detail.
> - Don't check for RAX being present as part of a string IO operation.
> - Include RSI when syncing from GHCB to support KVM hypercall arguments.
> - Add GHCB usage field validation check.
>
> Changes from v1:
> - Removed the VMSA indirection support:
> - On LAUNCH_UPDATE_VMSA, sync traditional VMSA over to the new SEV-ES
> VMSA area to be encrypted.
> - On VMGEXIT VMEXIT, directly copy valid registers into vCPU arch
> register array from GHCB. On VMRUN (following a VMGEXIT), directly
> copy dirty vCPU arch registers to GHCB.
> - Removed reg_read_override()/reg_write_override() KVM ops.
> - Added VMGEXIT exit-reason validation.
> - Changed kvm_vcpu_arch variable vmsa_encrypted to guest_state_protected
> - Updated the tracking support for EFER/CR0/CR4/CR8 to minimize changes
> to the x86.c code
> - Updated __set_sregs to not set any register values (previously supported
> setting the tracked values of EFER/CR0/CR4/CR8)
> - Added support for reporting SMM capability at the VM-level. This allows
> an SEV-ES guest to indicate SMM is not supported
> - Updated FPU support to check for a guest FPU save area before using it.
> Updated SVM to free guest FPU for an SEV-ES guest during KVM create_vcpu
> op.
> - Removed changes to the kvm_skip_emulated_instruction()
> - Added VMSA validity checks before invoking LAUNCH_UPDATE_VMSA
> - Minor code restructuring in areas for better readability
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Cc: Wanpeng Li <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Brijesh Singh <[email protected]>

I'm queuing everything except patch 27, there's time to include it later
in 5.11.

Regarding MSRs, take a look at the series I'm sending shortly (or
perhaps in a couple hours). For now I'll keep it in kvm/queue, but the
plan is to get acks quickly and/or just include it in 5.11. Please try
the kvm/queue branch to see if I screwed up anything.

Paolo

2020-12-14 19:16:12

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 07/34] KVM: SVM: Add required changes to support intercepts under SEV-ES

On 12/14/20 9:33 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> @@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr)
>>     static int wrmsr_interception(struct vcpu_svm *svm)
>>   {
>> -    return kvm_emulate_wrmsr(&svm->vcpu);
>> +    u32 ecx;
>> +    u64 data;
>> +
>> +    if (!sev_es_guest(svm->vcpu.kvm))
>> +        return kvm_emulate_wrmsr(&svm->vcpu);
>> +
>> +    ecx = kvm_rcx_read(&svm->vcpu);
>> +    data = kvm_read_edx_eax(&svm->vcpu);
>> +    if (kvm_set_msr(&svm->vcpu, ecx, data)) {
>> +        trace_kvm_msr_write_ex(ecx, data);
>> +        ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>> +        ghcb_set_sw_exit_info_2(svm->ghcb,
>> +                    X86_TRAP_GP |
>> +                    SVM_EVTINJ_TYPE_EXEPT |
>> +                    SVM_EVTINJ_VALID);
>> +        return 1;
>> +    }
>> +
>> +    trace_kvm_msr_write(ecx, data);
>> +
>> +    return kvm_skip_emulated_instruction(&svm->vcpu);
>>   }
>>     static int msr_interception(struct vcpu_svm *svm)
>
> This code duplication is ugly, and does not work with userspace MSR
> filters either.

Agree and I missed that the userspace MSR support went in.

>
> But we can instead trap the completion of the MSR read/write to use
> ghcb_set_sw_exit_info_1 instead of kvm_inject_gp, with a callback like
>
> static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
> {
>         if (!sev_es_guest(svm->vcpu.kvm) || !err)
>                 return kvm_complete_insn_gp(&svm->vcpu, err);
>
>         ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>         ghcb_set_sw_exit_info_2(svm->ghcb,
>                                 X86_TRAP_GP |
>                                 SVM_EVTINJ_TYPE_EXEPT |
>                                 SVM_EVTINJ_VALID);
>         return 1;
> }

If we use kvm_complete_insn_gp() we lose the tracing, and it needs to
be able to deal with read completion setting the registers. It also needs
to deal with both kvm_emulate_rdmsr/wrmsr() when not bouncing to
userspace. Let me take a shot at covering all the cases and see what I can
come up with.

I noticed that the userspace completion path doesn't have tracing
invocations, trace_kvm_msr_read/write_ex() or trace_kvm_msr_read/write();
is that by design?

>
>
> ...
>     .complete_emulated_msr = svm_complete_emulated_msr,
>
>> @@ -2827,7 +2888,14 @@ static int interrupt_window_interception(struct
>> vcpu_svm *svm)
>>   static int pause_interception(struct vcpu_svm *svm)
>>   {
>>       struct kvm_vcpu *vcpu = &svm->vcpu;
>> -    bool in_kernel = (svm_get_cpl(vcpu) == 0);
>> +    bool in_kernel;
>> +
>> +    /*
>> +     * CPL is not made available for an SEV-ES guest, so just set
>> in_kernel
>> +     * to true.
>> +     */
>> +    in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
>> +                          : (svm_get_cpl(vcpu) == 0);
>>         if (!kvm_pause_in_guest(vcpu->kvm))
>>           grow_ple_window(vcpu);
>
> See below.
>
>> @@ -3273,6 +3351,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>>       struct vcpu_svm *svm = to_svm(vcpu);
>>       struct vmcb *vmcb = svm->vmcb;
>>   +    /*
>> +     * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
>> +     * bit to determine the state of the IF flag.
>> +     */
>> +    if (sev_es_guest(svm->vcpu.kvm))
>> +        return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);
>
> This seems wrong, you have to take into account SVM_INTERRUPT_SHADOW_MASK
> as well.  Also, even though GIF is not really used by SEV-ES guests, I
> think it's nicer to put this check afterwards.
>
> That is:
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 4372e45c8f06..2dd9c9698480 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3247,7 +3247,14 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>      if (!gif_set(svm))
>          return true;
>
> -    if (is_guest_mode(vcpu)) {
> +    if (sev_es_guest(svm->vcpu.kvm)) {
> +        /*
> +         * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
> +         * bit to determine the state of the IF flag.
> +         */
> +        if (!(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK))
> +            return true;
> +    } else if (is_guest_mode(vcpu)) {
>          /* As long as interrupts are being delivered...  */
>          if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
>              ? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)
>
>

Yup, I'll make that change.

>
>>       if (!gif_set(svm))
>>           return true;
>>   @@ -3458,6 +3543,12 @@ static void svm_complete_interrupts(struct
>> vcpu_svm *svm)
>>           svm->vcpu.arch.nmi_injected = true;
>>           break;
>>       case SVM_EXITINTINFO_TYPE_EXEPT:
>> +        /*
>> +         * Never re-inject a #VC exception.
>> +         */
>> +        if (vector == X86_TRAP_VC)
>> +            break;
>> +
>>           /*
>>            * In case of software exceptions, do not reinject the vector,
>>            * but re-execute the instruction instead. Rewind RIP first
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a3fdc16cfd6f..b6809a2851d2 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4018,7 +4018,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>   {
>>       int idx;
>>   -    if (vcpu->preempted)
>> +    if (vcpu->preempted && !vcpu->arch.guest_state_protected)
>>           vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);
>
> This has to be true, otherwise no directed yield will be done at all:
>
>     if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
>         !kvm_arch_vcpu_in_kernel(vcpu))
>         continue;
>
> Or more easily, just use in_kernel == false in pause_interception, like
>
> +    /*
> +     * CPL is not made available for an SEV-ES guest, therefore
> +     * vcpu->arch.preempted_in_kernel can never be true.  Just
> +     * set in_kernel to false as well.
> +     */
> +    in_kernel = !sev_es_guest(svm->vcpu.kvm) && svm_get_cpl(vcpu) == 0;

Sounds good, I'll make that change.

>
>>         /*
>> @@ -8161,7 +8161,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
>>   {
>>       struct kvm_run *kvm_run = vcpu->run;
>>   -    kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
>> +    kvm_run->if_flag = (vcpu->arch.guest_state_protected)
>> +        ? kvm_arch_interrupt_allowed(vcpu)
>> +        : (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
>
> Here indeed you only want the interrupt allowed bit, not the interrupt
> window.  But we can just be bold and always set it to true.
>
> - for userspace irqchip, kvm_run->ready_for_interrupt_injection is set
> just below and it will always be false if kvm_arch_interrupt_allowed is false
>
> - for in-kernel APIC, if_flag is documented to be invalid (though it
> actually is valid).  For split irqchip, they can just use
> kvm_run->ready_for_interrupt_injection; for entirely in-kernel interrupt
> handling, userspace does not need if_flag at all.

Ok, I'll make that change.

Thanks,
Tom

>
> Paolo
>
>>       kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
>>       kvm_run->cr8 = kvm_get_cr8(vcpu);
>>       kvm_run->apic_base = kvm_get_apic_base(vcpu);
>>
>
>

2020-12-14 19:25:08

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 16/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100



On 12/14/20 9:49 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> +        pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
>> +            reason_set, reason_code);
>> +        fallthrough;
>> +    }
>
> It would be nice to send these to userspace instead as a follow-up.

I'll look into doing that.

Thanks,
Tom

>
> Paolo
>

2020-12-14 19:26:36

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

On 12/14/20 9:45 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> @@ -3184,6 +3186,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm
>> *svm, u64 exit_code)
>>           return halt_interception(svm);
>>       else if (exit_code == SVM_EXIT_NPF)
>>           return npf_interception(svm);
>> +    else if (exit_code == SVM_EXIT_VMGEXIT)
>> +        return sev_handle_vmgexit(svm);
>
> Are these common enough to warrant putting them in this short list?

A VMGEXIT exit occurs for any of the listed NAE events in the GHCB
specification (e.g. CPUID, RDMSR/WRMSR, MMIO, port IO, etc.) if those
events are being intercepted (or triggered in the case of MMIO). It will
depend on what is considered common. Since SVM_EXIT_MSR was already in the
list, I figured I should add VMGEXIT.

Thanks,
Tom

>
> Paolo
>
>>   #endif
>>       return svm_exit_handlers[exit_code](svm);
>>   }
>

2020-12-14 19:52:39

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

On 12/14/20 10:03 AM, Paolo Bonzini wrote:
> On 10/12/20 18:10, Tom Lendacky wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
>> where the guest vCPU register state is updated and then the vCPU is VMRUN
>> to begin execution of the AP. For an SEV-ES guest, this won't work because
>> the guest register state is encrypted.
>>
>> Following the GHCB specification, the hypervisor must not alter the guest
>> register state, so KVM must track an AP/vCPU boot. Should the guest want
>> to park the AP, it must use the AP Reset Hold exit event in place of, for
>> example, a HLT loop.
>>
>> First AP boot (first INIT-SIPI-SIPI sequence):
>>    Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
>>    support. It is up to the guest to transfer control of the AP to the
>>    proper location.
>>
>> Subsequent AP boot:
>>    KVM will expect to receive an AP Reset Hold exit event indicating that
>>    the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
>>    awaken it. When the AP Reset Hold exit event is received, KVM will place
>>    the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
>>    sequence, KVM will make the vCPU runnable. It is again up to the guest
>>    to then transfer control of the AP to the proper location.
>>
>> The GHCB specification also requires the hypervisor to save the address of
>> an AP Jump Table so that, for example, vCPUs that have been parked by UEFI
>> can be started by the OS. Provide support for the AP Jump Table set/get
>> exit code.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  2 ++
>>   arch/x86/kvm/svm/sev.c          | 50 +++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/svm/svm.c          |  7 +++++
>>   arch/x86/kvm/svm/svm.h          |  3 ++
>>   arch/x86/kvm/x86.c              |  9 ++++++
>>   5 files changed, 71 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h
>> index 048b08437c33..60a3b9d33407 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1286,6 +1286,8 @@ struct kvm_x86_ops {
>>         void (*migrate_timers)(struct kvm_vcpu *vcpu);
>>       void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
>> +
>> +    void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
>>   };
>>     struct kvm_x86_nested_ops {
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index a7531de760b5..b47285384b1f 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -17,6 +17,8 @@
>>   #include <linux/processor.h>
>>   #include <linux/trace_events.h>
>>   +#include <asm/trapnr.h>
>> +
>>   #include "x86.h"
>>   #include "svm.h"
>>   #include "cpuid.h"
>> @@ -1449,6 +1451,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm
>> *svm)
>>           if (!ghcb_sw_scratch_is_valid(ghcb))
>>               goto vmgexit_err;
>>           break;
>> +    case SVM_VMGEXIT_AP_HLT_LOOP:
>> +    case SVM_VMGEXIT_AP_JUMP_TABLE:
>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>           break;
>>       default:
>> @@ -1770,6 +1774,35 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
>>                           control->exit_info_2,
>>                           svm->ghcb_sa);
>>           break;
>> +    case SVM_VMGEXIT_AP_HLT_LOOP:
>> +        svm->ap_hlt_loop = true;
>
> This value needs to be communicated to userspace.  Let's get this right
> from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
> reuse KVM_MP_STATE_STOPPED but for x86 #define it as
> KVM_MP_STATE_AP_HOLD_RECEIVED?).

Ok, let me look into this.

>
>> @@ -68,6 +68,7 @@ struct kvm_sev_info {
>>      int fd;            /* SEV device fd */
>>      unsigned long pages_locked; /* Number of pages locked */
>>      struct list_head regions_list;  /* List of registered regions */
>> +    u64 ap_jump_table;    /* SEV-ES AP Jump Table address */
>
> Do you have any plans for migration of this value?  How does the guest
> ensure that the hypervisor does not screw with it?

I'll be sure that this is part of the SEV-ES live migration support.

For SEV-ES, we can't guarantee that the hypervisor doesn't screw with it.
This is something that SEV-SNP will be able to address.

Thanks,
Tom

>
> Paolo
>

2020-12-15 00:08:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

On Mon, Dec 14, 2020, Tom Lendacky wrote:
> On 12/14/20 9:45 AM, Paolo Bonzini wrote:
> > On 10/12/20 18:09, Tom Lendacky wrote:
> >> @@ -3184,6 +3186,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm
> >> *svm, u64 exit_code)
> >>           return halt_interception(svm);
> >>       else if (exit_code == SVM_EXIT_NPF)
> >>           return npf_interception(svm);
> >> +    else if (exit_code == SVM_EXIT_VMGEXIT)
> >> +        return sev_handle_vmgexit(svm);
> >
> > Are these common enough to warrant putting them in this short list?
>
> A VMGEXIT exit occurs for any of the listed NAE events in the GHCB
> specification (e.g. CPUID, RDMSR/WRMSR, MMIO, port IO, etc.) if those
> events are being intercepted (or triggered in the case of MMIO). It will
> depend on what is considered common. Since SVM_EXIT_MSR was already in the
> list, I figured I should add VMGEXIT.

I agree VMGEXIT should be added to the hot path, it could very well be the most
common exit reason due to all instruction-based emulation getting funneled
through VMGEXIT.

2020-12-15 10:22:26

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

On 14/12/20 20:41, Sean Christopherson wrote:
> I agree VMGEXIT should be added to the hot path, it could very well be the most
> common exit reason due to all instruction-based emulation getting funneled
> through VMGEXIT.

Yeah, I was thinking that not many guests will be SEV-ES. On the other
hand, it's quite likely to have an SEV-ES guest in your hands once the
really common non-VMGEXITs have been eliminated.

Paolo

2020-12-15 16:51:52

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 00/34] SEV-ES hypervisor support

On 12/14/20 12:13 PM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> This patch series provides support for running SEV-ES guests under KVM.
>>
>
> I'm queuing everything except patch 27, there's time to include it later
> in 5.11.
>
> Regarding MSRs, take a look at the series I'm sending shortly (or perhaps
> in a couple hours). For now I'll keep it in kvm/queue, but the plan is to
> get acks quickly and/or just include it in 5.11. Please try the kvm/queue
> branch to see if I screwed up anything.

I pulled and built kvm/queue and was able to launch a single vCPU SEV-ES
guest through OVMF and part way into the kernel before I hit an error. The
kernel tries to get the AP jump table address (which was part of patch
27). If I apply the following patch (just the jump table support from
patch 27), I can successfully boot a single vCPU SEV-ES guest:

KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting

From: Tom Lendacky <[email protected]>

The GHCB specification requires the hypervisor to save the address of an
AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 28 ++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
2 files changed, 29 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6eb097714d43..8b5ef0fe4490 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,8 @@
#include <linux/trace_events.h>
#include <asm/fpu/internal.h>

+#include <asm/trapnr.h>
+
#include "x86.h"
#include "svm.h"
#include "cpuid.h"
@@ -1559,6 +1561,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_NMI_COMPLETE:
+ case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
default:
@@ -1883,6 +1886,31 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_NMI_COMPLETE:
ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
break;
+ case SVM_VMGEXIT_AP_JUMP_TABLE: {
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+ switch (control->exit_info_1) {
+ case 0:
+ /* Set AP jump table address */
+ sev->ap_jump_table = control->exit_info_2;
+ break;
+ case 1:
+ /* Get AP jump table address */
+ ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+ break;
+ default:
+ pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
+ control->exit_info_1);
+ ghcb_set_sw_exit_info_1(ghcb, 1);
+ ghcb_set_sw_exit_info_2(ghcb,
+ X86_TRAP_UD |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ }
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(&svm->vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a5067f776ce0..5431e6335e2e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -78,6 +78,7 @@ struct kvm_sev_info {
int fd; /* SEV device fd */
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list; /* List of registered regions */
+ u64 ap_jump_table; /* SEV-ES AP Jump Table address */
};

struct kvm_svm {

2020-12-15 17:48:32

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 00/34] SEV-ES hypervisor support

On 15/12/20 17:46, Tom Lendacky wrote:
> KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting

Great, thanks!

Paolo

> From: Tom Lendacky<[email protected]>
>
> The GHCB specification requires the hypervisor to save the address of an
> AP Jump Table so that, for example, vCPUs that have been parked by UEFI
> can be started by the OS. Provide support for the AP Jump Table set/get
> exit code.
>
> Signed-off-by: Tom Lendacky<[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 28 ++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.h | 1 +
> 2 files changed, 29 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6eb097714d43..8b5ef0fe4490 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -18,6 +18,8 @@
> #include <linux/trace_events.h>
> #include <asm/fpu/internal.h>
>
> +#include <asm/trapnr.h>
> +
> #include "x86.h"
> #include "svm.h"
> #include "cpuid.h"
> @@ -1559,6 +1561,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> goto vmgexit_err;
> break;
> case SVM_VMGEXIT_NMI_COMPLETE:
> + case SVM_VMGEXIT_AP_JUMP_TABLE:
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> break;
> default:
> @@ -1883,6 +1886,31 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_NMI_COMPLETE:
> ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
> break;
> + case SVM_VMGEXIT_AP_JUMP_TABLE: {
> + struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> +
> + switch (control->exit_info_1) {
> + case 0:
> + /* Set AP jump table address */
> + sev->ap_jump_table = control->exit_info_2;
> + break;
> + case 1:
> + /* Get AP jump table address */
> + ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
> + break;
> + default:
> + pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
> + control->exit_info_1);
> + ghcb_set_sw_exit_info_1(ghcb, 1);
> + ghcb_set_sw_exit_info_2(ghcb,
> + X86_TRAP_UD |
> + SVM_EVTINJ_TYPE_EXEPT |
> + SVM_EVTINJ_VALID);
> + }
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(&svm->vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a5067f776ce0..5431e6335e2e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -78,6 +78,7 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + u64 ap_jump_table; /* SEV-ES AP Jump Table address */
> };
>
> struct kvm_svm {
>

2021-01-04 17:42:15

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

On 12/15/20 2:25 PM, Tom Lendacky wrote:
> On 12/14/20 1:46 PM, Tom Lendacky wrote:
>> On 12/14/20 10:03 AM, Paolo Bonzini wrote:
>>> On 10/12/20 18:10, Tom Lendacky wrote:
>>>> From: Tom Lendacky <[email protected]>
>>>>
>>>> + case SVM_VMGEXIT_AP_HLT_LOOP:
>>>> + svm->ap_hlt_loop = true;
>>>
>>> This value needs to be communicated to userspace. Let's get this right
>>> from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
>>> reuse KVM_MP_STATE_STOPPED but for x86 #define it as
>>> KVM_MP_STATE_AP_HOLD_RECEIVED?).
>>
>> Ok, let me look into this.
>
> Paolo, is this something along the lines of what you were thinking, or am
> I off base? I created kvm_emulate_ap_reset_hold() to keep the code
> consolidated and remove the duplication, but can easily make those changes
> local to sev.c. I'd also like to rename SVM_VMGEXIT_AP_HLT_LOOP to
> SVM_VMGEXIT_AP_RESET_HOLD to more closely match the GHCB document, but
> that can be done later (if possible, since it is already part of the uapi
> include file).

Paolo, a quick ping after the holidays as to whether this is the approach
you were thinking of. I think there are also a couple of places in x86.c to
update (vcpu_block() and kvm_arch_vcpu_ioctl_get_mpstate()).

Thanks,
Tom

>
> Thanks,
> Tom
>
> ---
> KVM: SVM: Add support for booting APs for an SEV-ES guest
>
> From: Tom Lendacky <[email protected]>
>
> Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
> where the guest vCPU register state is updated and then the vCPU is VMRUN
> to begin execution of the AP. For an SEV-ES guest, this won't work because
> the guest register state is encrypted.
>
> Following the GHCB specification, the hypervisor must not alter the guest
> register state, so KVM must track an AP/vCPU boot. Should the guest want
> to park the AP, it must use the AP Reset Hold exit event in place of, for
> example, a HLT loop.
>
> First AP boot (first INIT-SIPI-SIPI sequence):
> Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
> support. It is up to the guest to transfer control of the AP to the
> proper location.
>
> Subsequent AP boot:
> KVM will expect to receive an AP Reset Hold exit event indicating that
> the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
> awaken it. When the AP Reset Hold exit event is received, KVM will place
> the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
> sequence, KVM will make the vCPU runnable. It is again up to the guest
> to then transfer control of the AP to the proper location.
>
> To differentiate between an actual HLT and an AP Reset Hold, a new MP
> state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
> placed in upon receiving the AP Reset Hold exit event. Additionally, to
> communicate the AP Reset Hold exit event up to userspace (if needed), a
> new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
>
> A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
> to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
> original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
> a new function that, for non-SEV-ES guests, invokes the original SIPI
> delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
> implements the logic above.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/kvm/lapic.c | 2 +-
> arch/x86/kvm/svm/sev.c | 22 ++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 10 ++++++++++
> arch/x86/kvm/svm/svm.h | 2 ++
> arch/x86/kvm/vmx/vmx.c | 2 ++
> arch/x86/kvm/x86.c | 20 +++++++++++++++++---
> include/uapi/linux/kvm.h | 2 ++
> 8 files changed, 59 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 39707e72b062..23d7b203c060 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1287,6 +1287,8 @@ struct kvm_x86_ops {
> void (*migrate_timers)(struct kvm_vcpu *vcpu);
> void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
> int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> +
> + void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> };
>
> struct kvm_x86_nested_ops {
> @@ -1468,6 +1470,7 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
> int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
> int kvm_emulate_halt(struct kvm_vcpu *vcpu);
> int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
> +int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
> int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>
> void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 6a87623aa578..a2f08ed777d8 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> /* evaluate pending_events before reading the vector */
> smp_rmb();
> sipi_vector = apic->sipi_vector;
> - kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> + kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> }
> }
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 8b5ef0fe4490..4045de7f8f8b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1561,6 +1561,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> goto vmgexit_err;
> break;
> case SVM_VMGEXIT_NMI_COMPLETE:
> + case SVM_VMGEXIT_AP_HLT_LOOP:
> case SVM_VMGEXIT_AP_JUMP_TABLE:
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> break;
> @@ -1886,6 +1887,9 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_NMI_COMPLETE:
> ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
> break;
> + case SVM_VMGEXIT_AP_HLT_LOOP:
> + ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
> + break;
> case SVM_VMGEXIT_AP_JUMP_TABLE: {
> struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
>
> @@ -2038,3 +2042,21 @@ void sev_es_vcpu_put(struct vcpu_svm *svm)
> wrmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
> }
> }
> +
> +void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + /* First SIPI: Use the values as initially set by the VMM */
> + if (!svm->received_first_sipi) {
> + svm->received_first_sipi = true;
> + return;
> + }
> +
> + /*
> + * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
> + * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
> + * non-zero value.
> + */
> + ghcb_set_sw_exit_info_2(svm->ghcb, 1);
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 941e5251e13f..5c37fa68ee56 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4382,6 +4382,14 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> (vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
> }
>
> +static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> +{
> + if (!sev_es_guest(vcpu->kvm))
> + return kvm_vcpu_deliver_sipi_vector(vcpu, vector);
> +
> + sev_vcpu_deliver_sipi_vector(vcpu, vector);
> +}
> +
> static void svm_vm_destroy(struct kvm *kvm)
> {
> avic_vm_destroy(kvm);
> @@ -4524,6 +4532,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .msr_filter_changed = svm_msr_filter_changed,
> .complete_emulated_msr = svm_complete_emulated_msr,
> +
> + .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> };
>
> static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5431e6335e2e..0fe874ae5498 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -185,6 +185,7 @@ struct vcpu_svm {
> struct vmcb_save_area *vmsa;
> struct ghcb *ghcb;
> struct kvm_host_map ghcb_map;
> + bool received_first_sipi;
>
> /* SEV-ES scratch area support */
> void *ghcb_sa;
> @@ -591,6 +592,7 @@ void sev_es_init_vmcb(struct vcpu_svm *svm);
> void sev_es_create_vcpu(struct vcpu_svm *svm);
> void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
> void sev_es_vcpu_put(struct vcpu_svm *svm);
> +void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>
> /* vmenter.S */
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 75c9c6a0a3a4..2af05d3b0590 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7707,6 +7707,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> .msr_filter_changed = vmx_msr_filter_changed,
> .complete_emulated_msr = kvm_complete_insn_gp,
> .cpu_dirty_log_size = vmx_cpu_dirty_log_size,
> +
> + .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
> };
>
> static __init int hardware_setup(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 648c677b12e9..622612f88da7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7974,17 +7974,22 @@ void kvm_arch_exit(void)
> kmem_cache_destroy(x86_fpu_cache);
> }
>
> -int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> +int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
> {
> ++vcpu->stat.halt_exits;
> if (lapic_in_kernel(vcpu)) {
> - vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> + vcpu->arch.mp_state = state;
> return 1;
> } else {
> - vcpu->run->exit_reason = KVM_EXIT_HLT;
> + vcpu->run->exit_reason = reason;
> return 0;
> }
> }
> +
> +int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> +{
> + return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
> +}
> EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
>
> int kvm_emulate_halt(struct kvm_vcpu *vcpu)
> @@ -7998,6 +8003,14 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvm_emulate_halt);
>
> +int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
> +{
> + int ret = kvm_skip_emulated_instruction(vcpu);
> +
> + return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
> +
> #ifdef CONFIG_X86_64
> static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
> unsigned long clock_type)
> @@ -10150,6 +10163,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
> kvm_rip_write(vcpu, 0);
> }
> +EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
>
> int kvm_arch_hardware_enable(void)
> {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 886802b8ffba..374c67875cdb 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -251,6 +251,7 @@ struct kvm_hyperv_exit {
> #define KVM_EXIT_X86_RDMSR 29
> #define KVM_EXIT_X86_WRMSR 30
> #define KVM_EXIT_DIRTY_RING_FULL 31
> +#define KVM_EXIT_AP_RESET_HOLD 32
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -573,6 +574,7 @@ struct kvm_vapic_addr {
> #define KVM_MP_STATE_CHECK_STOP 6
> #define KVM_MP_STATE_OPERATING 7
> #define KVM_MP_STATE_LOAD 8
> +#define KVM_MP_STATE_AP_RESET_HOLD 9
>
> struct kvm_mp_state {
> __u32 mp_state;
>

2021-01-04 17:54:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

On 04/01/21 18:38, Tom Lendacky wrote:
>>
>> Paolo, is this something along the lines of what you were thinking, or am
>> I off base? I created kvm_emulate_ap_reset_hold() to keep the code
>> consolidated and remove the duplication, but can easily make those
>> changes
>> local to sev.c. I'd also like to rename SVM_VMGEXIT_AP_HLT_LOOP to
>> SVM_VMGEXIT_AP_RESET_HOLD to more closely match the GHCB document, but
>> that can be done later (if possible, since it is already part of the uapi
>> include file).
>
> Paolo, a quick ping after the holidays as to whether this is the
> approach you were thinking of. I think there are also a couple of places in
> x86.c to update (vcpu_block() and kvm_arch_vcpu_ioctl_get_mpstate()).

Yes, this is the basic idea.

Paolo

2021-01-04 20:22:45

by Tom Lendacky

[permalink] [raw]
Subject: [PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

From: Tom Lendacky <[email protected]>

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.

Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.

To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non-SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/svm/sev.c | 22 ++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 10 ++++++++++
arch/x86/kvm/svm/svm.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/x86.c | 26 +++++++++++++++++++++-----
include/uapi/linux/kvm.h | 2 ++
8 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 39707e72b062..23d7b203c060 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1287,6 +1287,8 @@ struct kvm_x86_ops {
void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
+
+ void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
};

struct kvm_x86_nested_ops {
@@ -1468,6 +1470,7 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);

void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6a87623aa578..a2f08ed777d8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
- kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
+ kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e57847ff8bd2..a08cbc04cb4d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1563,6 +1563,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_NMI_COMPLETE:
+ case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
@@ -1888,6 +1889,9 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_NMI_COMPLETE:
ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
break;
+ case SVM_VMGEXIT_AP_HLT_LOOP:
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+ break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;

@@ -2040,3 +2044,21 @@ void sev_es_vcpu_put(struct vcpu_svm *svm)
wrmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
}
}
+
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ /* First SIPI: Use the values as initially set by the VMM */
+ if (!svm->received_first_sipi) {
+ svm->received_first_sipi = true;
+ return;
+ }
+
+ /*
+ * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
+ * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
+ * non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 941e5251e13f..5c37fa68ee56 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4382,6 +4382,14 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
(vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
}

+static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+ if (!sev_es_guest(vcpu->kvm))
+ return kvm_vcpu_deliver_sipi_vector(vcpu, vector);
+
+ sev_vcpu_deliver_sipi_vector(vcpu, vector);
+}
+
static void svm_vm_destroy(struct kvm *kvm)
{
avic_vm_destroy(kvm);
@@ -4524,6 +4532,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.msr_filter_changed = svm_msr_filter_changed,
.complete_emulated_msr = svm_complete_emulated_msr,
+
+ .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
};

static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5431e6335e2e..0fe874ae5498 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -185,6 +185,7 @@ struct vcpu_svm {
struct vmcb_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
+ bool received_first_sipi;

/* SEV-ES scratch area support */
void *ghcb_sa;
@@ -591,6 +592,7 @@ void sev_es_init_vmcb(struct vcpu_svm *svm);
void sev_es_create_vcpu(struct vcpu_svm *svm);
void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
void sev_es_vcpu_put(struct vcpu_svm *svm);
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);

/* vmenter.S */

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 75c9c6a0a3a4..2af05d3b0590 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7707,6 +7707,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.msr_filter_changed = vmx_msr_filter_changed,
.complete_emulated_msr = kvm_complete_insn_gp,
.cpu_dirty_log_size = vmx_cpu_dirty_log_size,
+
+ .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
};

static __init int hardware_setup(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 648c677b12e9..660683a70b79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7974,17 +7974,22 @@ void kvm_arch_exit(void)
kmem_cache_destroy(x86_fpu_cache);
}

-int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
+int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
{
++vcpu->stat.halt_exits;
if (lapic_in_kernel(vcpu)) {
- vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ vcpu->arch.mp_state = state;
return 1;
} else {
- vcpu->run->exit_reason = KVM_EXIT_HLT;
+ vcpu->run->exit_reason = reason;
return 0;
}
}
+
+int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
+{
+ return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
+}
EXPORT_SYMBOL_GPL(kvm_vcpu_halt);

int kvm_emulate_halt(struct kvm_vcpu *vcpu)
@@ -7998,6 +8003,14 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_halt);

+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
+{
+ int ret = kvm_skip_emulated_instruction(vcpu);
+
+ return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
+
#ifdef CONFIG_X86_64
static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
unsigned long clock_type)
@@ -9092,6 +9105,7 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
kvm_apic_accept_events(vcpu);
switch(vcpu->arch.mp_state) {
case KVM_MP_STATE_HALTED:
+ case KVM_MP_STATE_AP_RESET_HOLD:
vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.mp_state =
KVM_MP_STATE_RUNNABLE;
@@ -9518,8 +9532,9 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
kvm_load_guest_fpu(vcpu);

kvm_apic_accept_events(vcpu);
- if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
- vcpu->arch.pv.pv_unhalted)
+ if ((vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
+ vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) &&
+ vcpu->arch.pv.pv_unhalted)
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
else
mp_state->mp_state = vcpu->arch.mp_state;
@@ -10150,6 +10165,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
kvm_rip_write(vcpu, 0);
}
+EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

int kvm_arch_hardware_enable(void)
{
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 886802b8ffba..374c67875cdb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -251,6 +251,7 @@ struct kvm_hyperv_exit {
#define KVM_EXIT_X86_RDMSR 29
#define KVM_EXIT_X86_WRMSR 30
#define KVM_EXIT_DIRTY_RING_FULL 31
+#define KVM_EXIT_AP_RESET_HOLD 32

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -573,6 +574,7 @@ struct kvm_vapic_addr {
#define KVM_MP_STATE_CHECK_STOP 6
#define KVM_MP_STATE_OPERATING 7
#define KVM_MP_STATE_LOAD 8
+#define KVM_MP_STATE_AP_RESET_HOLD 9

struct kvm_mp_state {
__u32 mp_state;
--
2.29.2

2021-01-07 18:16:18

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

On 04/01/21 21:20, Tom Lendacky wrote:
> From: Tom Lendacky <[email protected]>
>
> Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
> where the guest vCPU register state is updated and then the vCPU is VMRUN
> to begin execution of the AP. For an SEV-ES guest, this won't work because
> the guest register state is encrypted.
>
> Following the GHCB specification, the hypervisor must not alter the guest
> register state, so KVM must track an AP/vCPU boot. Should the guest want
> to park the AP, it must use the AP Reset Hold exit event in place of, for
> example, a HLT loop.
>
> First AP boot (first INIT-SIPI-SIPI sequence):
> Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
> support. It is up to the guest to transfer control of the AP to the
> proper location.
>
> Subsequent AP boot:
> KVM will expect to receive an AP Reset Hold exit event indicating that
> the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
> awaken it. When the AP Reset Hold exit event is received, KVM will place
> the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
> sequence, KVM will make the vCPU runnable. It is again up to the guest
> to then transfer control of the AP to the proper location.
>
> To differentiate between an actual HLT and an AP Reset Hold, a new MP
> state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
> placed in upon receiving the AP Reset Hold exit event. Additionally, to
> communicate the AP Reset Hold exit event up to userspace (if needed), a
> new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
>
> A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
> to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
> original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
> a new function that, for non-SEV-ES guests, invokes the original SIPI
> delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
> implements the logic above.
>
> Signed-off-by: Tom Lendacky <[email protected]>

Queued, thanks.

Paolo

2021-01-07 19:55:35

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

On 1/7/21 12:13 PM, Paolo Bonzini wrote:
> On 04/01/21 21:20, Tom Lendacky wrote:
>> From: Tom Lendacky <[email protected]>
>
> Queued, thanks.

Thanks, Paolo!

Tom

>
> Paolo
>