2021-08-23 16:30:01

by Peter Gonda

Subject: [PATCH 0/2 V5] Add AMD SEV and SEV-ES intra host migration support

Intra host migration provides a low-cost mechanism for userspace VMM
upgrades. It is an alternative to traditional (i.e., remote) live
migration. Whereas remote migration handles moving a guest to a new host,
intra host migration only handles moving a guest to a new userspace VMM
within a host. This can be used to update, rollback, change flags of the
VMM, etc. The lower cost compared to live migration comes from the fact
that the guest's memory does not need to be copied between processes. A
handle to the guest memory is simply passed to the new VMM; this could
be done via /dev/shm with share=on or a similar mechanism.
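
As an illustration only (not part of this series), a minimal sketch of how
guest memory could be backed by a named object under /dev/shm so that a
replacement VMM process can map the same pages; the object name and helper
names below are made up for the example:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Old VMM: back guest RAM with a named shared-memory object so that a
 * replacement VMM process can later map the identical pages instead of
 * copying them.  "/guest-ram" is an illustrative name.
 */
void *create_shared_guest_ram(size_t size)
{
	int fd = shm_open("/guest-ram", O_CREAT | O_RDWR, 0600);

	if (fd < 0 || ftruncate(fd, size) < 0)
		return MAP_FAILED;

	/* MAP_SHARED is what lets the new VMM see the same pages. */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* New VMM: attach to the existing object instead of copying the memory. */
void *attach_shared_guest_ram(size_t size)
{
	int fd = shm_open("/guest-ram", O_RDWR, 0);

	return fd < 0 ? MAP_FAILED :
	       mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}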

The guest state can be transferred from an old VMM to a new VMM as follows:
1. Export guest state from KVM to the old user-space VMM via a getter
user-space/kernel API.
2. Transfer guest state from the old VMM to the new VMM via IPC.
3. Import guest state into KVM from the new user-space VMM via a setter
user-space/kernel API.
In short, the old VMM exports state from KVM using getters, sends that data
to the new VMM, and the new VMM sets it again in KVM.
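
For a non-confidential guest these three steps map onto existing KVM ioctls.
A rough sketch of steps 1 and 3 for the general-purpose register state of a
single vCPU, with the IPC transport of step 2 and all error handling left out:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Step 1: the old VMM exports vCPU state from KVM via a getter ioctl. */
static int export_vcpu_regs(int vcpu_fd, struct kvm_regs *regs)
{
	return ioctl(vcpu_fd, KVM_GET_REGS, regs);
}

/* Step 3: the new VMM imports the state it received over IPC back into KVM. */
static int import_vcpu_regs(int vcpu_fd, struct kvm_regs *regs)
{
	return ioctl(vcpu_fd, KVM_SET_REGS, regs);
}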

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave the data in the kernel and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. For
example, we need to move the SEV-enabled ASID, VMSAs, and GHCB metadata
from one VMM to the next. In general, we need to be able to hand off any
data that would be unsafe/impossible for the kernel to hand directly to
userspace (and cannot be reproduced using data that can be handed safely to
userspace).

To hand off the SEV-required metadata for an intra host migration, the
source VM FD is sent to the target VMM. The target VMM calls the new
capability ioctl with the source VM FD; KVM then moves all of the SEV
state from the source VM to the target VM.
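
A minimal sketch of that call from the target VMM's side; the capability
number is taken as a parameter here because its name is introduced in
patch 1/2 and not repeated in this cover letter:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Target VMM: hand KVM the source VM's file descriptor (received from
 * the old VMM over IPC, e.g. SCM_RIGHTS) so KVM can move the SEV state
 * from the source VM into this one.
 */
static int sev_migrate_from(int dst_vm_fd, int src_vm_fd,
			    unsigned int migrate_cap)
{
	struct kvm_enable_cap cap = { .cap = migrate_cap };

	cap.args[0] = src_vm_fd;

	return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
}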

V5:
* Fix up locking scheme
* Address marcorr@ comments.

V4:
* Move to seanjc@'s suggested single-ioctl design based on the source VM FD.

v3:
* Fix memory leak found by dan.carpenter@

v2:
* Added marcorr@ reviewed by tag
* Renamed function introduced in 1/3
* Edited with seanjc@'s review comments
** Cleaned up WARN usage
** Userspace makes random token now
* Edited with brijesh.singh@'s review comments
** Checks for different LAUNCH_* states in send function

v1: https://lore.kernel.org/kvm/[email protected]/

Peter Gonda (2):
KVM, SEV: Add support for SEV intra host migration
KVM, SEV: Add support for SEV-ES intra host migration

Documentation/virt/kvm/api.rst | 15 +++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/sev.c | 157 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 +
arch/x86/kvm/x86.c | 5 +
include/uapi/linux/kvm.h | 1 +
7 files changed, 182 insertions(+)

base-commit: a3e0b8bd99ab

Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]

--
2.33.0.rc2.250.ged5fa647cd-goog


2021-08-23 16:31:06

by Peter Gonda

Subject: [PATCH 2/2 V5] KVM, SEV: Add support for SEV-ES intra host migration

For SEV-ES to work with intra host migration, the VMSAs, GHCB metadata,
and other SEV-ES state need to be preserved along with the guest's
memory.

Signed-off-by: Peter Gonda <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3467e18d63e0..f17bdf5ce723 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1545,6 +1545,59 @@ static void migrate_info_from(struct kvm_sev_info *dst,
list_replace_init(&src->regions_list, &dst->regions_list);
}

+static int migrate_vmsa_from(struct kvm *dst, struct kvm *src)
+{
+ int i, num_vcpus;
+ struct kvm_vcpu *dst_vcpu, *src_vcpu;
+ struct vcpu_svm *dst_svm, *src_svm;
+
+ num_vcpus = atomic_read(&dst->online_vcpus);
+ if (num_vcpus != atomic_read(&src->online_vcpus)) {
+ pr_warn_ratelimited(
+ "Source and target VMs must have same number of vCPUs.\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < num_vcpus; ++i) {
+ src_vcpu = src->vcpus[i];
+ if (!src_vcpu->arch.guest_state_protected) {
+ pr_warn_ratelimited(
+ "Source ES VM vCPUs must have protected state.\n");
+ return -EINVAL;
+ }
+ }
+
+ for (i = 0; i < num_vcpus; ++i) {
+ src_vcpu = src->vcpus[i];
+ src_svm = to_svm(src_vcpu);
+ dst_vcpu = dst->vcpus[i];
+ dst_svm = to_svm(dst_vcpu);
+
+ /*
+ * Copy VMSA and GHCB fields from the source to the destination.
+ * Clear them on the source to prevent the VM running and
+ * changing the state of the VMSA/GHCB unexpectedly.
+ */
+ dst_vcpu->vcpu_id = src_vcpu->vcpu_id;
+ dst_svm->vmsa = src_svm->vmsa;
+ src_svm->vmsa = NULL;
+ dst_svm->ghcb = src_svm->ghcb;
+ src_svm->ghcb = NULL;
+ dst_svm->vmcb->control.ghcb_gpa =
+ src_svm->vmcb->control.ghcb_gpa;
+ src_svm->vmcb->control.ghcb_gpa = 0;
+ dst_svm->ghcb_sa = src_svm->ghcb_sa;
+ src_svm->ghcb_sa = NULL;
+ dst_svm->ghcb_sa_len = src_svm->ghcb_sa_len;
+ src_svm->ghcb_sa_len = 0;
+ dst_svm->ghcb_sa_sync = src_svm->ghcb_sa_sync;
+ src_svm->ghcb_sa_sync = false;
+ dst_svm->ghcb_sa_free = src_svm->ghcb_sa_free;
+ src_svm->ghcb_sa_free = false;
+ }
+ return 0;
+}
+
int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
{
struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
@@ -1556,7 +1609,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
if (ret)
return ret;

- if (!sev_guest(kvm) || sev_es_guest(kvm)) {
+ if (!sev_guest(kvm)) {
ret = -EINVAL;
pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
goto out_unlock;
@@ -1580,13 +1633,18 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
if (ret)
goto out_fput;

- if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+ if (!sev_guest(source_kvm)) {
ret = -EINVAL;
pr_warn_ratelimited(
"Source VM must be SEV enabled to migrate from.\n");
goto out_source;
}

+ if (sev_es_guest(kvm)) {
+ ret = migrate_vmsa_from(kvm, source_kvm);
+ if (ret)
+ goto out_source;
+ }
migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
ret = 0;

--
2.33.0.rc2.250.ged5fa647cd-goog

2021-08-23 17:23:12

by Sean Christopherson

Subject: Re: [PATCH 0/2 V5] Add AMD SEV and SEV-ES intra host migration support

On Mon, Aug 23, 2021, Peter Gonda wrote:
> V5:
> * Fix up locking scheme

Please add a selftest to prove/verify the anti-deadlock scheme actually works.
Unless I'm mistaken, only KVM_SEV_INIT needs to be invoked, i.e. the selftest
wouldn't need anything remotely close to full SEV support. And that means it
should be trivial to verify the success path as well. E.g. create three SEV VMs
(A, B, and C) and verify migrating from any VM to any other VM works (since none
of the VMs have memory regions). Then spin up eight pthreads and have each thread
concurrently migrate a specific source/destination combination an arbitrary number
of times. Ignore whether the migration failed or succeeded; "success" from the
test's perspective is purely that it completed, i.e. didn't deadlock.
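
A rough sketch of what such a selftest could look like, using raw KVM ioctls
rather than the selftest framework helpers; the migration capability number
is a placeholder for the one defined in patch 1/2:

#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder: substitute the capability number defined in patch 1/2. */
#define SEV_MIGRATE_CAP		0

#define NR_VMS			3
#define NR_THREADS		8
#define NR_ITERATIONS		1000

static int vm_fds[NR_VMS];

/* A bare KVM_SEV_INIT is all that's needed to exercise the migration path. */
static int create_sev_vm(int kvm_fd, int sev_fd)
{
	struct kvm_sev_cmd init = { .id = KVM_SEV_INIT, .sev_fd = sev_fd };
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &init);
	return vm_fd;
}

static int migrate(int dst_vm_fd, int src_vm_fd)
{
	struct kvm_enable_cap cap = { .cap = SEV_MIGRATE_CAP };

	cap.args[0] = src_vm_fd;
	return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
}

static void *migrate_worker(void *arg)
{
	long base = (long)arg;
	int i;

	/*
	 * Hammer one src/dst combination.  Whether each migration
	 * succeeds is irrelevant; the test passes if every thread
	 * completes, i.e. nothing deadlocks.
	 */
	for (i = 0; i < NR_ITERATIONS; i++)
		migrate(vm_fds[base % NR_VMS], vm_fds[(base + 1) % NR_VMS]);
	return NULL;
}

int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int sev_fd = open("/dev/sev", O_RDWR);
	pthread_t threads[NR_THREADS];
	long i;

	for (i = 0; i < NR_VMS; i++)
		vm_fds[i] = create_sev_vm(kvm_fd, sev_fd);

	/* Success path: migrate between VM pairs (no memslots involved). */
	for (i = 0; i < NR_VMS; i++)
		migrate(vm_fds[(i + 1) % NR_VMS], vm_fds[i]);

	/* Anti-deadlock path: concurrent migrations between fixed pairs. */
	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, migrate_worker, (void *)i);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	return 0;
}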