Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp1658243pxf; Fri, 12 Mar 2021 15:49:41 -0800 (PST) X-Google-Smtp-Source: ABdhPJziw2idRLeFoUllJJ2n0FYQ4nBBnO48aY9LBi8ve9Ju7lXJdNP/evaXfozrdsYZmupMYuFe X-Received: by 2002:a17:906:cd12:: with SMTP id oz18mr11252809ejb.498.1615592980784; Fri, 12 Mar 2021 15:49:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1615592980; cv=none; d=google.com; s=arc-20160816; b=Chsb6RUevZ+B8ViesGgP2Hpx3XrwOR2sHw9fcL4/rd2q5lJ7M1H/BAm8e0hDo2Q/7s TNHMrvMk+lrWLubYqwxci6PHK9Ee1vAVCzMyr1+HX57I0ESCnPvb0ExMY4bVtypa8owf XULPe+YRz3hG1j/lJxx+BTPm9NzphzclcCCfc9RQ40dce55XH1Ejahi5DYofs9acxeJx iBloT+Wllfn6aTAq0t3J5/iiON8jXfv0l4U4dmv6V8fqAwVPJblAO7qucrCoW8IySszc XyuHZfgq6F3tlCMqkx03609QXMYvGwSGe+MdZoyaTGxB4TUpZ7Ckq56eudQuAMh5G0SF kABg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=bjFvm7GKsUKfRs9e/UxDco4aKi8EfcNZsKLV6pl4eOs=; b=ih6Uz6QosLHndfWC1MPIQY+xbGyi4AmbpA0dc5/opPmIdjiOVJKUXDPqX7Fp7yVXtw oyniMI9y8PD59Wndp0p/07tMdspFJNSpfb9Tf99TlVNAvnmOtg8U0bKLz5pyv5YfdOAf b67bdpp0ErozRZTLtk/e9YmMpfb7lbHNhONf8oqQqoMCTrjZA+58goFZJVtZ1KrdbsAQ V97CC0oBhfxp0ppZdwGlAdeYrZp1xD5dpsRbM/FAwpmB7NagXXX4PPXlzv09+FvHmJuR TDoBzYjbPnJZKUdKboULO3ofvQvePL28GGJkxPVOkF1Qu4VPKq5rreTeNiuvCk6tIDTl obyg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=DJ7sCUKj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t13si4986322ejr.172.2021.03.12.15.49.18; Fri, 12 Mar 2021 15:49:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=DJ7sCUKj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235690AbhCLXrv (ORCPT + 99 others); Fri, 12 Mar 2021 18:47:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235895AbhCLXrl (ORCPT ); Fri, 12 Mar 2021 18:47:41 -0500 Received: from mail-pg1-x52e.google.com (mail-pg1-x52e.google.com [IPv6:2607:f8b0:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15642C061574 for ; Fri, 12 Mar 2021 15:47:41 -0800 (PST) Received: by mail-pg1-x52e.google.com with SMTP id t26so16861190pgv.3 for ; Fri, 12 Mar 2021 15:47:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=bjFvm7GKsUKfRs9e/UxDco4aKi8EfcNZsKLV6pl4eOs=; b=DJ7sCUKjOc5WbvYqhvNsNgc2FV/Cksy6090XmuWgRU3SVAWAux8J2qQhk8DkhTd6gW 0Bm75UMGlzQ9ITZJHOvUeH39Py+FHn6B0ZubvFeKCwrcSw8K7OUw7SyvEugtxpDXJRKn g52E5mfgsz2Nea7VPWnLbo3Y0QUzkaCFbmnaJ/MYUR3VDPyQV+/uUkEhuhqOq7u0A0FB 4HD80ckZLBRY/CD1d5/zZ2IazLbxnlnIBNv3cdRWvGsY7WqEoXDrWLcolabRuEhP/t4j 6UoPS+Up41yjOHthZtBfbHSWOA5+w60XpojbYYj84sDWe0O8eQtymO/zaVEmPH0xvcKh 7gwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bjFvm7GKsUKfRs9e/UxDco4aKi8EfcNZsKLV6pl4eOs=; b=jyO8FOgVssicORTXkVqe3VnQpzn1yA1K9qI7sR/IpKEpkKGrCih3KQ8o9OUrRXBvsf kK7vRv5yO0I5m9ETjDNgbcP+8nZzUOzIdPCl++mi0b36ShPW/4gB8ocPEvdwuFYJydiU ysyFSfw6kTm87K94HHS8nO6Ed31jfovvSA/+vWmwanSLpSJwTBNYJ1YNQh21y6hzUdw+ 33uREBWnltM8ZCtwl+OYnHZ3aK65jbVIJlnl2MeqNTAhTGqT6a1pqNzzI4b/sfEQhONT uN8PE7X4QzXowmwsr+IpnWRDzprsePBSKdBxW9yAki+G4ThNpWN6V4e/blwsOD3ZZxiu Nj5w== X-Gm-Message-State: AOAM531DPp63bcbTO3BzcE+zQX78x/bTFsRHK7Dz97N74lU+pAIN+AQH vJXRcqz9Wr1rXjIvjz4gOflBre9QVQylxGKY3EBvag== X-Received: by 2002:a62:ea09:0:b029:1ee:3bac:8012 with SMTP id t9-20020a62ea090000b02901ee3bac8012mr551470pfh.35.1615592859872; Fri, 12 Mar 2021 15:47:39 -0800 (PST) MIME-Version: 1.0 References: <20210224085915.28751-1-natet@google.com> In-Reply-To: From: Nathan Tempelman Date: Fri, 12 Mar 2021 15:47:28 -0800 Message-ID: Subject: Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context To: Sean Christopherson Cc: Paolo Bonzini , Thomas Lendacky , x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Steve Rutherford , David Rientjes , Brijesh Singh , Ashish Kalra Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 24, 2021 at 9:37 AM Sean Christopherson wrote: > > On Wed, Feb 24, 2021, Nathan Tempelman wrote: > > static bool __sev_recycle_asids(int min_asid, int max_asid) > > { > > @@ -1124,6 +1129,10 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) > > if (copy_from_user(&sev_cmd, argp, sizeof(struct kvm_sev_cmd))) > > return -EFAULT; > > > > + /* enc_context_owner handles all memory enc operations */ > > + if (is_mirroring_enc_context(kvm)) > > + return -ENOTTY; > > Is this strictly necessary? Honest question, as I don't know the hardware/PSP > flows well enough to understand how asids are tied to the state managed by the > PSP. > > > + > > mutex_lock(&kvm->lock); > > > > switch (sev_cmd.id) { > > @@ -1186,6 +1195,10 @@ int svm_register_enc_region(struct kvm *kvm, > > if (!sev_guest(kvm)) > > return -ENOTTY; > > > > + /* If kvm is mirroring encryption context it isn't responsible for it */ > > + if (is_mirroring_enc_context(kvm)) > > Hmm, preventing the mirror from pinning memory only works if the two VMs are in > the same address space (process), which isn't guaranteed/enforced by the ioctl(). > Obviously we could check and enforce that, but do we really need to? > > Part of me thinks it would be better to treat the new ioctl() as a variant of > sev_guest_init(), i.e. purely make this a way to share asids. > > > + return -ENOTTY; > > + > > if (range->addr > ULONG_MAX || range->size > ULONG_MAX) > > return -EINVAL; > > > > @@ -1252,6 +1265,10 @@ int svm_unregister_enc_region(struct kvm *kvm, > > struct enc_region *region; > > int ret; > > > > + /* If kvm is mirroring encryption context it isn't responsible for it */ > > + if (is_mirroring_enc_context(kvm)) > > + return -ENOTTY; > > + > > mutex_lock(&kvm->lock); > > > > if (!sev_guest(kvm)) { > > @@ -1282,6 +1299,65 @@ int svm_unregister_enc_region(struct kvm *kvm, > > return ret; > > } > > > > +int svm_vm_copy_asid_to(struct kvm *kvm, unsigned int mirror_kvm_fd) > > +{ > > + struct file *mirror_kvm_file; > > + struct kvm *mirror_kvm; > > + struct kvm_sev_info *mirror_kvm_sev; > > What about using src and dst, e.g. src_kvm, dest_kvm_fd, dest_kvm, etc...? For > my brain, the mirror terminology adds an extra layer of translation. I like source, but I think I'll keep mirror. I think it captures the current state of it better--this isn't it's own full featured sev vm, in a sense it's a reflection of the source. Unless everyone found this confusing? > > > + unsigned int asid; > > + int ret; > > + > > + if (!sev_guest(kvm)) > > + return -ENOTTY; > > + > > + mutex_lock(&kvm->lock); > > + > > + /* Mirrors of mirrors should work, but let's not get silly */ > > Do we really care? > > > + if (is_mirroring_enc_context(kvm)) { > > + ret = -ENOTTY; > > + goto failed; > > + } > > + > > + mirror_kvm_file = fget(mirror_kvm_fd); > > + if (!kvm_is_kvm(mirror_kvm_file)) { > > + ret = -EBADF; > > + goto failed; > > + } > > + > > + mirror_kvm = mirror_kvm_file->private_data; > > + > > + if (mirror_kvm == kvm || is_mirroring_enc_context(mirror_kvm)) { > > This is_mirroring_enc_context() check needs to be after mirror_kvm->lock is > acquired, else there's a TOCTOU race. Nice. Yeah, I've flipped it around as per Paolo's point and this problem goes away. > > I also suspect there needs to be more checks on the destination. E.g. what > happens if the destination already has vCPUs that are currently running? Though > on that front, sev_guest_init() also doesn't guard against this. Feels like > that flow and this one should check kvm->created_vcpus. > > > + ret = -ENOTTY; > > + fput(mirror_kvm_file); > > Nit, probably worth adding a second error label to handle this fput(), e.g. in > case additional checks are needed in the future. Actually, I suspect that's > already needed to fix the TOCTOU bug. > > > + goto failed; > > + } > > + > > + asid = *&to_kvm_svm(kvm)->sev_info.asid; > > Don't think "*&" is necessary. :-) :') > > > + > > + /* > > + * The mirror_kvm holds an enc_context_owner ref so its asid can't > > + * disappear until we're done with it > > + */ > > + kvm_get_kvm(kvm); > > Do we really need/want to take a reference to the source 'struct kvm'? IMO, > the so called mirror should never be doing operations with its source context, > i.e. should not have easy access to 'struct kvm'. We already have a reference > to the fd, any reason not to use that to ensure liveliness of the source? I agree the mirror should never be running operations on the source. I don't know that holding the fd instead of the kvm makes that much better though, are there advantages to that I'm not seeing? > > > + > > + mutex_unlock(&kvm->lock); > > + mutex_lock(&mirror_kvm->lock); > > + > > + /* Set enc_context_owner and copy its encryption context over */ > > + mirror_kvm_sev = &to_kvm_svm(mirror_kvm)->sev_info; > > + mirror_kvm_sev->enc_context_owner = kvm; > > + mirror_kvm_sev->asid = asid; > > + mirror_kvm_sev->active = true; > > I would prefer a prep patch to move "INIT_LIST_HEAD(&sev->regions_list);" from > sev_guest_init() to when the VM is instantiated. Shaving a few cycles in that > flow is meaningless, and not initializing the list of regions is odd, and will > cause problems if mirrors are allowed to pin memory (or do PSP commands). It seems like we can keep this a lot simpler and easier to reason about by not allowing mirrors to pin memory or do psp commands. That was the intent. We don't gain anything but complexity by allowing this to be a fully featured SEV VM. Unless anyone can think of a good reason we'd want to have a mirror vm be able to do more than this? > > @@ -5321,6 +5321,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > > kvm->arch.bus_lock_detection_enabled = true; > > r = 0; > > break; > > + case KVM_CAP_VM_COPY_ENC_CONTEXT_TO: > > + r = -ENOTTY; > > + if (kvm_x86_ops.vm_copy_enc_context_to) > > + r = kvm_x86_ops.vm_copy_enc_context_to(kvm, cap->args[0]); > > This can be a static call. > > On a related topic, does this really need to be a separate ioctl()? TDX can't > share encryption contexts, everything that KVM can do for a TDX guest requires > the per-VM context. Unless there is a known non-x86 use case, it might be > better to make this a mem_enc_op, and then it can be named SEV_SHARE_ASID or > something. I'd prefer to leave this as a capability in the same way the register_enc_region calls work. Moving it into mem_enc_ops means we'll have to do some messy locking to avoid race conditions with the second vm since kvm gets locked in enc_ops. Also seems wierd to me having this hack grouped in with all the PSP commands. If i'm the only one that thinks this is cleaner, I'll move it though. Interesting about the platform, too. If you're sure we'll never need to build this for any other platform I'll at least rename it to be amd specific. There's no non-sev scenario anyone can think of that might want to do this? > > > + return r; > > default: > > r = -EINVAL; > > break; > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index e126ebda36d0..18491638f070 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -637,6 +637,7 @@ void kvm_exit(void); > > > > void kvm_get_kvm(struct kvm *kvm); > > void kvm_put_kvm(struct kvm *kvm); > > +bool kvm_is_kvm(struct file *file); > > void kvm_put_kvm_no_destroy(struct kvm *kvm); > > > > static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id) > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index 63f8f6e95648..5b6296772db9 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1077,6 +1077,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_SYS_HYPERV_CPUID 191 > > #define KVM_CAP_DIRTY_LOG_RING 192 > > #define KVM_CAP_X86_BUS_LOCK_EXIT 193 > > +#define KVM_CAP_VM_COPY_ENC_CONTEXT_TO 194 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 001b9de4e727..5f31fcda4777 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -739,6 +739,8 @@ void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm) > > { > > } > > > > +static struct file_operations kvm_vm_fops; > > I'd probably prefer to put the helper just below kvm_vm_fops instead of adding > a forward declaration. IMO it's not all that important to add the helper close > to kvm_get/put_kvm(). > > > + > > static struct kvm *kvm_create_vm(unsigned long type) > > { > > struct kvm *kvm = kvm_arch_alloc_vm(); > > @@ -903,6 +905,12 @@ void kvm_put_kvm(struct kvm *kvm) > > } > > EXPORT_SYMBOL_GPL(kvm_put_kvm); > > > > +bool kvm_is_kvm(struct file *file) > > Heh, maybe kvm_file_is_kvm()? or just file_is_kvm()? > > > +{ > > + return file && file->f_op == &kvm_vm_fops; > > +} > > +EXPORT_SYMBOL_GPL(kvm_is_kvm); > > + > > /* > > * Used to put a reference that was taken on behalf of an object associated > > * with a user-visible file descriptor, e.g. a vcpu or device, if installation > > -- > > 2.30.0.617.g56c4b15f3c-goog > >