Received: by 2002:a9a:4c47:0:b029:116:c383:538 with SMTP id u7csp1022350lko; Tue, 13 Jul 2021 15:16:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyWJ3EF0GCz71fqCXuLu3GFLMQoO7Pou4frWDkQrnMvLJcFCy6UDrhVviFvzoxPhG61nEmS X-Received: by 2002:a92:2a10:: with SMTP id r16mr4294191ile.223.1626214619400; Tue, 13 Jul 2021 15:16:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626214619; cv=none; d=google.com; s=arc-20160816; b=WkaIM3cXTh5eThWBRw9YvzU2VwRzbhbq4vJAo2YDNVXVHN99K+tywdBL5imZZBpZji y8migAmHBnkNpyQIvMGaQqUxqjKNNAOL33qBE+moJghtcB7s+2XX7IQrtrhA0r0DQQUW 6yRU1PHhRG+Lf8/YJkr+vOYna0tWPoAmUXzR1NIxr2GbOjsvcEFqsjmzyAlSBR7AjT74 rchbovW5hVJRzpXtNYOrFUquzHc1nbvY1dJvU+rXSe42Y/xkluH7+Wmbc/ra7vqLTmIY YVgCcr8ZlLKoEvVSzgXtLAk7ApKpnawmdsj++nhKtSuTJdc47kPt3c2NO7xE4qS+PjYJ DpcA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=j68Vg8JAk/DXfpZmo3qhmVL+R64WqjblQlTWOBouqH8=; b=z8F7P5W5P6DaHZB0v6eO4tcOYW2s3zKV0lB4alY3LAE5Nv3r2YEwCz4TSSGZtrzbbG iMBWEJ1QeyLQ7KcgcS8hTBBm4c8PftieiEo87hiuSHlooUnENIiGpaf4jh37NU8NUj7b +BkCOpbmfAntFBWDtXCVCR1SjDbtQWvmZAQPpf2SFf7FR0emxy8iIJWiPl7HAxU82NXH p/1+nX2UTtq14nfgDuk44gOqruKROr9MR7oJs004XYqLXMsJ8iFm190W0ytEa1PKAhqX yXEodH4WJpAwXsAOxAvgu24V2knCknTZD2X+QOcLrA92iAIbLYgq187nCVX4XBp5sHdC eGzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Zq9rf7UE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n12si268385iob.27.2021.07.13.15.16.46; Tue, 13 Jul 2021 15:16:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Zq9rf7UE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236342AbhGMWPt (ORCPT + 99 others); Tue, 13 Jul 2021 18:15:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37626 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235973AbhGMWPs (ORCPT ); Tue, 13 Jul 2021 18:15:48 -0400 Received: from mail-pg1-x529.google.com (mail-pg1-x529.google.com [IPv6:2607:f8b0:4864:20::529]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4ABA1C0613DD for ; Tue, 13 Jul 2021 15:12:57 -0700 (PDT) Received: by mail-pg1-x529.google.com with SMTP id y4so20280106pgl.10 for ; Tue, 13 Jul 2021 15:12:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=j68Vg8JAk/DXfpZmo3qhmVL+R64WqjblQlTWOBouqH8=; b=Zq9rf7UE1v3HVgiI5AwVMY+vEWQd1f73YRsIZENDV1gx525H7BqmqTc5lB4s4U9UFM zMB0H/nFgNI0h1gayu3s0mgM63FBr+1PEKCdOda5WoAC3dZ+xoc0u1xM1WRWwOFYdSah X2wDVVoYvtrCVSC8xBvbUG9Ri9n5T5YtuRvBwrwxqfxwS0RSuJcy0hAjW7ojXobc9YlU XAEgY7jCe8qPrlsJzTM9fBBiOMNpZ/WhiSS87movxaE6D2I86es8EPibRSRHosHOGCdv PsABqxnJdcVH08qewgtjA7E1VcfCoKp5Enoe5PKpdH6700zpdzJh4hOl9BbLghliOgzl 8rxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=j68Vg8JAk/DXfpZmo3qhmVL+R64WqjblQlTWOBouqH8=; b=My3bEyE2+c5unIn3K2roJdy46dIY6myb3jJCc/fgJS4uR//wRFMlhOJSAPv+Pjx6/5 wwScpWsAY7UaIA+u0Qc6BCwdc0pbWbBGybuH18VQWfqfKZBNv9lyUmlrcJwM4mek5t2Q R9JAJXbShu3L44ovbHUzTi2Gu9Zo3q5wRsQlRKooWUUsYCsEuT+sdQXVfzviok9xaOAS +DlBiXqPBAyJxS5bpO0QV01w3RGzKrtP1ajnRpNpC/U+V15vLjEqkwq8SLusnyT85Rc/ KRB8gmXy3f22moaY53oKRIYLv5Qj4tyyib5pDY1SczodpB+ZfBA3GvB2l9rFFos5aPNd AcWQ== X-Gm-Message-State: AOAM530GMkR73IIiAXOKJ2J36aw3/qLzyT/4ypaKb3Wiq1MSxtENTdw7 QriwPplJKt7NqTJcszGCU/Mx5g== X-Received: by 2002:a65:6412:: with SMTP id a18mr6086932pgv.445.1626214376546; Tue, 13 Jul 2021 15:12:56 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id c136sm150428pfc.201.2021.07.13.15.12.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 15:12:55 -0700 (PDT) Date: Tue, 13 Jul 2021 22:12:52 +0000 From: Sean Christopherson To: Peter Gonda Cc: kvm@vger.kernel.org, Lars Bull , Paolo Bonzini , David Rientjes , "Dr . David Alan Gilbert" , Brijesh Singh , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/3] KVM, SEV: Add support for SEV local migration Message-ID: References: <20210621163118.1040170-1-pgonda@google.com> <20210621163118.1040170-3-pgonda@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210621163118.1040170-3-pgonda@google.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 21, 2021, Peter Gonda wrote: > Local migration provides a low-cost mechanism for userspace VMM upgrades. > It is an alternative to traditional (i.e., remote) live migration. Whereas > remote migration handles move a guest to a new host, local migration only > handles moving a guest to a new userspace VMM within a host. Maybe use intra-host vs. inter-host instead of local vs. remote? That'll save having to define local and remote. KVM_SEV_INTRA_HOST_{SEND,RECEIVE} is a bit wordy, but also very specific. > For SEV to work with local migration, contents of the SEV info struct > such as the ASID (used to index the encryption key in the AMD SP) and > the list > of memory regions need to be transferred to the target VM. Adds > commands for sending and receiving the sev info. > > To avoid exposing this internal state to userspace and prevent other > processes from importing state they shouldn't have access to, the send > returns a token to userspace that is handed off to the target VM. The > target passes in this token to receive the sent state. The token is only > valid for one-time use. Functionality on the source becomes limited > after > send has been performed. If the source is destroyed before the target > has > received, the token becomes invalid. Something appears to be mangling the changelogs, or maybe you have a cat that likes stepping on the Enter key? :-D > The target is expected to be initialized (sev_guest_init), but not > launched > state (sev_launch_start) when performing receive. Once the target has > received, it will be in a launched state and will not need to perform > the > typical SEV launch commands. ... > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 5af46ff6ec48..7c33ad2b910d 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -57,6 +58,8 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444); > #define sev_es_enabled false > #endif /* CONFIG_KVM_AMD_SEV */ > > +#define MAX_RAND_RETRY 3 > + > static u8 sev_enc_bit; > static DECLARE_RWSEM(sev_deactivate_lock); > static DEFINE_MUTEX(sev_bitmap_lock); > @@ -74,6 +77,22 @@ struct enc_region { > unsigned long size; > }; > > +struct sev_info_migration_node { > + struct hlist_node hnode; > + u64 token; > + bool valid; > + > + unsigned int asid; > + unsigned int handle; > + unsigned long pages_locked; > + struct list_head regions_list; > + struct misc_cg *misc_cg; > +}; > + > +#define SEV_INFO_MIGRATION_HASH_BITS 7 > +static DEFINE_HASHTABLE(sev_info_migration_hash, SEV_INFO_MIGRATION_HASH_BITS); > +static DEFINE_SPINLOCK(sev_info_migration_hash_lock); > + > /* Called with the sev_bitmap_lock held, or on shutdown */ > static int sev_flush_asids(int min_asid, int max_asid) > { > @@ -1094,6 +1113,185 @@ static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd *argp) > return ret; > } > > +static struct sev_info_migration_node *find_migration_info(unsigned long token) > +{ > + struct sev_info_migration_node *entry; > + > + hash_for_each_possible(sev_info_migration_hash, entry, hnode, token) { > + if (entry->token == token) > + return entry; > + } > + > + return NULL; > +} > + > +/* > + * Places @entry into the |sev_info_migration_hash|. Returns 0 if successful > + * and ownership of @entry is transferred to the hashmap. > + */ > +static int place_migration_node(struct sev_info_migration_node *entry) > +{ > + u64 token = 0; > + unsigned int retries; > + int ret = -EFAULT; > + > + /* > + * Generate a token associated with this VM's SEV info that userspace > + * can use to import on the other side. We use 0 to indicate a not- > + * present token. The token cannot collide with other existing ones, so > + * reroll a few times until we get a valid token. In the unlikely event > + * we're having trouble generating a unique token, give up and let > + * userspace retry if it needs to. > + */ > + spin_lock(&sev_info_migration_hash_lock); > + for (retries = 0; retries < MAX_RAND_RETRY; retries++) { > + get_random_bytes((void *)&token, sizeof(token)); Why is the kernel responsible for generating the token? IIUC, the purpose of the random generation is to make the token difficult to guess by a process other than the intended recipient, e.g. by a malicious process. But that's a userspace problem that can be better solved by the sender. > + > + if (find_migration_info(token)) > + continue; > + > + entry->token = token; > + entry->valid = true; > + > + hash_add(sev_info_migration_hash, &entry->hnode, token); > + ret = 0; > + goto out; > + } > + > +out: > + spin_unlock(&sev_info_migration_hash_lock); > + return ret; > +} > + > +static int sev_local_send(struct kvm *kvm, struct kvm_sev_cmd *argp) > +{ > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; > + struct sev_info_migration_node *entry; > + struct kvm_sev_local_send params; > + u64 token; > + int ret = -EFAULT; > + > + if (!sev_guest(kvm)) > + return -ENOTTY; > + > + if (sev->es_active) > + return -EPERM; > + > + if (sev->info_token != 0) > + return -EEXIST; > + > + if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, > + sizeof(params))) > + return -EFAULT; > + > + entry = kzalloc(sizeof(*entry), GFP_KERNEL); > + if (!entry) > + return -ENOMEM; > + > + entry->asid = sev->asid; > + entry->handle = sev->handle; > + entry->pages_locked = sev->pages_locked; > + entry->misc_cg = sev->misc_cg; > + > + INIT_LIST_HEAD(&entry->regions_list); > + list_replace_init(&sev->regions_list, &entry->regions_list); > + > + if (place_migration_node(entry)) > + goto e_listdel; > + > + token = entry->token; > + > + params.info_token = token; > + if (copy_to_user((void __user *)(uintptr_t)argp->data, ¶ms, > + sizeof(params))) > + goto e_hashdel; > + > + sev->info_token = token; > + > + return 0; > + > +e_hashdel: err_ is the more standard label for this sort of thing. > + spin_lock(&sev_info_migration_hash_lock); > + hash_del(&entry->hnode); > + spin_unlock(&sev_info_migration_hash_lock); > + > +e_listdel: listdel is a bit of an odd name, though I can't think of a better one. > + list_replace_init(&entry->regions_list, &sev->regions_list); > + > + kfree(entry); > + > + return ret; > +} > + > +static int sev_local_receive(struct kvm *kvm, struct kvm_sev_cmd *argp) > +{ > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; > + struct sev_info_migration_node *entry; > + struct kvm_sev_local_receive params; > + struct kvm_sev_info old_info; > + > + if (!sev_guest(kvm)) > + return -ENOTTY; > + > + if (sev->es_active) > + return -EPERM; > + > + if (sev->handle != 0) > + return -EPERM; > + > + if (!list_empty(&sev->regions_list)) > + return -EPERM; > + > + if (copy_from_user(¶ms, > + (void __user *)(uintptr_t)argp->data, If you capture argp in a local var, this ugly cast can be done once and you'll save lines overall, e.g. void __user *udata = (void __user *)(uintptr_t)argp->data; if (copy_from_user(¶ms, udata, sizeof(params)) return -EFAULT; ... if (copy_to_user(udata, ¶ms, sizeof(params))) return -EFAULT; > + sizeof(params))) > + return -EFAULT; > + > + spin_lock(&sev_info_migration_hash_lock); > + entry = find_migration_info(params.info_token); > + if (!entry || !entry->valid) > + goto err_unlock; > + > + memcpy(&old_info, sev, sizeof(old_info)); > + > + /* > + * The source VM always frees @entry On the target we simply > + * mark the token as invalid to notify the source the sev info > + * has been moved successfully. > + */ > + entry->valid = false; > + sev->active = true; > + sev->asid = entry->asid; > + sev->handle = entry->handle; > + sev->pages_locked = entry->pages_locked; > + sev->misc_cg = entry->misc_cg; > + > + INIT_LIST_HEAD(&sev->regions_list); > + list_replace_init(&entry->regions_list, &sev->regions_list); > + > + spin_unlock(&sev_info_migration_hash_lock); > + > + params.handle = sev->handle; > + > + if (copy_to_user((void __user *)(uintptr_t)argp->data, ¶ms, > + sizeof(params))) > + goto err_unwind; > + > + sev_asid_free(&old_info); > + return 0; > + > +err_unwind: > + spin_lock(&sev_info_migration_hash_lock); Why does the lock need to be reacquired, and can anything go sideways if something else grabbed the lock while it was dropped? > + list_replace_init(&sev->regions_list, &entry->regions_list); > + entry->valid = true; > + memcpy(sev, &old_info, sizeof(*sev)); > + > +err_unlock: > + spin_unlock(&sev_info_migration_hash_lock); > + > + return -EFAULT; > +} > + ... > @@ -1553,6 +1763,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) > case KVM_SEV_GET_ATTESTATION_REPORT: > r = sev_get_attestation_report(kvm, &sev_cmd); > break; > + case KVM_SEV_LOCAL_SEND: > + r = sev_local_send(kvm, &sev_cmd); > + break; > + case KVM_SEV_LOCAL_RECEIVE: > + r = sev_local_receive(kvm, &sev_cmd); > + break; > case KVM_SEV_SEND_START: > r = sev_send_start(kvm, &sev_cmd); > break; > @@ -1786,6 +2002,8 @@ static void __unregister_region_list_locked(struct kvm *kvm, > void sev_vm_destroy(struct kvm *kvm) > { > struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; > + struct sev_info_migration_node *entry = NULL; > + bool info_migrated = false; > > if (!sev_guest(kvm)) > return; > @@ -1796,25 +2014,59 @@ void sev_vm_destroy(struct kvm *kvm) > return; > } > > + /* > + * If userspace has requested that we migrate the SEV info to a new VM, > + * then we own and must remove an entry node in the tracking data > + * structure. Whether we clean up the data in our SEV info struct and > + * entry node depends on whether userspace has done the migration, > + * which transfers ownership to a new VM. We can identify that > + * migration has occurred by checking if the node is marked invalid. > + */ > + if (sev->info_token != 0) { > + spin_lock(&sev_info_migration_hash_lock); > + entry = find_migration_info(sev->info_token); > + if (entry) { > + info_migrated = !entry->valid; > + hash_del(&entry->hnode); Isn't info_migrated unnecessary? Grabbing ->valid under the lock is a bit misleading because it's unnecessary, e.g. once the entry is deleted from the list then this flow owns it. If you do s/entry/migration_entry (or mig_entry), then I think the code will be sufficiently self-documenting. > + } else Needs curly braces. > + WARN(1, > + "SEV VM was marked for export, but does not have associated export node.\n"); But an even better way to write this (IMO the msg isn't necessary, the issue is quite obvious at a quick glance): if (!WARN_ON(!entry)) hash_del(&entry->node); > + spin_unlock(&sev_info_migration_hash_lock); > + } > + > mutex_lock(&kvm->lock); > > /* > - * Ensure that all guest tagged cache entries are flushed before > - * releasing the pages back to the system for use. CLFLUSH will > - * not do this, so issue a WBINVD. > + * Adding memory regions after a local send has started > + * is dangerous. > */ > - wbinvd_on_all_cpus(); > + if (sev->info_token != 0 && !list_empty(&sev->regions_list)) { Kernel style usually omits the "!= 0". > + WARN(1, Similarly, WARN(1, ...) in an if statement is usually a sign that you're doing things backwards: if (WARN_ON(sev->info_token && !list_empty(&sev->regions_list))) unregister_enc_regions(kvm, &sev->regions_list); In addition to saving code, the WARN will display the failing condition, which obviates the need for a free form message in most cases (including this one). Oh, and I think you've got a bug here. If info_token is '0', won't regions_list be leaked? I.e. shouldn't this be (the helper gracefully handles the empty case): WARN_ON(sev->info_token && !list_empty(&sev->regions_list)); unregister_enc_regions(kvm, &sev->regions_list); That will generate a smaller diff, since the exiting call for regions_list will be unchanged. > + "Source SEV regions list non-empty after export request. List is not expected to be modified after export request.\n"); > + __unregister_region_list_locked(kvm, &sev->regions_list); > + } > > /* > - * if userspace was terminated before unregistering the memory regions > - * then lets unpin all the registered memory. > + * If userspace was terminated before unregistering the memory Unnecessary new newline. That said, this comment also appears to be stale? > + * regions then lets unpin all the registered memory. > */ > - __unregister_region_list_locked(kvm, &sev->regions_list); > + if (entry) > + __unregister_region_list_locked(kvm, &entry->regions_list); > > mutex_unlock(&kvm->lock); > > - sev_unbind_asid(kvm, sev->handle); > - sev_asid_free(sev); > + /* > + * Ensure that all guest tagged cache entries are flushed before > + * releasing the pages back to the system for use. CLFLUSH will > + * not do this, so issue a WBINVD. > + */ > + wbinvd_on_all_cpus(); > + if (!info_migrated) { As above, this can be: if (!migration_entry || !migration_entry->valid) { > + sev_unbind_asid(kvm, sev->handle); > + sev_asid_free(sev); > + } > + > + kfree(entry); > }