References: <9e959ee134ad77f62c9881b8c54cd27e35055072.1585548051.git.ashish.kalra@amd.com>
 <20200403214559.GB28747@ashkalra_ubuntu_server>
 <65c09963-2027-22c1-e04d-4c8c3658b2c3@oracle.com>
 <20200408015221.GB27608@ashkalra_ubuntu_server>
 <20200410013418.GB19168@ashkalra_ubuntu_server>
From: Steve Rutherford
Date: Fri, 10 Apr 2020 13:18:36 -0700
Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
To: Ashish Kalra
Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML,
 KVM list, LKML, David Rientjes, Andy Lutomirski, Brijesh Singh
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford wrote:
>
> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford wrote:
> >
> > On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra wrote:
> > >
> > > Hello Steve,
> > >
> > > On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > > > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra wrote:
> > > > >
> > > > > Hello Steve,
> > > > >
> > > > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > > > >>> From: Ashish Kalra
> > > > > > > >>>
> > > > > > > >>> This ioctl can be used by the application to reset the page
> > > > > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > > > > >>> the bitmap.
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Ashish Kalra
> > > > > > > >>> ---
> > > > > > > >>>  Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > > > > > > >>>  arch/x86/include/asm/kvm_host.h |  1 +
> > > > > > > >>>  arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > > > > > > >>>  arch/x86/kvm/x86.c              |  6 ++++++
> > > > > > > >>>  include/uapi/linux/kvm.h        |  1 +
> > > > > > > >>>  5 files changed, 37 insertions(+)
> > > > > > > >>>
> > > > > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > > > >>>  bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > > > >>>  bitmap for an incoming guest.
> > > > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > > > >>> +-----------------------------------------
> > > > > > > >>> +
> > > > > > > >>> +:Capability: basic
> > > > > > > >>> +:Architectures: x86
> > > > > > > >>> +:Type: vm ioctl
> > > > > > > >>> +:Parameters: none
> > > > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > > > >>> +
> > > > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > > > >>> +
> > > > > > > >>> +
> > > > > > > >>>  5. The kvm_run structure
> > > > > > > >>>  ========================
> > > > > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > > > >>>  				struct kvm_page_enc_bitmap *bmap);
> > > > > > > >>>  	int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > > > >>>  				struct kvm_page_enc_bitmap *bmap);
> > > > > > > >>> +	int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > > > >>>  };
> > > > > > > >>>  struct kvm_arch_async_pf {
> > > > > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > > > >>> index 313343a43045..c99b0207a443 100644
> > > > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > > > >>>  	return ret;
> > > > > > > >>>  }
> > > > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > > > > >>> +{
> > > > > > > >>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > > >>> +
> > > > > > > >>> +	if (!sev_guest(kvm))
> > > > > > > >>> +		return -ENOTTY;
> > > > > > > >>> +
> > > > > > > >>> +	mutex_lock(&kvm->lock);
> > > > > > > >>> +	/* by default all pages should be marked encrypted */
> > > > > > > >>> +	if (sev->page_enc_bmap_size)
> > > > > > > >>> +		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > > > >>> +	mutex_unlock(&kvm->lock);
> > > > > > > >>> +	return 0;
> > > > > > > >>> +}
> > > > > > > >>> +
> > > > > > > >>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > > > >>>  {
> > > > > > > >>>  	struct kvm_sev_cmd sev_cmd;
> > > > > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > > > >>>  	.page_enc_status_hc = svm_page_enc_status_hc,
> > > > > > > >>>  	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > > > >>>  	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > > > >>> +	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > > > > >>
> > > > > > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > > > > > >> the previous patch either.
> > > > > > > >>
> > > > > > > >>>  };
> > > > > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > > > > members be 0 ?
> > > > > > > >
> > > > > > >
> > > > > > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > > > > > initialized. We should maintain the convention, perhaps.
> > > > > > >
> > > > > > > >
> > > > > > > >>>  static int __init svm_init(void)
> > > > > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > > > >>>  		r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > > > >>>  		break;
> > > > > > > >>>  	}
> > > > > > > >>> +	case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > > > >>> +		r = -ENOTTY;
> > > > > > > >>> +		if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > > > >>> +			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > > > >>> +		break;
> > > > > > > >>> +	}
> > > > > > > >>>  	default:
> > > > > > > >>>  		r = -ENOTTY;
> > > > > > > >>>  	}
> > > > > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > > > >>>  #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > > > >>>  #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > > > > > >>>  /* Secure Encrypted Virtualization command */
> > > > > > > >>>  enum sev_cmd_id {
> > > > > > > >> Reviewed-by: Krish Sadhukhan
> > > > > >
> > > > > >
> > > > > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > > > > copy the new value down and do a bit more work, but I don't think
> > > > > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > > > > excessive to add another ioctl for this.
> > > > >
> > > > > The set ioctl is generally available/provided for the incoming VM to setup
> > > > > the page encryption bitmap, this reset ioctl is meant for the source VM
> > > > > as a simple interface to reset the whole page encryption bitmap.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > >
> > > >
> > > > Hey Ashish,
> > > >
> > > > These seem very overlapping. I think this API should be refactored a bit.
> > > >
> > > > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> > > > hypercall (and related feature bit) is offered to the VM, and also the
> > > > size of the buffer.
> > >
> > > If you look at patch 13/14, i have added a new kvm para feature called
> > > "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host support for SEV
> > > Live Migration and a new Custom MSR which the guest does a wrmsr to
> > > enable the Live Migration feature, so this is like the enable cap
> > > support.
> > >
> > > There are further extensions to this support i am adding, so patch 13/14
> > > of this patch-set is still being enhanced and will have full support
> > > when i repost next.
> > >
> > > > 2) Use set for manipulating values in the bitmap, including resetting
> > > > the bitmap. Set the bitmap pointer to null if you want to reset to all
> > > > 0xFFs. When the bitmap pointer is set, it should set the values to
> > > > exactly what is pointed at, instead of only clearing bits, as is done
> > > > currently.
> > >
> > > As i mentioned in my earlier email, the set api is supposed to be for
> > > the incoming VM, but if you really need to use it for the outgoing VM
> > > then it can be modified.
> > >
> > > > 3) Use get for fetching values from the kernel. Personally, I'd
> > > > require alignment of the base GFN to a multiple of 8 (but the number
> > > > of pages could be whatever), so you can just use a memcpy. Optionally,
> > > > you may want some way to tell userspace the size of the existing
> > > > buffer, so it can ensure that it can ask for the entire buffer without
> > > > having to track the size in usermode (not strictly necessary, but nice
> > > > to have since it ensures that there is only one place that has to
> > > > manage this value).
> > > >
> > > > If you want to expand or contract the bitmap, you can use enable cap
> > > > to adjust the size.
> > >
> > > As being discussed on the earlier mail thread, we are doing this
> > > dynamically now by computing the guest RAM size when the
> > > set_user_memory_region ioctl is invoked. I believe that should handle
> > > the hot-plug and hot-unplug events too, as any hot memory updates will
> > > need KVM memslots to be updated.
> > Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
> > to be able to decide not to allocate, but this should be workable.
> > >
> > > > If you don't want to offer the hypercall to the guest, don't call the
> > > > enable cap.
> > > > This API avoids using up another ioctl. Ioctl space is somewhat
> > > > scarce. It also gives userspace fine grained control over the buffer,
> > > > so it can support both hot-plug and hot-unplug (or at the very least
> > > > it is not obviously incompatible with those). It also gives userspace
> > > > control over whether or not the feature is offered. The hypercall
> > > > isn't free, and being able to tell guests to not call when the host
> > > > wasn't going to migrate it anyway will be useful.
> > > >
> > >
> > > As i mentioned above, now the host indicates if it supports the Live
> > > Migration feature and the feature and the hypercall are only enabled on
> > > the host when the guest checks for this support and does a wrmsr() to
> > > enable the feature. Also the guest will not make the hypercall if the
> > > host does not indicate support for it.
> > If my read of those patches was correct, the host will always
> > advertise support for the hypercall. And the only bit controlling
> > whether or not the hypercall is advertised is essentially the kernel
> > version. You need to rollout a new kernel to disable the hypercall.
>
> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and
> forth for CPUID with usermode. My point about informing the guest
> kernel is clearly moot. The host still needs the ability to prevent
> allocations, but that is more minor. Maybe use a flag on the memslots
> directly?

On second thought: burning the memslot flag for 30mb per tb of VM seems
like a waste.
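
The 30mb per tb figure follows from tracking one bit per guest page. A
minimal sketch of that arithmetic, assuming 4 KiB pages (page_shift = 12);
enc_bitmap_bytes() is a hypothetical helper used only for illustration and
is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of bytes needed for a
 * bitmap that holds one bit per guest page, for mem_bytes of guest RAM.
 * page_shift is log2 of the page size (12 for 4 KiB pages, an assumption).
 */
static uint64_t enc_bitmap_bytes(uint64_t mem_bytes, unsigned int page_shift)
{
	uint64_t page_mask, pages;

	assert(page_shift > 0 && page_shift < 64);

	page_mask = (1ULL << page_shift) - 1;

	/* Round up to whole pages without overflowing on large sizes. */
	pages = (mem_bytes >> page_shift) + ((mem_bytes & page_mask) != 0);

	/* One bit per page, rounded up to whole bytes. */
	return pages / 8 + ((pages % 8) != 0);
}
```

For a 1 TiB guest, enc_bitmap_bytes(1ULL << 40, 12) gives 2^28 bits, that
is 33554432 bytes (32 MiB), which is where the rough 30mb estimate comes
from.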