From: Jim Mattson
Date: Wed, 12 Dec 2018 15:19:53 -0800
Subject: Re: [PATCH v6 05/13] KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR
To: Vitaly Kuznetsov
Cc: kvm list, Paolo Bonzini, Radim Krčmář, Roman Kagan, "K. Y. Srinivasan",
 Haiyang Zhang, Stephen Hemminger, "Michael Kelley (EOSG)", LKML, Liran Alon
In-Reply-To: <20181016165011.6607-6-vkuznets@redhat.com>
References: <20181016165011.6607-1-vkuznets@redhat.com>
 <20181016165011.6607-6-vkuznets@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 16, 2018 at 9:50 AM Vitaly Kuznetsov wrote:
>
> Per Hyper-V TLFS 5.0b:
>
> "The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to
> the corresponding field in the VP assist page (see section 7.8.7).
> Another field in the VP assist page controls the currently active
> enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in
> size and must be initially zeroed. No VMPTRLD instruction must be
> executed to make an enlightened VMCS active or current.
>
> After the L1 hypervisor performs a VM entry with an enlightened VMCS,
> the VMCS is considered active on the processor. An enlightened VMCS
> can only be active on a single processor at the same time. The L1
> hypervisor can execute a VMCLEAR instruction to transition an
> enlightened VMCS from the active to the non-active state. Any VMREAD
> or VMWRITE instructions while an enlightened VMCS is active is
> unsupported and can result in unexpected behavior."
>
> Keep Enlightened VMCS structure for the current L2 guest permanently mapped
> from struct nested_vmx instead of mapping it every time.
>
> Suggested-by: Ladi Prosek
> Signed-off-by: Vitaly Kuznetsov
> ---
>  arch/x86/kvm/vmx.c | 115 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 108 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index aebd008ccccb..cfb44acd4291 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -20,6 +20,7 @@
>  #include "mmu.h"
>  #include "cpuid.h"
>  #include "lapic.h"
> +#include "hyperv.h"
>
>  #include
>  #include
> @@ -889,6 +890,8 @@ struct nested_vmx {
> 		bool guest_mode;
> 	} smm;
>
> +	gpa_t hv_evmcs_vmptr;
> +	struct page *hv_evmcs_page;
> 	struct hv_enlightened_vmcs *hv_evmcs;
>  };
>
> @@ -8111,11 +8114,13 @@ static int nested_vmx_failInvalid(struct kvm_vcpu *vcpu)
>  static int nested_vmx_failValid(struct kvm_vcpu *vcpu,
> 				u32 vm_instruction_error)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> 	/*
> 	 * failValid writes the error number to the current VMCS, which
> 	 * can't be done if there isn't a current VMCS.
> 	 */
> -	if (to_vmx(vcpu)->nested.current_vmptr == -1ull)
> +	if (vmx->nested.current_vmptr == -1ull && !vmx->nested.hv_evmcs)
> 		return nested_vmx_failInvalid(vcpu);
>
> 	vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu)
> @@ -8441,6 +8446,20 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
> 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
>  }
>
> +static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (!vmx->nested.hv_evmcs)
> +		return;
> +
> +	kunmap(vmx->nested.hv_evmcs_page);
> +	kvm_release_page_dirty(vmx->nested.hv_evmcs_page);
> +	vmx->nested.hv_evmcs_vmptr = -1ull;
> +	vmx->nested.hv_evmcs_page = NULL;
> +	vmx->nested.hv_evmcs = NULL;
> +}
> +
>  static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
>  {
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -8509,6 +8528,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
>
> 	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
>
> +	nested_release_evmcs(vcpu);
> +
> 	free_loaded_vmcs(&vmx->nested.vmcs02);
>  }
>
> @@ -8542,12 +8563,18 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
> 		return nested_vmx_failValid(vcpu,
> 			VMXERR_VMCLEAR_VMXON_POINTER);
>
> -	if (vmptr == vmx->nested.current_vmptr)
> -		nested_release_vmcs12(vcpu);
> +	if (vmx->nested.hv_evmcs_page) {
> +		if (vmptr == vmx->nested.hv_evmcs_vmptr)
> +			nested_release_evmcs(vcpu);
> +	} else {
> +		if (vmptr == vmx->nested.current_vmptr)
> +			nested_release_vmcs12(vcpu);
>
> -	kvm_vcpu_write_guest(vcpu,
> -			vmptr + offsetof(struct vmcs12, launch_state),
> -			&zero, sizeof(zero));
> +		kvm_vcpu_write_guest(vcpu,
> +				     vmptr + offsetof(struct vmcs12,
> +						      launch_state),
> +				     &zero, sizeof(zero));
> +	}
>
> 	return nested_vmx_succeed(vcpu);
>  }
> @@ -8637,6 +8664,8 @@ static int copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx)
> 	struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
> 	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
>
> +	vmcs12->hdr.revision_id = evmcs->revision_id;
> +
> 	/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
> 	vmcs12->tpr_threshold = evmcs->tpr_threshold;
> 	vmcs12->guest_rip = evmcs->guest_rip;
> @@ -9268,6 +9297,10 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
> 		return nested_vmx_failValid(vcpu,
> 			VMXERR_VMPTRLD_VMXON_POINTER);
>
> +	/* Forbid normal VMPTRLD if Enlightened version was used */
> +	if (vmx->nested.hv_evmcs)
> +		return 1;
> +
> 	if (vmx->nested.current_vmptr != vmptr) {
> 		struct vmcs12 *new_vmcs12;
> 		struct page *page;
> @@ -9301,6 +9334,68 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
> 	return nested_vmx_succeed(vcpu);
>  }
>
> +/*
> + * This is an equivalent of the nested hypervisor executing the vmptrld
> + * instruction.
> + */
> +static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	struct hv_vp_assist_page assist_page;
> +
> +	if (likely(!vmx->nested.enlightened_vmcs_enabled))
> +		return 1;
> +
> +	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> +		return 1;
> +
> +	if (unlikely(!assist_page.enlighten_vmentry))
> +		return 1;
> +
> +	if (unlikely(assist_page.current_nested_vmcs !=
> +		     vmx->nested.hv_evmcs_vmptr)) {
> +
> +		if (!vmx->nested.hv_evmcs)
> +			vmx->nested.current_vmptr = -1ull;
> +
> +		nested_release_evmcs(vcpu);
> +
> +		vmx->nested.hv_evmcs_page = kvm_vcpu_gpa_to_page(
> +			vcpu, assist_page.current_nested_vmcs);
> +
> +		if (unlikely(is_error_page(vmx->nested.hv_evmcs_page)))
> +			return 0;
> +
> +		vmx->nested.hv_evmcs = kmap(vmx->nested.hv_evmcs_page);

Are you sure that directly mapping guest memory isn't going to lead to
time-of-check vs. time-of-use bugs? This is a very hard programming model
to get right.