Received: by 2002:a25:f815:0:0:0:0:0 with SMTP id u21csp2500014ybd; Mon, 24 Jun 2019 07:30:08 -0700 (PDT) X-Google-Smtp-Source: APXvYqzlq/9hgx9MC6qLvdArwy8rig8/j1UssEseNCKahQTggIjWfr9DanV9dR72rOWeG89+3QWg X-Received: by 2002:a63:1952:: with SMTP id 18mr30031343pgz.334.1561386607986; Mon, 24 Jun 2019 07:30:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561386607; cv=none; d=google.com; s=arc-20160816; b=WHl2t5MAs/AsQv4+A0cBvhOhLNO1JzKUXuEttSk07F8vJFO4tXhXOWJbagCp5kFNi0 aM/1BBrinu9kHzpNhMiaTasDS5h6hRcJ5Jd1bbMK+u+ztVxENPyiOniU8pVYGSwj7I4F C+B4L1gND6PWc0QJvMqraGn73qYsE5DpZdQ+HZhUWNvWj0k+D4F4pxV135EdyfQpwyBi n1C7nEZaTbLEyAGYTO2yHddgTPD1PPhG8QrAnI85OFlUibVxeG3inVpjC/klBxjmBL8u 8rZxSM3ktIm6JmC+EQ8ggvqgS2uXtUkr6e3bY9wV/7PKIDN84TKBRvD7JFl3YevgY2p5 8KTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:references:in-reply-to:subject:cc:to:from; bh=hSk6AE3sbQCAIoBtKIdZqdvHi2gn9KW0bMwzPsLPB2Y=; b=nh4Y/fX6wKOMNTiVoNjR/JB3xUSb+W+HR0pkNtWUmmeUBjqdF6h/PEL5L1FGudoe7Y +pLceWmQ7EPfjUr1tMCVGUsg0Hqu9USL1V6oph8+yPhf8tqxEl3NMuHd91jILxWl3MuQ 7gEYLIHeuh+3PSJRGPty5m8L4Eou1+/Pcgt6wmX7SWRj6J6ERqI9x+dyRFTqgkzP7tQI O5jnsDP2SCvFoo8l6BL80YtOmGrvSUfs8hw8EDFgjgWttUaAYphxDjfV7R6BqR2bV/yz cVNM4iKA5QhSE2e60Yh4LTKQxdma+WBJ9oA9+jKKsz+dc+GLazNWWE1/i+muvafPW1Jt FqTg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o11si11348418pjb.30.2019.06.24.07.29.52; Mon, 24 Jun 2019 07:30:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726313AbfFXOQf (ORCPT + 99 others); Mon, 24 Jun 2019 10:16:35 -0400 Received: from mail-wm1-f66.google.com ([209.85.128.66]:39325 "EHLO mail-wm1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725881AbfFXOQf (ORCPT ); Mon, 24 Jun 2019 10:16:35 -0400 Received: by mail-wm1-f66.google.com with SMTP id z23so13573955wma.4 for ; Mon, 24 Jun 2019 07:16:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=hSk6AE3sbQCAIoBtKIdZqdvHi2gn9KW0bMwzPsLPB2Y=; b=mWXoZkuEDa6v8oZlNOz8d6644T5VVljy65JKYPAkLO3GAU6KShbM1FypaH3qUkbHaI EOD1MDvUEqoUPcRmdrJcSeQPxI7MBSi0jUxJzP/jLYy4DCpBDyBsQmUZ4/K3TfJHz91s QtSivmmcchjSKfWzVhAn/dBaiomJ10sT2TR7mOTG9ShL5+tGLIk3QK0gePfNhFOEMTsh 3wbHTHPXStgeKsN5EChcUpBU3AD4TilAjNOFx0XU0rmU6bJM82VhrVpAZm5KZyNktYXS tO0ebXFAAHpkoMLFjBawjFkKwh2OIIC0b3Is0NteKAGCN1PKT04JYxtbX4EC2F0eEjpf HfZg== X-Gm-Message-State: APjAAAXtYuQH0VlBCrZdf8Tug0ihB8eSWn7c3QuENL8yELsvd12yQOef YNU96cjjX+jK8ujEgeHRKJ6VQA== X-Received: by 2002:a1c:1947:: with SMTP id 68mr16215464wmz.171.1561385793113; Mon, 24 Jun 2019 07:16:33 -0700 (PDT) Received: from vitty.brq.redhat.com (nat-pool-brq-t.redhat.com. [213.175.37.10]) by smtp.gmail.com with ESMTPSA id s10sm10868351wrt.49.2019.06.24.07.16.32 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 24 Jun 2019 07:16:32 -0700 (PDT) From: Vitaly Kuznetsov To: Liran Alon Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini , Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH] x86/kvm/nVMCS: fix VMCLEAR when Enlightened VMCS is in use In-Reply-To: References: <20190624133028.3710-1-vkuznets@redhat.com> Date: Mon, 24 Jun 2019 16:16:31 +0200 Message-ID: <87r27jdq68.fsf@vitty.brq.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Liran Alon writes: >> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov wrote: >> >> When Enlightened VMCS is in use, it is valid to do VMCLEAR and, >> according to TLFS, this should "transition an enlightened VMCS from the >> active to the non-active state". It is, however, wrong to assume that >> it is only valid to do VMCLEAR for the eVMCS which is currently active >> on the vCPU performing VMCLEAR. >> >> Currently, the logic in handle_vmclear() is broken: in case, there is no >> active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal' >> VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly >> corrupts the memory area. >> >> So, in case the VMCLEAR argument is not the current active eVMCS on the >> vCPU, how can we know if the area it is pointing to is a normal or an >> enlightened VMCS? >> Thanks to the bug in Hyper-V (see commit 72aeb60c52bf7 ("KVM: nVMX: Verify >> eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can >> not, the revision can't be used to distinguish between them. So let's >> assume it is always enlightened in case enlightened vmentry is enabled in >> the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to >> minimize the impact for 'unenlightened' workloads. >> >> Fixes: b8bbab928fb1 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR") >> Signed-off-by: Vitaly Kuznetsov >> --- >> arch/x86/kvm/vmx/evmcs.c | 18 ++++++++++++++++++ >> arch/x86/kvm/vmx/evmcs.h | 1 + >> arch/x86/kvm/vmx/nested.c | 19 ++++++++----------- >> 3 files changed, 27 insertions(+), 11 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c >> index 1a6b3e1581aa..eae636ec0cc8 100644 >> --- a/arch/x86/kvm/vmx/evmcs.c >> +++ b/arch/x86/kvm/vmx/evmcs.c >> @@ -3,6 +3,7 @@ >> #include >> #include >> >> +#include "../hyperv.h" >> #include "evmcs.h" >> #include "vmcs.h" >> #include "vmx.h" >> @@ -309,6 +310,23 @@ void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) >> } >> #endif >> >> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr) > > I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short. > In addition, I think you should return either -1ull or assist_page.current_nested_vmcs. > i.e. Don’t return evmcs_ptr by pointer but instead as a return-value > and get rid of the bool. Sure, can do in v2. > >> +{ >> + struct hv_vp_assist_page assist_page; >> + >> + *evmptr = -1ull; >> + >> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page))) >> + return false; >> + >> + if (unlikely(!assist_page.enlighten_vmentry)) >> + return false; >> + >> + *evmptr = assist_page.current_nested_vmcs; >> + >> + return true; >> +} >> + >> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h >> index e0fcef85b332..c449e79a9c4a 100644 >> --- a/arch/x86/kvm/vmx/evmcs.h >> +++ b/arch/x86/kvm/vmx/evmcs.h >> @@ -195,6 +195,7 @@ static inline void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) {} >> static inline void evmcs_touch_msr_bitmap(void) {} >> #endif /* IS_ENABLED(CONFIG_HYPERV) */ >> >> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr); >> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu); >> int nested_enable_evmcs(struct kvm_vcpu *vcpu, >> uint16_t *vmcs_version); >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c >> index 9214b3aea1f9..ee8dda7d8a03 100644 >> --- a/arch/x86/kvm/vmx/nested.c >> +++ b/arch/x86/kvm/vmx/nested.c >> @@ -1765,26 +1765,21 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu, >> bool from_launch) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> - struct hv_vp_assist_page assist_page; >> + u64 evmptr; > > I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short. > Sure. >> >> if (likely(!vmx->nested.enlightened_vmcs_enabled)) >> return 1; >> >> - if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page))) >> + if (!nested_enlightened_vmentry(vcpu, &evmptr)) >> return 1; >> >> - if (unlikely(!assist_page.enlighten_vmentry)) >> - return 1; >> - >> - if (unlikely(assist_page.current_nested_vmcs != >> - vmx->nested.hv_evmcs_vmptr)) { >> - >> + if (unlikely(evmptr != vmx->nested.hv_evmcs_vmptr)) { >> if (!vmx->nested.hv_evmcs) >> vmx->nested.current_vmptr = -1ull; >> >> nested_release_evmcs(vcpu); >> >> - if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs), >> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmptr), >> &vmx->nested.hv_evmcs_map)) >> return 0; >> >> @@ -1826,7 +1821,7 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu, >> */ >> vmx->nested.hv_evmcs->hv_clean_fields &= >> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; >> - vmx->nested.hv_evmcs_vmptr = assist_page.current_nested_vmcs; >> + vmx->nested.hv_evmcs_vmptr = evmptr; >> >> /* >> * Unlike normal vmcs12, enlightened vmcs12 is not fully >> @@ -4331,6 +4326,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> u32 zero = 0; >> gpa_t vmptr; >> + u64 evmptr; > > I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short. > Sure. >> >> if (!nested_vmx_check_permission(vcpu)) >> return 1; >> @@ -4346,7 +4342,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) >> return nested_vmx_failValid(vcpu, >> VMXERR_VMCLEAR_VMXON_POINTER); >> >> - if (vmx->nested.hv_evmcs_map.hva) { >> + if (unlikely(vmx->nested.enlightened_vmcs_enabled) && >> + nested_enlightened_vmentry(vcpu, &evmptr)) { >> if (vmptr == vmx->nested.hv_evmcs_vmptr) > > Shouldn’t you also remove the (vmptr == vmx->nested.hv_evmcs_vmptr) condition? > To my understanding, vmx->nested.hv_evmcs_vmptr represents the address of the loaded eVMCS on current vCPU. > But according to commit message, it is valid for vCPU to perform > VMCLEAR on eVMCS that differ from loaded eVMCS on vCPU. > E.g. The current vCPU may even have vmx->nested.hv_evmcs_vmptr set to > -1ull. nested_release_evmcs() unmaps current eVMCS on the vCPU, we can't easily unmap eVMCS on other vCPUs without somehow synchronizing with them. Actually, if we remove nested_release_evmcs() from here nothing is going to change: the fact that eVMCS is mapped doesn't hurt much. If the next enlightened vmentry is going to happen with the same evmptr we'll have to map it back, in case a different one will be used - we'll unmap the old. In KVM, there's nothing we *have* to do to transition an eVMCS from active to non-activer state. We, for example, don't enforce the requirement that it can only be active on one vCPU at a time. Enforcing it is expensive (some synchronization is required) and if L1 hypervisor is misbehaving than, well, things are not going to work anyway. That said I'm ok with dropping nested_release_evmcs() for consistency but we can't just drop 'if (vmptr == vmx->nested.hv_evmcs_vmptr)'. Thanks for your review! -- Vitaly