Date: Fri, 27 Sep 2019 07:27:25 -0700
From: Sean Christopherson
To: Liran Alon
Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Reto Buerki
Subject: Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
Message-ID: <20190927142725.GC24889@linux.intel.com>
References: <20190926214302.21990-1-sean.j.christopherson@intel.com>
 <20190926214302.21990-2-sean.j.christopherson@intel.com>
 <68340081-0094-4A74-9B33-3431F39659AA@oracle.com>
In-Reply-To: <68340081-0094-4A74-9B33-3431F39659AA@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 27, 2019 at 03:06:02AM +0300, Liran Alon wrote:
>
>
> > On 27 Sep 2019, at 0:43, Sean Christopherson wrote:
> >
> > Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> > instead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE
> > is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> > refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> > VM-Exit occurs without actually entering L2, e.g. if the nested run
> > is squashed because L2 is being put into HLT.
>
> I would rephrase to “If an emulated VMEntry is squashed because L1 sets
> vmcs12->guest_activity_state to HLT”. I think it’s a bit more explicit.
>
> >
> > In an ideal world where EPT *requires* unrestricted guest (and vice
> > versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> > e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But
> > the unrestricted guest silliness complicates the dirty tracking logic
> > to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> > VM-Enter is a simpler overall implementation.
> >
> > Cc: stable@vger.kernel.org
> > Reported-by: Reto Buerki
> > Signed-off-by: Sean Christopherson
> > ---
> >  arch/x86/kvm/vmx/nested.c | 8 ++++++++
> >  arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
> >  2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 41abc62c9a8a..971a24134081 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> >  				entry_failure_code))
> >  		return -EINVAL;
> >
> > +	/*
> > +	 * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
> > +	 * on nested VM-Exit, which can occur without actually running L2, e.g.
> > +	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> > +	 */
>
> If I understand correctly, it’s not exactly if L2 is entering HLT state in
> general. (E.g. issue doesn’t occur if L2 runs HLT directly which is not
> configured to be intercepted by vmcs12). It’s specifically when L1 enters L2
> with a HLT guest-activity-state. I suggest rephrasing comment.

I deliberately worded the comment so that it remains valid if there are more
conditions in the future that cause KVM to skip running L2. What if I split
the difference and make the changelog more explicit, but leave the comment as
is?

> > +	if (enable_ept)
> > +		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
> > +
> >  	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
> >  	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
> >  	    is_pae_paging(vcpu)) {
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index d4575ffb3cec..b530950a9c2b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  {
> >  	struct kvm *kvm = vcpu->kvm;
> >  	unsigned long guest_cr3;
> > +	bool skip_cr3 = false;
> >  	u64 eptp;
> >
> >  	guest_cr3 = cr3;
> > @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> >  		}
> >
> > -		if (enable_unrestricted_guest || is_paging(vcpu) ||
> > -		    is_guest_mode(vcpu))
> > +		if (is_guest_mode(vcpu))
> > +			skip_cr3 = true;
> > +		else if (enable_unrestricted_guest || is_paging(vcpu))
> >  			guest_cr3 = kvm_read_cr3(vcpu);
> >  		else
> >  			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> >  		ept_load_pdptrs(vcpu);
> >  	}
> >
> > -	vmcs_writel(GUEST_CR3, guest_cr3);
> > +	if (!skip_cr3)
>
> Nit: It’s a matter of taste, but I prefer positive conditions. i.e. “bool
> write_guest_cr3”.
>
> Anyway, code seems valid to me. Nice catch.
> Reviewed-by: Liran Alon
>
> -Liran
>
> > +		vmcs_writel(GUEST_CR3, guest_cr3);
> >  }
> >
> >  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > --
> > 2.22.0
> >
>