From: Jim Mattson
Date: Mon, 7 May 2018 15:56:58 -0700
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
To: Krish Sadhukhan
Cc: Wanpeng Li, LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li
References: <1509670249-4907-1-git-send-email-wanpeng.li@hotmail.com>
 <1509670249-4907-3-git-send-email-wanpeng.li@hotmail.com>
 <50b82c53-1e57-88a9-25bd-76697bf2d048@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Moreover, if the VMLAUNCH/VMRESUME is in 32-bit PAE mode, then the
PDPTRs should not be reloaded.

On Mon, Feb 5, 2018 at 10:44 AM, Jim Mattson wrote:
> I realize now that this fix isn't quite right, since it loads
> vmcs12->host_cr3 rather than reverting to the CR3 that was loaded at the
> time of VMLAUNCH/VMRESUME. In the case of VMfailValid (VM entry with invalid
> VMX-control field(s)), none of the VMCS12 host state fields should be
> loaded. See the pseudocode for VMLAUNCH/VMRESUME in volume 3 of the SDM.
>
> On Wed, Nov 8, 2017 at 1:47 PM Jim Mattson wrote:
>
>> I realize now that there are actually many other problems with
>> deferring some control field checks to the hardware VM-entry of
>> vmcs02. When there is an invalid control field, the vCPU should just
>> fall through to the next instruction, without any state modification
>> other than the ALU flags and the VM-instruction error field of the
>> current VMCS. However, in preparation for the hardware VM-entry of
>> vmcs02, we have already changed quite a bit of the vCPU state: the
>> MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
>> FLAGS register, etc. All of these changes should be undone, and we're
>> not prepared to do that. (For instance, what was the old DR7 value
>> that needs to be restored?)
>
>> On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan wrote:
>> > On 11/02/2017 05:50 PM, Wanpeng Li wrote:
>> >
>> >> From: Wanpeng Li
>> >>
>> >> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
>> >> failure properly) can result in an L1 null pointer dereference (run
>> >> kvm-unit-tests/run_tests.sh vmx_controls in L1) and also an L0
>> >> calltrace when EPT=0 on both L0 and L1.
>> >>
>> >> In L1:
>> >>
>> >> BUG: unable to handle kernel paging request at ffffffffc015bf8f
>> >> IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> >> PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
>> >> Oops: 0003 [#1] PREEMPT SMP
>> >> CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
>> >> RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> >> Call Trace:
>> >> WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
>> >>
>> >> In L0:
>> >>
>> >> -----------[ cut here ]------------
>> >> WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> >> CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25
>> >> RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> >> Call Trace:
>> >>  paging64_page_fault+0x500/0xde0 [kvm]
>> >>  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
>> >>  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
>> >>  ? __asan_storeN+0x12/0x20
>> >>  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
>> >>  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
>> >>  ? lock_acquire+0x2c0/0x2c0
>> >>  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
>> >>  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
>> >>  ? sched_clock+0x1f/0x30
>> >>  ? check_chain_key+0x137/0x1e0
>> >>  ? __lock_acquire+0x83c/0x2420
>> >>  ? kvm_multiple_exception+0xf2/0x220 [kvm]
>> >>  ? debug_check_no_locks_freed+0x240/0x240
>> >>  ? debug_smp_processor_id+0x17/0x20
>> >>  ? __lock_is_held+0x9e/0x100
>> >>  kvm_mmu_page_fault+0x90/0x180 [kvm]
>> >>  kvm_handle_page_fault+0x15c/0x310 [kvm]
>> >>  ? __lock_is_held+0x9e/0x100
>> >>  handle_exception+0x3c7/0x4d0 [kvm_intel]
>> >>  vmx_handle_exit+0x103/0x1010 [kvm_intel]
>> >>  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
>> >>
>> >> The commit avoids loading the host state of vmcs12 as vmcs01's guest
>> >> state, since vmcs12 is not modified (except for the VM-instruction
>> >> error field) if the checking of the vmcs control area fails. However,
>> >> the mmu context is switched to the nested mmu in prepare_vmcs02() and
>> >> it will not be reloaded, since load_vmcs12_host_state() is skipped
>> >> when nested VMLAUNCH/VMRESUME fails. This patch fixes it by reloading
>> >> the mmu context when nested VMLAUNCH/VMRESUME fails.
>> >>
>> >> Cc: Paolo Bonzini
>> >> Cc: Radim Krčmář
>> >> Cc: Jim Mattson
>> >> Signed-off-by: Wanpeng Li
>> >> ---
>> >> v3 -> v4:
>> >>  * move it to a new function load_vmcs12_mmu_host_state
>> >>
>> >>  arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
>> >>  1 file changed, 22 insertions(+), 12 deletions(-)
>> >>
>> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> >> index 6cf3972..8aefb91 100644
>> >> --- a/arch/x86/kvm/vmx.c
>> >> +++ b/arch/x86/kvm/vmx.c
>> >> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>> >>          kvm_clear_interrupt_queue(vcpu);
>> >>  }
>> >>  
>> >> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
>> >> +                                       struct vmcs12 *vmcs12)
>> >> +{
>> >> +        u32 entry_failure_code;
>> >> +
>> >> +        nested_ept_uninit_mmu_context(vcpu);
>> >> +
>> >> +        /*
>> >> +         * Only PDPTE load can fail as the value of cr3 was checked on entry and
>> >> +         * couldn't have changed.
>> >> +         */
>> >> +        if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>> >> +                nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> >> +
>> >> +        if (!enable_ept)
>> >> +                vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>> >> +}
>> >> +
>> >>  /*
>> >>   * A part of what we need to when the nested L2 guest exits and we want to
>> >>   * run its L1 parent, is to reset L1's guest state to the host state specified
>> >> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>> >>                                     struct vmcs12 *vmcs12)
>> >>  {
>> >>          struct kvm_segment seg;
>> >> -        u32 entry_failure_code;
>> >>  
>> >>          if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>> >>                  vcpu->arch.efer = vmcs12->host_ia32_efer;
>> >> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>> >>          vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
>> >>          vmx_set_cr4(vcpu, vmcs12->host_cr4);
>> >>  
>> >> -        nested_ept_uninit_mmu_context(vcpu);
>> >> -
>> >> -        /*
>> >> -         * Only PDPTE load can fail as the value of cr3 was checked on entry and
>> >> -         * couldn't have changed.
>> >> -         */
>> >> -        if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>> >> -                nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> >> -
>> >> -        if (!enable_ept)
>> >> -                vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>> >> +        load_vmcs12_mmu_host_state(vcpu, vmcs12);
>> >>  
>> >>          if (enable_vpid) {
>> >>                  /*
>> >> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>> >>           * accordingly.
>> >>           */
>> >>          nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>> >> +
>> >> +        load_vmcs12_mmu_host_state(vcpu, vmcs12);
>> >> +
>> >>          /*
>> >>           * The emulated instruction was already skipped in
>> >>           * nested_vmx_run, but the updated RIP was never
>> >
>> > Reviewed-by: Krish Sadhukhan