From: Jim Mattson
Date: Mon, 05 Feb 2018 18:44:17 +0000
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
To: Krish Sadhukhan
Cc: Wanpeng Li, LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li

I realize now that this fix isn't quite right, since it loads
vmcs12->host_cr3 rather than reverting to the CR3 that was loaded at the
time of VMLAUNCH/VMRESUME. In the case of VMfailValid (VM entry with
invalid VMX-control field(s)), none of the VMCS12 host state fields
should be loaded. See the pseudocode for VMLAUNCH/VMRESUME in volume 3
of the SDM.

On Wed, Nov 8, 2017 at 1:47 PM Jim Mattson wrote:
> I realize now that there are actually many other problems with
> deferring some control field checks to the hardware VM-entry of
> vmcs02.
> When there is an invalid control field, the vCPU should just
> fall through to the next instruction, without any state modification
> other than the ALU flags and the VM-instruction error field of the
> current VMCS. However, in preparation for the hardware VM-entry of
> vmcs02, we have already changed quite a bit of the vCPU state: the
> MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
> FLAGS register, etc. All of these changes should be undone, and we're
> not prepared to do that. (For instance, what was the old DR7 value
> that needs to be restored?)
>
> On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan wrote:
> > On 11/02/2017 05:50 PM, Wanpeng Li wrote:
> >
> >> From: Wanpeng Li
> >>
> >> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
> >> failure properly) can result in an L1 NULL pointer dereference (when
> >> running kvm-unit-tests/run_tests.sh vmx_controls in L1) and also an L0
> >> call trace when EPT=0 on both L0 and L1.
> >>
> >> In L1:
> >>
> >>   BUG: unable to handle kernel paging request at ffffffffc015bf8f
> >>   IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
> >>   PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
> >>   Oops: 0003 [#1] PREEMPT SMP
> >>   CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
> >>   RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
> >>   Call Trace:
> >>   WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
> >>
> >> In L0:
> >>
> >>   -----------[ cut here ]------------
> >>   WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
> >>   CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25
> >>   RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
> >>   Call Trace:
> >>    paging64_page_fault+0x500/0xde0 [kvm]
> >>    ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
> >>    ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
> >>    ? __asan_storeN+0x12/0x20
> >>    ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
> >>    ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
> >>    ? lock_acquire+0x2c0/0x2c0
> >>    ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
> >>    ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
> >>    ? sched_clock+0x1f/0x30
> >>    ? check_chain_key+0x137/0x1e0
> >>    ? __lock_acquire+0x83c/0x2420
> >>    ? kvm_multiple_exception+0xf2/0x220 [kvm]
> >>    ? debug_check_no_locks_freed+0x240/0x240
> >>    ? debug_smp_processor_id+0x17/0x20
> >>    ? __lock_is_held+0x9e/0x100
> >>    kvm_mmu_page_fault+0x90/0x180 [kvm]
> >>    kvm_handle_page_fault+0x15c/0x310 [kvm]
> >>    ? __lock_is_held+0x9e/0x100
> >>    handle_exception+0x3c7/0x4d0 [kvm_intel]
> >>    vmx_handle_exit+0x103/0x1010 [kvm_intel]
> >>    ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
> >>
> >> The commit avoids loading the host state of vmcs12 as vmcs01's guest state,
> >> since vmcs12 is not modified (except for the VM-instruction error field)
> >> if the checking of the vmcs control area fails. However, the mmu context is
> >> switched to nested mmu in prepare_vmcs02() and it will not be reloaded,
> >> since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
> >> fails. This patch fixes it by reloading the mmu context when nested
> >> VMLAUNCH/VMRESUME fails.
> >>
> >> Cc: Paolo Bonzini
> >> Cc: Radim Krčmář
> >> Cc: Jim Mattson
> >> Signed-off-by: Wanpeng Li
> >> ---
> >> v3 -> v4:
> >>  * move it to a new function load_vmcs12_mmu_host_state
> >>
> >>  arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
> >>  1 file changed, 22 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 6cf3972..8aefb91 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> >>  	kvm_clear_interrupt_queue(vcpu);
> >>  }
> >> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
> >> +				       struct vmcs12 *vmcs12)
> >> +{
> >> +	u32 entry_failure_code;
> >> +
> >> +	nested_ept_uninit_mmu_context(vcpu);
> >> +
> >> +	/*
> >> +	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
> >> +	 * couldn't have changed.
> >> +	 */
> >> +	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
> >> +		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> >> +
> >> +	if (!enable_ept)
> >> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> >> +}
> >> +
> >>  /*
> >>   * A part of what we need to when the nested L2 guest exits and we want to
> >>   * run its L1 parent, is to reset L1's guest state to the host state specified
> >> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> >>  				   struct vmcs12 *vmcs12)
> >>  {
> >>  	struct kvm_segment seg;
> >> -	u32 entry_failure_code;
> >>  	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
> >>  		vcpu->arch.efer = vmcs12->host_ia32_efer;
> >> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> >>  	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
> >>  	vmx_set_cr4(vcpu, vmcs12->host_cr4);
> >> -	nested_ept_uninit_mmu_context(vcpu);
> >> -
> >> -	/*
> >> -	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
> >> -	 * couldn't have changed.
> >> -	 */
> >> -	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
> >> -		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> >> -
> >> -	if (!enable_ept)
> >> -		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> >> +	load_vmcs12_mmu_host_state(vcpu, vmcs12);
> >>  	if (enable_vpid) {
> >>  		/*
> >> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
> >>  		 * accordingly.
> >>  		 */
> >>  		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> >> +
> >> +		load_vmcs12_mmu_host_state(vcpu, vmcs12);
> >> +
> >>  		/*
> >>  		 * The emulated instruction was already skipped in
> >>  		 * nested_vmx_run, but the updated RIP was never
> >
> > Reviewed-by: Krish Sadhukhan