From: Jim Mattson
Date: Wed, 8 Nov 2017 13:47:26 -0800
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
To: Krish Sadhukhan
Cc: Wanpeng Li, LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li
In-Reply-To: <50b82c53-1e57-88a9-25bd-76697bf2d048@oracle.com>
References: <1509670249-4907-1-git-send-email-wanpeng.li@hotmail.com> <1509670249-4907-3-git-send-email-wanpeng.li@hotmail.com> <50b82c53-1e57-88a9-25bd-76697bf2d048@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

I realize now that there are actually many other problems with
deferring some control field checks to the hardware VM-entry of
vmcs02. When there is an invalid control field, the vCPU should just
fall through to the next instruction, without any state modification
other than the ALU flags and the VM-instruction error field of the
current VMCS. However, in preparation for the hardware VM-entry of
vmcs02, we have already changed quite a bit of the vCPU state: the
MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
FLAGS register, etc.
All of these changes should be undone, and we're not prepared to do
that. (For instance, what was the old DR7 value that needs to be
restored?)

On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan wrote:
> On 11/02/2017 05:50 PM, Wanpeng Li wrote:
>
>> From: Wanpeng Li
>>
>> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
>> failure properly) can result in L1(run kvm-unit-tests/run_tests.sh
>> vmx_controls in L1) null pointer deference and also L0 calltrace when
>> EPT=0 on both L0 and L1.
>>
>> In L1:
>>
>> BUG: unable to handle kernel paging request at ffffffffc015bf8f
>> IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
>> Oops: 0003 [#1] PREEMPT SMP
>> CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
>> RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> Call Trace:
>> WARNING: kernel stack frame pointer at ffffb86f4988bc18 in
>> qemu-system-x86:1798 has bad value 0000000000000002
>>
>> In L0:
>>
>> -----------[ cut here ]------------
>> WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845
>> vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25
>> RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> Call Trace:
>>  paging64_page_fault+0x500/0xde0 [kvm]
>>  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
>>  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
>>  ? __asan_storeN+0x12/0x20
>>  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
>>  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
>>  ? lock_acquire+0x2c0/0x2c0
>>  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
>>  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
>>  ? sched_clock+0x1f/0x30
>>  ? check_chain_key+0x137/0x1e0
>>  ? __lock_acquire+0x83c/0x2420
>>  ? kvm_multiple_exception+0xf2/0x220 [kvm]
>>  ? debug_check_no_locks_freed+0x240/0x240
>>  ? debug_smp_processor_id+0x17/0x20
>>  ? __lock_is_held+0x9e/0x100
>>  kvm_mmu_page_fault+0x90/0x180 [kvm]
>>  kvm_handle_page_fault+0x15c/0x310 [kvm]
>>  ? __lock_is_held+0x9e/0x100
>>  handle_exception+0x3c7/0x4d0 [kvm_intel]
>>  vmx_handle_exit+0x103/0x1010 [kvm_intel]
>>  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
>>
>> The commit avoids to load host state of vmcs12 as vmcs01's guest state
>> since vmcs12 is not modified (except for the VM-instruction error field)
>> if the checking of vmcs control area fails. However, the mmu context is
>> switched to nested mmu in prepare_vmcs02() and it will not be reloaded
>> since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
>> fails. This patch fixes it by reloading mmu context when nested
>> VMLAUNCH/VMRESUME fails.
>>
>> Cc: Paolo Bonzini
>> Cc: Radim Krčmář
>> Cc: Jim Mattson
>> Signed-off-by: Wanpeng Li
>> ---
>> v3 -> v4:
>>  * move it to a new function load_vmcs12_mmu_host_state
>>
>>  arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
>>  1 file changed, 22 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 6cf3972..8aefb91 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>          kvm_clear_interrupt_queue(vcpu);
>>  }
>> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
>> +                        struct vmcs12 *vmcs12)
>> +{
>> +        u32 entry_failure_code;
>> +
>> +        nested_ept_uninit_mmu_context(vcpu);
>> +
>> +        /*
>> +         * Only PDPTE load can fail as the value of cr3 was checked on entry and
>> +         * couldn't have changed.
>> +         */
>> +        if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>> +                nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> +
>> +        if (!enable_ept)
>> +                vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>> +}
>> +
>>  /*
>>   * A part of what we need to when the nested L2 guest exits and we want to
>>   * run its L1 parent, is to reset L1's guest state to the host state specified
>> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>                                     struct vmcs12 *vmcs12)
>>  {
>>          struct kvm_segment seg;
>> -        u32 entry_failure_code;
>>          if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>>                  vcpu->arch.efer = vmcs12->host_ia32_efer;
>> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>          vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
>>          vmx_set_cr4(vcpu, vmcs12->host_cr4);
>> -        nested_ept_uninit_mmu_context(vcpu);
>> -
>> -        /*
>> -         * Only PDPTE load can fail as the value of cr3 was checked on entry and
>> -         * couldn't have changed.
>> -         */
>> -        if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>> -                nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> -
>> -        if (!enable_ept)
>> -                vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>> +        load_vmcs12_mmu_host_state(vcpu, vmcs12);
>>          if (enable_vpid) {
>>                  /*
>> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>                   * accordingly.
>>                   */
>>                  nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>> +
>> +                load_vmcs12_mmu_host_state(vcpu, vmcs12);
>> +
>>                  /*
>>                   * The emulated instruction was already skipped in
>>                   * nested_vmx_run, but the updated RIP was never
>
> Reviewed-by: Krish Sadhukhan
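
To make the rollback concern at the top of this reply concrete, here is
a minimal user-space sketch in C. It is not KVM code and not a proposed
fix: struct toy_vcpu, struct toy_vmcs12, toy_vmentry() and the
controls_valid flag are all hypothetical stand-ins, and only three
pieces of state are modelled. It assumes the usual VMfailValid
convention (ZF set, the other five ALU flags cleared, error number 7
for an invalid control field). The point is only that a late failure
has to restore everything that was loaded eagerly, which requires
having saved the old values first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU state; not KVM's struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t dr7;
	uint64_t debugctl;
	uint64_t rflags;
};

/* Hypothetical subset of vmcs12; controls_valid stands in for the
 * control-field checks that were deferred to hardware. */
struct toy_vmcs12 {
	uint64_t guest_dr7;
	uint64_t guest_debugctl;
	uint64_t guest_rflags;
	uint32_t vm_instruction_error;
	bool controls_valid;
};

#define X86_ALU_FLAGS 0x8d5ULL		/* CF | PF | AF | ZF | SF | OF */
#define X86_ZF (1ULL << 6)
#define ERR_ENTRY_INVALID_CONTROL_FIELD 7u

/* Returns true if the emulated VM-entry succeeded. */
static bool toy_vmentry(struct toy_vcpu *vcpu, struct toy_vmcs12 *vmcs12)
{
	assert(vcpu && vmcs12);

	/* Save everything we are about to clobber, so a late failure can undo it. */
	const struct toy_vcpu saved = *vcpu;

	/* Eagerly load L2 state, as if preparing the hardware VM-entry. */
	vcpu->dr7 = vmcs12->guest_dr7;
	vcpu->debugctl = vmcs12->guest_debugctl;
	vcpu->rflags = vmcs12->guest_rflags;

	if (vmcs12->controls_valid)
		return true;

	/* Late failure: roll back, then apply only the VMfailValid effects. */
	*vcpu = saved;
	vcpu->rflags = (vcpu->rflags & ~X86_ALU_FLAGS) | X86_ZF;
	vmcs12->vm_instruction_error = ERR_ENTRY_INVALID_CONTROL_FIELD;
	return false;
}
```

In this toy model the snapshot is a single struct copy. The real
difficulty described above is that the clobbered state is spread across
MSRs, debug registers and the MMU context, and no such snapshot is
taken today.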