From: Wanpeng Li
Date: Thu, 9 Nov 2017 08:37:13 +0800
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
To: Jim Mattson
Cc: Krish Sadhukhan, LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li

2017-11-09 5:47 GMT+08:00 Jim Mattson:
> I realize now that there are actually many other problems with
> deferring some control field checks to the hardware VM-entry of
> vmcs02. When there is an invalid control field, the vCPU should just
> fall through to the next instruction, without any state modification
> other than the ALU flags and the VM-instruction error field of the
> current VMCS. However, in preparation for the hardware VM-entry of
> vmcs02, we have already changed quite a bit of the vCPU state: the
> MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
> FLAGS register, etc.
> All of these changes should be undone, and we're
> not prepared to do that. (For instance, what was the old DR7 value
> that needs to be restored?)

I haven't observed a real issue so far, and I hope this patchset can
make the upcoming merge window. Then we can dig further into your concern.

Regards,
Wanpeng Li

>
> On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan wrote:
>> On 11/02/2017 05:50 PM, Wanpeng Li wrote:
>>
>>> From: Wanpeng Li
>>>
>>> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
>>> failure properly) can result in a null pointer dereference in L1 (when
>>> running kvm-unit-tests/run_tests.sh vmx_controls in L1) and also an L0
>>> call trace when EPT=0 on both L0 and L1.
>>>
>>> In L1:
>>>
>>> BUG: unable to handle kernel paging request at ffffffffc015bf8f
>>> IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
>>> PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
>>> Oops: 0003 [#1] PREEMPT SMP
>>> CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
>>> RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
>>> Call Trace:
>>> WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
>>>
>>> In L0:
>>>
>>> -----------[ cut here ]------------
>>> WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>>> CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25
>>> RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>>> Call Trace:
>>>  paging64_page_fault+0x500/0xde0 [kvm]
>>>  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
>>>  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
>>>  ? __asan_storeN+0x12/0x20
>>>  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
>>>  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
>>>  ? lock_acquire+0x2c0/0x2c0
>>>  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
>>>  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
>>>  ? sched_clock+0x1f/0x30
>>>  ? check_chain_key+0x137/0x1e0
>>>  ? __lock_acquire+0x83c/0x2420
>>>  ? kvm_multiple_exception+0xf2/0x220 [kvm]
>>>  ? debug_check_no_locks_freed+0x240/0x240
>>>  ? debug_smp_processor_id+0x17/0x20
>>>  ? __lock_is_held+0x9e/0x100
>>>  kvm_mmu_page_fault+0x90/0x180 [kvm]
>>>  kvm_handle_page_fault+0x15c/0x310 [kvm]
>>>  ? __lock_is_held+0x9e/0x100
>>>  handle_exception+0x3c7/0x4d0 [kvm_intel]
>>>  vmx_handle_exit+0x103/0x1010 [kvm_intel]
>>>  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
>>>
>>> The commit avoids loading the host state of vmcs12 as vmcs01's guest
>>> state, since vmcs12 is not modified (except for the VM-instruction error
>>> field) if the check of the vmcs control area fails. However, the mmu
>>> context is switched to the nested mmu in prepare_vmcs02() and is never
>>> reloaded, because load_vmcs12_host_state() is skipped when nested
>>> VMLAUNCH/VMRESUME fails. This patch fixes it by reloading the mmu context
>>> when nested VMLAUNCH/VMRESUME fails.
>>>
>>> Cc: Paolo Bonzini
>>> Cc: Radim Krčmář
>>> Cc: Jim Mattson
>>> Signed-off-by: Wanpeng Li
>>> ---
>>> v3 -> v4:
>>>  * move it to a new function load_vmcs12_mmu_host_state
>>>
>>>  arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
>>>  1 file changed, 22 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 6cf3972..8aefb91 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>>  	kvm_clear_interrupt_queue(vcpu);
>>>  }
>>>
>>> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
>>> +			struct vmcs12 *vmcs12)
>>> +{
>>> +	u32 entry_failure_code;
>>> +
>>> +	nested_ept_uninit_mmu_context(vcpu);
>>> +
>>> +	/*
>>> +	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
>>> +	 * couldn't have changed.
>>> +	 */
>>> +	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>>> +		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>>> +
>>> +	if (!enable_ept)
>>> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>>> +}
>>> +
>>>  /*
>>>   * A part of what we need to when the nested L2 guest exits and we want to
>>>   * run its L1 parent, is to reset L1's guest state to the host state specified
>>> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>>  				   struct vmcs12 *vmcs12)
>>>  {
>>>  	struct kvm_segment seg;
>>> -	u32 entry_failure_code;
>>>
>>>  	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>>>  		vcpu->arch.efer = vmcs12->host_ia32_efer;
>>> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>>  	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
>>>  	vmx_set_cr4(vcpu, vmcs12->host_cr4);
>>>
>>> -	nested_ept_uninit_mmu_context(vcpu);
>>> -
>>> -	/*
>>> -	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
>>> -	 * couldn't have changed.
>>> -	 */
>>> -	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
>>> -		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>>> -
>>> -	if (!enable_ept)
>>> -		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>>> +	load_vmcs12_mmu_host_state(vcpu, vmcs12);
>>>
>>>  	if (enable_vpid) {
>>>  		/*
>>> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>>  	 * accordingly.
>>>  	 */
>>>  	nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>>> +
>>> +	load_vmcs12_mmu_host_state(vcpu, vmcs12);
>>> +
>>>  	/*
>>>  	 * The emulated instruction was already skipped in
>>>  	 * nested_vmx_run, but the updated RIP was never
>>
>> Reviewed-by: Krish Sadhukhan
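Jim's concern is that a VMLAUNCH/VMRESUME rejected for an invalid control field should behave like VMfailValid: only the arithmetic flags and the VM-instruction error field change, while DR7, IA32_DEBUGCTL, the rest of RFLAGS and the MSRs stay untouched. The C below is a minimal sketch of that "validate everything before mutating anything" ordering, not KVM code. `toy_vcpu`, `toy_vmcs12`, `toy_fail_valid()` and `toy_nested_vmentry()` are hypothetical, and a single `controls_valid` flag stands in for all control-field checks. The flag update follows the SDM's VMfailValid convention (CF, PF, AF, SF and OF cleared, ZF set), and error number 7 is the SDM's "VM entry with invalid control field(s)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RFLAGS bit positions as defined by the x86 architecture. */
#define TOY_RFLAGS_CF (1u << 0)
#define TOY_RFLAGS_PF (1u << 2)
#define TOY_RFLAGS_AF (1u << 4)
#define TOY_RFLAGS_ZF (1u << 6)
#define TOY_RFLAGS_SF (1u << 7)
#define TOY_RFLAGS_OF (1u << 11)

/* SDM VM-instruction error 7: VM entry with invalid control field(s). */
#define TOY_VMXERR_INVALID_CONTROL_FIELD 7u

/* Hypothetical, heavily simplified vCPU holding a few of the fields Jim lists. */
struct toy_vcpu {
	uint64_t rflags;
	uint64_t dr7;
	uint64_t debugctl;
	uint32_t vm_instruction_error; /* lives in the current VMCS */
};

/* Hypothetical vmcs12 subset; controls_valid stands in for every check. */
struct toy_vmcs12 {
	bool controls_valid;
	uint64_t guest_rflags;
	uint64_t guest_dr7;
	uint64_t guest_debugctl;
};

/* VMfailValid: clear CF, PF, AF, SF, OF; set ZF; record the error number. */
static void toy_fail_valid(struct toy_vcpu *v, uint32_t error)
{
	assert(error != 0); /* SDM error numbers start at 1 */
	v->rflags &= ~(uint64_t)(TOY_RFLAGS_CF | TOY_RFLAGS_PF | TOY_RFLAGS_AF |
				 TOY_RFLAGS_SF | TOY_RFLAGS_OF);
	v->rflags |= TOY_RFLAGS_ZF;
	v->vm_instruction_error = error;
}

/*
 * Validate first, mutate second: on failure only the flags and the error
 * field have been written, so there is nothing to roll back.
 */
static bool toy_nested_vmentry(struct toy_vcpu *v,
			       const struct toy_vmcs12 *vmcs12)
{
	if (!vmcs12->controls_valid) {
		toy_fail_valid(v, TOY_VMXERR_INVALID_CONTROL_FIELD);
		return false;
	}
	/* Only now load L2 state (a real VM entry loads far more). */
	v->rflags = vmcs12->guest_rflags;
	v->dr7 = vmcs12->guest_dr7;
	v->debugctl = vmcs12->guest_debugctl;
	return true;
}
```

Because every check runs before any field is written, the failure branch never needs to know what the old DR7 value was. The thread's difficulty arises precisely because some checks are deferred to the hardware VM-entry of vmcs02, after state has already been modified.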
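The patch itself is a small refactoring. The MMU-related tail of load_vmcs12_host_state() moves into load_vmcs12_mmu_host_state(), which is then also called on the path where a deferred VMLAUNCH/VMRESUME failure skips the full host-state load. The C sketch below models only that control flow under simplifying assumptions. Every `toy_*` name and field is hypothetical, CR3 plus a two-valued MMU selector stand in for the real MMU context, and nothing here can fail the way nested_vmx_load_cr3() can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not KVM's data structures. */
enum toy_mmu { TOY_MMU_L1, TOY_MMU_NESTED };

struct toy_mmu_vcpu {
	enum toy_mmu walk_mmu;  /* which MMU translates guest addresses */
	uint64_t cr3;
	bool host_state_loaded; /* set only by the full host-state load */
};

struct toy_mmu_vmcs12 {
	bool controls_valid;    /* stands in for the deferred hardware checks */
	uint64_t host_cr3;
	uint64_t guest_cr3;
};

/* Analogue of load_vmcs12_mmu_host_state(): MMU/CR3 part of host state only. */
static void toy_load_mmu_host_state(struct toy_mmu_vcpu *v,
				    const struct toy_mmu_vmcs12 *vmcs12)
{
	v->walk_mmu = TOY_MMU_L1;
	v->cr3 = vmcs12->host_cr3;
}

/* Analogue of load_vmcs12_host_state(): the full load reuses the helper. */
static void toy_load_host_state(struct toy_mmu_vcpu *v,
				const struct toy_mmu_vmcs12 *vmcs12)
{
	/* EFER, CR0, CR4, segment registers, etc. would be restored here. */
	toy_load_mmu_host_state(v, vmcs12);
	v->host_state_loaded = true;
}

/* Analogue of prepare_vmcs02(): switches to the nested MMU before entry. */
static void toy_prepare_vmcs02(struct toy_mmu_vcpu *v,
			       const struct toy_mmu_vmcs12 *vmcs12)
{
	v->walk_mmu = TOY_MMU_NESTED;
	v->cr3 = vmcs12->guest_cr3;
}

/*
 * Returns true if L2 would be entered. On a control-field failure the full
 * host-state load is skipped, but the MMU context is still restored.
 */
static bool toy_nested_run(struct toy_mmu_vcpu *v,
			   const struct toy_mmu_vmcs12 *vmcs12)
{
	toy_prepare_vmcs02(v, vmcs12);
	if (!vmcs12->controls_valid) {
		/* Without this call, walk_mmu would stay TOY_MMU_NESTED. */
		toy_load_mmu_host_state(v, vmcs12);
		assert(v->walk_mmu == TOY_MMU_L1);
		return false;
	}
	return true;
}
```

Removing the `toy_load_mmu_host_state()` call from the failure branch reproduces the shape of the bug described in the patch: after the failed entry the vCPU is left with `walk_mmu == TOY_MMU_NESTED` even though `host_state_loaded` is still false.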