From: Wanpeng Li
Date: Fri, 28 Jul 2017 09:28:51 +0800
Subject: Re: [PATCH] KVM: nVMX: do not pin the VMCS12
To: David Matlack
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm list, Jim Mattson
References: <1501163686-13648-1-git-send-email-pbonzini@redhat.com>

2017-07-28 1:20 GMT+08:00 David Matlack:
> On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini wrote:
>> Since the current implementation of VMCS12 does a memcpy in and out
>> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
>> anymore. current_vmptr is enough to read and write the VMCS12.
>
> This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
> VMCS12 page by using kvm_write_guest. nested_release_page() only marks
> the struct page dirty.
>
>>
>> Signed-off-by: Paolo Bonzini
>> ---
>>  arch/x86/kvm/vmx.c | 23 ++++++-----------------
>>  1 file changed, 6 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b37161808352..142f16ebdca2 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -416,9 +416,6 @@ struct nested_vmx {
>>
>>          /* The guest-physical address of the current VMCS L1 keeps for L2 */
>>          gpa_t current_vmptr;
>> -        /* The host-usable pointer to the above */
>> -        struct page *current_vmcs12_page;
>> -        struct vmcs12 *current_vmcs12;
>>          /*
>>           * Cache of the guest's VMCS, existing outside of guest memory.
>>           * Loaded from guest memory during VMPTRLD. Flushed to guest
>> @@ -7183,10 +7180,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>          if (vmx->nested.current_vmptr == -1ull)
>>                  return;
>>
>> -        /* current_vmptr and current_vmcs12 are always set/reset together */
>> -        if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
>> -                return;
>> -
>>          if (enable_shadow_vmcs) {
>>                  /* copy to memory all shadowed fields in case
>>                     they were modified */
>> @@ -7199,13 +7192,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>          vmx->nested.posted_intr_nv = -1;
>>
>>          /* Flush VMCS12 to guest memory */
>> -        memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
>> -               VMCS12_SIZE);
>> +        kvm_vcpu_write_guest_page(&vmx->vcpu,
>> +                                  vmx->nested.current_vmptr >> PAGE_SHIFT,
>> +                                  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
>
> Have you hit any "suspicious RCU usage" error messages during VM

Yeah, I observed this splat when testing Paolo's patch today.

[87214.855344] =============================
[87214.855346] WARNING: suspicious RCU usage
[87214.855348] 4.13.0-rc2+ #2 Tainted: G OE
[87214.855350] -----------------------------
[87214.855352] ./include/linux/kvm_host.h:573 suspicious rcu_dereference_check() usage!
[87214.855353] other info that might help us debug this:
[87214.855355] rcu_scheduler_active = 2, debug_locks = 1
[87214.855357] 1 lock held by qemu-system-x86/17059:
[87214.855359] #0: (&vcpu->mutex){+.+.+.}, at: [] vcpu_load+0x22/0x80 [kvm]
[87214.855396] stack backtrace:
[87214.855399] CPU: 3 PID: 17059 Comm: qemu-system-x86 Tainted: G OE 4.13.0-rc2+ #2
[87214.855401] Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
[87214.855403] Call Trace:
[87214.855408] dump_stack+0x99/0xce
[87214.855413] lockdep_rcu_suspicious+0xc5/0x100
[87214.855423] kvm_vcpu_gfn_to_memslot+0x166/0x180 [kvm]
[87214.855432] kvm_vcpu_write_guest_page+0x24/0x50 [kvm]
[87214.855438] free_nested.part.76+0x76/0x270 [kvm_intel]
[87214.855443] vmx_free_vcpu+0x7a/0xc0 [kvm_intel]
[87214.855454] kvm_arch_destroy_vm+0x104/0x1d0 [kvm]
[87214.855463] kvm_put_kvm+0x17a/0x2b0 [kvm]
[87214.855473] kvm_vm_release+0x21/0x30 [kvm]
[87214.855477] __fput+0xfb/0x240
[87214.855482] ____fput+0xe/0x10
[87214.855485] task_work_run+0x7e/0xb0
[87214.855490] do_exit+0x323/0xcf0
[87214.855494] ? get_signal+0x318/0x930
[87214.855498] ? _raw_spin_unlock_irq+0x2c/0x60
[87214.855503] do_group_exit+0x50/0xd0
[87214.855507] get_signal+0x24f/0x930
[87214.855514] do_signal+0x37/0x750
[87214.855518] ? __might_fault+0x3e/0x90
[87214.855523] ? __might_fault+0x85/0x90
[87214.855527] ? exit_to_usermode_loop+0x2b/0x100
[87214.855531] ? __this_cpu_preempt_check+0x13/0x20
[87214.855535] exit_to_usermode_loop+0xab/0x100
[87214.855539] syscall_return_slowpath+0x153/0x160
[87214.855542] entry_SYSCALL_64_fastpath+0xc0/0xc2
[87214.855545] RIP: 0033:0x7ff40d24a26d

Regards,
Wanpeng Li

> teardown with this patch? We did when we replaced memcpy with
> kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
> held in one of the teardown paths. kvm_write_guest() expects it to be
> held in order to access memslots.
>
> We fixed this by skipping the VMCS12 flush during VMXOFF.
> I'll send that patch along with a few other nVMX dirty tracking related patches
> I've been meaning to get upstreamed.
>
>>
>> -        kunmap(vmx->nested.current_vmcs12_page);
>> -        nested_release_page(vmx->nested.current_vmcs12_page);
>>          vmx->nested.current_vmptr = -1ull;
>> -        vmx->nested.current_vmcs12 = NULL;
>>  }
>>
>>  /*
>> @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>>                  }
>>
>>                  nested_release_vmcs12(vmx);
>> -                vmx->nested.current_vmcs12 = new_vmcs12;
>> -                vmx->nested.current_vmcs12_page = page;
>>                  /*
>>                   * Load VMCS12 from guest memory since it is not already
>>                   * cached.
>>                   */
>> -                memcpy(vmx->nested.cached_vmcs12,
>> -                       vmx->nested.current_vmcs12, VMCS12_SIZE);
>> +                memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
>> +                kunmap(page);
>
> +               nested_release_page_clean(page);
>
>> +
>>                  set_current_vmptr(vmx, vmptr);
>>          }
>>
>> @@ -9354,7 +9344,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>>
>>          vmx->nested.posted_intr_nv = -1;
>>          vmx->nested.current_vmptr = -1ull;
>> -        vmx->nested.current_vmcs12 = NULL;
>>
>>          vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
>>
>> --
>> 1.8.3.1
>>
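A minimal userspace sketch of the scheme discussed in this thread, assuming a single toy memslot: VMPTRLD copies the VMCS12 into a cache and keeps only its guest-physical address, the release path flushes the cache back through a write helper that also logs the page in a dirty bitmap, and a teardown path may skip the flush when the memslot lock is not held. All names here (toy_vm, toy_nested, toy_write_guest_page, toy_release_vmcs12, toy_vmptrld, srcu_depth) are hypothetical stand-ins for current_vmptr, cached_vmcs12, memslot->dirty_bitmap and kvm->srcu. This is not KVM code, and the srcu_depth check is only a runtime imitation of the lockdep warning in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in KVM. */
#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SIZE    (1u << TOY_PAGE_SHIFT)
#define TOY_NR_PAGES     8
#define TOY_VMCS12_SIZE  TOY_PAGE_SIZE
#define TOY_INVALID_GPA  UINT64_MAX	/* plays the role of -1ull */

struct toy_vm {
	uint8_t guest_mem[TOY_NR_PAGES][TOY_PAGE_SIZE];	/* "guest RAM" */
	bool dirty_bitmap[TOY_NR_PAGES];	/* stand-in for memslot->dirty_bitmap */
	int srcu_depth;				/* >0 while the memslot lock is "held" */
};

struct toy_nested {
	uint64_t current_vmptr;			/* only the GPA is kept: no pin, no mapping */
	uint8_t cached_vmcs12[TOY_VMCS12_SIZE];
};

/* Copy into guest memory AND log the gfn as dirty; refuse without the
 * lock (a runtime stand-in for the "suspicious RCU usage" check). */
static int toy_write_guest_page(struct toy_vm *vm, uint64_t gfn,
				const void *data, size_t offset, size_t len)
{
	if (vm->srcu_depth == 0) {
		fprintf(stderr, "toy: memslot access without srcu held\n");
		return -1;
	}
	if (gfn >= TOY_NR_PAGES || offset + len > TOY_PAGE_SIZE)
		return -1;
	memcpy(&vm->guest_mem[gfn][offset], data, len);
	vm->dirty_bitmap[gfn] = true;
	return 0;
}

/* Forget the current VMCS12, optionally flushing the cache back first.
 * flush == false models skipping the flush on a teardown/VMXOFF path. */
static int toy_release_vmcs12(struct toy_vm *vm, struct toy_nested *n,
			      bool flush)
{
	int ret = 0;

	if (n->current_vmptr == TOY_INVALID_GPA)
		return 0;			/* nothing loaded */
	if (flush)
		ret = toy_write_guest_page(vm, n->current_vmptr >> TOY_PAGE_SHIFT,
					   n->cached_vmcs12, 0, TOY_VMCS12_SIZE);
	n->current_vmptr = TOY_INVALID_GPA;	/* cleared even if the flush failed */
	return ret;
}

/* VMPTRLD: flush the old VMCS12, then copy the new one into the cache.
 * Guest memory is only read here, so no dirty bit is set. */
static int toy_vmptrld(struct toy_vm *vm, struct toy_nested *n, uint64_t gpa)
{
	uint64_t gfn = gpa >> TOY_PAGE_SHIFT;

	if ((gpa & (TOY_PAGE_SIZE - 1)) || gfn >= TOY_NR_PAGES)
		return -1;			/* must be a page-aligned, valid GPA */
	if (toy_release_vmcs12(vm, n, true) != 0)
		return -1;
	memcpy(n->cached_vmcs12, vm->guest_mem[gfn], TOY_VMCS12_SIZE);
	n->current_vmptr = gpa;
	return 0;
}
```

With srcu_depth set to 1, loading page 2, modifying the cache and releasing it copies the change back and sets dirty_bitmap[2], while the load alone leaves the page clean. With srcu_depth at 0, a flushing release returns -1 and leaves the page clean, whereas a release with flush == false succeeds; that mirrors the shape of the "skip the VMCS12 flush" fix described above, not its actual implementation.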