From: Wanpeng Li
Date: Fri, 11 Aug 2017 05:46:38 +0800
Subject: Re: [PATCH] KVM: MMU: Fix softlockup due to mmu_lock is held too long
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm, Radim Krčmář, Wanpeng Li

2017-08-10 22:36 GMT+08:00 Paolo Bonzini:
> On 10/08/2017 15:55, Wanpeng Li wrote:
>> From: Wanpeng Li
>>
>>  watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
>>  irq event stamp: 20532
>>  hardirqs last enabled at (20531): [] restore_regs_and_iret+0x0/0x1d
>>  hardirqs last disabled at (20532): [] apic_timer_interrupt+0x98/0xb0
>>  softirqs last enabled at (8266): [] __do_softirq+0x206/0x4c1
>>  softirqs last disabled at (8253): [] irq_exit+0xf8/0x100
>>  CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8
>>  RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
>>  Call Trace:
>>   make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
>>   kvm_mmu_load+0x1cf/0x410 [kvm]
>>   kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
>>   kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>   ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>   ? __fget+0xfc/0x210
>>   do_vfs_ioctl+0xa4/0x6a0
>>   ? __fget+0x11d/0x210
>>   SyS_ioctl+0x79/0x90
>>   entry_SYSCALL_64_fastpath+0x23/0xc2
>>   ? __this_cpu_preempt_check+0x13/0x20
>>
>> This can be reproduced readily with ept=N by running syzkaller tests,
>> since many syzkaller testcases don't set up any memory regions. With
>> ept=Y, however, the rmode identity map is created and
>> kvm_mmu_calculate_mmu_pages() then extends the number of the VM's mmu
>> pages to at least KVM_MIN_ALLOC_MMU_PAGES, which just hides the issue.
>>
>> I saw the scenario kvm->arch.n_max_mmu_pages == 0 &&
>> kvm->arch.n_used_mmu_pages == 1: there is one active mmu page on the
>> list and kvm_mmu_prepare_zap_page() fails to zap any pages, yet
>> prepare_zap_oldest_mmu_page() always returns true. This causes an
>> infinite loop in make_mmu_pages_available() with mmu_lock held, which
>> triggers the softlockup.
>>
>> This patch fixes it by making prepare_zap_oldest_mmu_page() return
>> whether or not an mmu page was actually zapped. In addition, we bail
>> out immediately if there is no mmu page available to allocate a root
>> page.
>
> Nice!
>
> But I think all callers of make_mmu_pages_available should be handled
> the same way. I'm committing the first hunk for now. In the meanwhile,
> can you look into returning -ENOSPC from make_mmu_pages_available if
> !kvm_mmu_available_pages after zapping the pages?

Good point.
:)

Regards,
Wanpeng Li

>
> Thanks,
>
> Paolo
>> Cc: Paolo Bonzini
>> Cc: Radim Krčmář
>> Signed-off-by: Wanpeng Li
>> ---
>>  arch/x86/kvm/mmu.c | 16 +++++++++++++---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9b1dd11..b9897e8 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2608,9 +2608,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
>>
>>  	sp = list_last_entry(&kvm->arch.active_mmu_pages,
>>  			     struct kvm_mmu_page, link);
>> -	kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
>> -
>> -	return true;
>> +	return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
>>  }
>>
>>  /*
>> @@ -3379,6 +3377,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>>  	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
>>  		spin_lock(&vcpu->kvm->mmu_lock);
>>  		make_mmu_pages_available(vcpu);
>> +		if (!kvm_mmu_available_pages(vcpu->kvm)) {
>> +			spin_unlock(&vcpu->kvm->mmu_lock);
>> +			return 1;
>> +		}
>>  		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
>>  		++sp->root_count;
>>  		spin_unlock(&vcpu->kvm->mmu_lock);
>> @@ -3390,6 +3392,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>>  			MMU_WARN_ON(VALID_PAGE(root));
>>  			spin_lock(&vcpu->kvm->mmu_lock);
>>  			make_mmu_pages_available(vcpu);
>> +			if (!kvm_mmu_available_pages(vcpu->kvm)) {
>> +				spin_unlock(&vcpu->kvm->mmu_lock);
>> +				return 1;
>> +			}
>>  			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
>>  					i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
>>  			root = __pa(sp->spt);
>> @@ -3427,6 +3433,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>
>>  		spin_lock(&vcpu->kvm->mmu_lock);
>>  		make_mmu_pages_available(vcpu);
>> +		if (!kvm_mmu_available_pages(vcpu->kvm)) {
>> +			spin_unlock(&vcpu->kvm->mmu_lock);
>> +			return 1;
>> +		}
>>  		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
>>  				      0, ACC_ALL);
>>  		root = __pa(sp->spt);