Date: Tue, 15 Aug 2017 17:21:52 +0200
From: Radim Krčmář
To: Wanpeng Li
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm, Wanpeng Li
Subject: Re: [PATCH] KVM: MMU: Fix softlockup due to mmu_lock is held too long
Message-ID: <20170815152152.GA6408@flask>
References: <1502373351-9304-1-git-send-email-wanpeng.li@hotmail.com>

2017-08-12 05:51+0800, Wanpeng Li:
> 2017-08-10 22:36 GMT+08:00 Paolo Bonzini:
> > On 10/08/2017 15:55, Wanpeng Li wrote:
> >> From: Wanpeng Li
> >>
> >> watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
> >> irq event stamp: 20532
> >> hardirqs last enabled at (20531): [] restore_regs_and_iret+0x0/0x1d
> >> hardirqs last disabled at (20532): [] apic_timer_interrupt+0x98/0xb0
> >> softirqs last enabled at (8266): [] __do_softirq+0x206/0x4c1
> >> softirqs last disabled at (8253): [] irq_exit+0xf8/0x100
> >> CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8
> >> RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
> >> Call Trace:
> >> make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
> >> kvm_mmu_load+0x1cf/0x410 [kvm]
> >> kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
> >> kvm_vcpu_ioctl+0x340/0x700 [kvm]
> >> ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
> >> ? __fget+0xfc/0x210
> >> do_vfs_ioctl+0xa4/0x6a0
> >> ? __fget+0x11d/0x210
> >> SyS_ioctl+0x79/0x90
> >> entry_SYSCALL_64_fastpath+0x23/0xc2
> >> ? __this_cpu_preempt_check+0x13/0x20
> >>
> >> This can be reproduced readily with ept=N by running syzkaller tests,
> >> since many syzkaller testcases don't set up any memory regions. However,
> >> with ept=Y the rmode identity map is created, and
> >> kvm_mmu_calculate_mmu_pages() then extends the number of the VM's mmu
> >> pages to at least KVM_MIN_ALLOC_MMU_PAGES, which just hides the issue.
> >>
> >> I saw the scenario kvm->arch.n_max_mmu_pages == 0 &&
> >> kvm->arch.n_used_mmu_pages == 1, so there is one active mmu page on the
> >> list. kvm_mmu_prepare_zap_page() fails to zap any pages, yet
> >> prepare_zap_oldest_mmu_page() always returns true. This causes an
> >> infinite loop in make_mmu_pages_available(), which results in a
> >> softlockup with mmu_lock held.
> >>
> >> This patch fixes it by setting the return value of
> >> prepare_zap_oldest_mmu_page() according to whether or not an mmu page
> >> was zapped. In addition, we bail out immediately if there is no
> >> available mmu page to allocate the root page.
> >
> > Nice!
> >
> > But I think all callers of make_mmu_pages_available should be handled
> > the same way. I'm committing the first hunk for now.
> > In the meanwhile,
>
> I see that the commit "KVM: MMU: Fix softlockup due to infinite loop" is
> missing from kvm/queue?

Ah, I found it on an old snapshot. Paolo changed the commit message, and
the same code change is now called "KVM: MMU: Fix softlockup due to
mmu_lock is held too long".
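For readers following the thread, the loop described in the quoted patch can be sketched as a small C model. This is a hypothetical, self-contained illustration, not KVM code: `struct mmu_model`, the `n_zappable` counter, `model_available_pages()`, `model_zap_oldest_page()` and `model_make_pages_available()` are invented for this sketch, and returning `-ENOSPC` when no progress is possible is an assumption of the model. It shows why prepare_zap_oldest_mmu_page() has to report whether it actually zapped a page: in the reported state (n_max_mmu_pages == 0, n_used_mmu_pages == 1, nothing zappable), a helper that always returns true never changes the loop condition, so the caller spins with the lock held.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a VM's shadow MMU page accounting.
 * The first two field names follow kvm->arch as quoted in the thread,
 * but this is not struct kvm_arch. */
struct mmu_model {
	unsigned int n_max_mmu_pages;  /* page budget for the VM */
	unsigned int n_used_mmu_pages; /* shadow pages currently allocated */
	unsigned int n_zappable;       /* hypothetical: pages that can really be zapped */
};

static unsigned int model_available_pages(const struct mmu_model *m)
{
	if (m->n_max_mmu_pages > m->n_used_mmu_pages)
		return m->n_max_mmu_pages - m->n_used_mmu_pages;
	return 0;
}

/* Fixed semantics: return true only if a page was actually freed.
 * Returning true unconditionally (the bug described in the patch) would
 * leave the caller's loop condition unchanged forever. */
static bool model_zap_oldest_page(struct mmu_model *m)
{
	if (m->n_zappable == 0 || m->n_used_mmu_pages == 0)
		return false;
	m->n_zappable--;
	m->n_used_mmu_pages--;
	return true;
}

/* Zap pages until at least min_free are available, and bail out with
 * -ENOSPC (an assumption of this model) as soon as no progress is made. */
static int model_make_pages_available(struct mmu_model *m, unsigned int min_free)
{
	while (model_available_pages(m) < min_free) {
		if (!model_zap_oldest_page(m))
			return -ENOSPC;
	}
	return 0;
}
```

With the stuck state from the report, `model_make_pages_available()` returns `-ENOSPC` immediately instead of looping, which mirrors the "bail out" half of the quoted fix.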