Date: Wed, 8 Apr 2015 08:47:34 +0200
From: Ingo Molnar
To: Chris J Arges
Cc: Linus Torvalds, Rafael David Tinoco, Peter Anvin, Jiang Liu,
    Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
    the arch/x86 maintainers
Subject: Re: [PATCH] smp/call: Detect stuck CSD locks
Message-ID: <20150408064734.GA26861@gmail.com>
In-Reply-To: <20150407205945.GA28212@canonical.com>
References: <20150402182607.GA8896@gmail.com>
 <551D8FAF.5070805@canonical.com>
 <20150402190725.GA10570@gmail.com>
 <551DB0E2.1020607@canonical.com>
 <20150403054320.GA9863@gmail.com>
 <5522BB49.5060704@canonical.com>
 <20150407092121.GA9971@gmail.com>
 <20150407205945.GA28212@canonical.com>

* Chris J Arges wrote:

> Ingo,
>
> Looks like sched_clock() works in this case.
>
> Adding the dump_stack() line caused various issues such as the VM
> oopsing on boot or the softlockup never being detected properly (and
> thus not crashing). So the below is running with your patch and
> 'dump_stack()' commented out.
>
> Here is the log leading up to the soft lockup (I adjusted
> CSD_LOCK_TIMEOUT to 5s):
>
> [ 22.669630] kvm [1523]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> [ 38.712710] csd: Detected non-responsive CSD lock (#1) on CPU#00, waiting 5.000 secs for CPU#01
> [ 38.712715] csd: Re-sending CSD lock (#1) IPI from CPU#00 to CPU#01
> [ 43.712709] csd: Detected non-responsive CSD lock (#2) on CPU#00, waiting 5.000 secs for CPU#01
> [ 43.712713] csd: Re-sending CSD lock (#2) IPI from CPU#00 to CPU#01
> [ 48.712708] csd: Detected non-responsive CSD lock (#3) on CPU#00, waiting 5.000 secs for CPU#01
> [ 48.712732] csd: Re-sending CSD lock (#3) IPI from CPU#00 to CPU#01
> [ 53.712708] csd: Detected non-responsive CSD lock (#4) on CPU#00, waiting 5.000 secs for CPU#01
> [ 53.712712] csd: Re-sending CSD lock (#4) IPI from CPU#00 to CPU#01
> [ 58.712707] csd: Detected non-responsive CSD lock (#5) on CPU#00, waiting 5.000 secs for CPU#01
> [ 58.712712] csd: Re-sending CSD lock (#5) IPI from CPU#00 to CPU#01
> [ 60.080005] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ksmd:26]
>
> Still we never seem to release the lock, even when resending the IPI.
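(For reference: the "Detected non-responsive CSD lock" / "Re-sending CSD
lock" lines come from a watchdog in the CSD-lock wait path. A minimal
sketch of what such a watchdog might look like in kernel/smp.c is below;
it is a reconstruction for illustration, not the patch Chris is actually
running - CSD_LOCK_TIMEOUT_NS, the function name and the use of
arch_send_call_function_single_ipi() for the re-send are assumptions.)

/*
 * Sketch of a CSD-lock watchdog: spin on the lock flag, and every
 * CSD_LOCK_TIMEOUT_NS nanoseconds print a warning and re-send the
 * function-call IPI to the non-responsive CPU.  CSD_FLAG_LOCK is
 * the flag defined in kernel/smp.c.
 */
#include <linux/smp.h>
#include <linux/sched.h>        /* sched_clock() */
#include <linux/ktime.h>        /* NSEC_PER_SEC, NSEC_PER_MSEC */
#include <linux/printk.h>

#define CSD_LOCK_TIMEOUT_NS     (5ULL * NSEC_PER_SEC)   /* 5s, as in the log above */

static void csd_lock_wait_debug(struct call_single_data *csd, int dst_cpu)
{
        u64 deadline = sched_clock() + CSD_LOCK_TIMEOUT_NS;
        unsigned int timeouts = 0;

        /* Wait until the target CPU runs the callback and clears CSD_FLAG_LOCK: */
        while (READ_ONCE(csd->flags) & CSD_FLAG_LOCK) {
                if (sched_clock() < deadline) {
                        cpu_relax();
                        continue;
                }

                /* Timed out: report it, then poke the target CPU again: */
                timeouts++;
                pr_warn("csd: Detected non-responsive CSD lock (#%u) on CPU#%02d, waiting %llu.%03llu secs for CPU#%02d\n",
                        timeouts, smp_processor_id(),
                        CSD_LOCK_TIMEOUT_NS / NSEC_PER_SEC,
                        (CSD_LOCK_TIMEOUT_NS % NSEC_PER_SEC) / NSEC_PER_MSEC,
                        dst_cpu);
                pr_warn("csd: Re-sending CSD lock (#%u) IPI from CPU#%02d to CPU#%02d\n",
                        timeouts, smp_processor_id(), dst_cpu);

                arch_send_call_function_single_ipi(dst_cpu);

                deadline = sched_clock() + CSD_LOCK_TIMEOUT_NS;
        }
}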
So it would be really important to see the stack dump of CPU#0 at this
point, and also to do an APIC state dump of it. Because from previous
dumps it appeared that the 'stuck' CPU was just in its idle loop -
which is 'impossible', as the idle loop should still allow APIC irqs
to arrive.

This behavior can only happen if:

 - CPU#0 has irqs disabled perpetually. A dump of CPU#0 should tell us
   where it's executing. There's actually a fair chance of this being
   the case, as it has happened in a fair number of bugs in the past;
   I thought the dumps you guys provided earlier excluded this
   possibility, but it merits re-examination with the debug patches
   applied.

 - the APIC on CPU#0 is unacked and has queued up so many IPIs that it
   starts rejecting them. I'm not sure that's even possible on KVM
   though, unless part of the APIC is virtualized in hardware. One
   other thing that argues against this scenario is that NMIs appear
   to be getting through to CPU#0: the crash dumps and the
   dump-on-all-CPUs NMI callbacks worked fine.

 - the APIC on CPU#0 is in some weird state well outside of its Linux
   programming model (TPR set wrong, etc. etc.). There's literally a
   myriad of ways an APIC can be configured to not receive IPIs: but
   I've never actually seen this happen under Linux, as it needs
   complicated writes to specialized APIC registers, and we don't
   reconfigure the APIC in any serious fashion aside from bootup. Low
   likelihood, but not impossible. Again, NMIs getting through make
   this situation less likely.

 - CPU#0 having a bad IDT and essentially ignoring certain IPIs. This
   presumes some serious but very targeted memory corruption. Lowest
   likelihood.

 - ... other failure modes that elude me.

None of the scenarios above strikes me as particularly plausible - but
something must be causing the lockup, so ...

In any case, something got seriously messed up on CPU#0, and it stays
messed up during the lockup, and it would help a lot to figure out
exactly what, by further examining its state.

Note, it might also be useful to dump KVM's state of CPU#0's APIC, to
see why _it_ isn't sending (and injecting) the lapic IRQ into CPU#0.
By all means it should. [Maybe take a look at CPU#1 as well, to make
sure the IPI was actually generated.]

It should be much easier to figure this out on the KVM side than on
native hardware, because KVM emulates the lapic to a large degree, so
we can look at the 'hardware state' directly. If we are lucky, the KVM
problem mirrors the native hardware problem.

Btw., it might also be helpful to turn off hardware-assisted APIC
virtualization on the KVM side, to make the APIC purely software
emulated. If that makes the bug go away magically, it raises the
likelihood that the bug is really hardware-APIC related. I don't know
what the magic incantation is to make a 'pure software APIC' happen on
KVM and Qemu though.

Thanks,

        Ingo
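As a concrete starting point for the APIC state dump suggested above,
here is a minimal sketch (illustrative only, assuming x86 and a kernel
around v4.0; dump_local_apic_state() is a made-up name, not code from
any posted patch) that reads the local APIC's priority registers and
the full ISR/IRR bitmaps. It would have to run on CPU#0 itself, e.g.
from the NMI backtrace path that is already known to get through:

/*
 * Dump the local APIC state of the CPU this runs on: ID/version,
 * task/processor priority, error status, and the 256-bit
 * in-service (ISR) and interrupt-request (IRR) bitmaps, which are
 * spread across 8 32-bit registers each.  Register offsets are the
 * standard ones from <asm/apicdef.h>.
 */
#include <linux/printk.h>
#include <asm/apic.h>

static void dump_local_apic_state(void)
{
        int i;

        pr_warn("APIC: ID: %08x LVR: %08x TPR: %08x PPR: %08x ESR: %08x\n",
                apic_read(APIC_ID), apic_read(APIC_LVR),
                apic_read(APIC_TASKPRI), apic_read(APIC_PROCPRI),
                apic_read(APIC_ESR));

        for (i = 0; i < APIC_ISR_NR; i++)
                pr_warn("APIC: ISR[%d]: %08x IRR[%d]: %08x\n",
                        i, apic_read(APIC_ISR + i * 0x10),
                        i, apic_read(APIC_IRR + i * 0x10));
}

On the host side, KVM's view of the guest lapic can be inspected from
the Qemu monitor with 'info lapic', which prints the emulated lapic's
registers (including ISR/IRR); that should show whether the
function-call vector is pending but never injected. As for turning off
hardware-assisted APIC virtualization: on Intel hosts the kvm_intel
module parameter enable_apicv=0 is probably the relevant knob, though
that is an assumption worth double-checking.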