Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753223AbbDAMjV (ORCPT ); Wed, 1 Apr 2015 08:39:21 -0400
Received: from mail-wg0-f48.google.com ([74.125.82.48]:35819 "EHLO
	mail-wg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752930AbbDAMjS (ORCPT );
	Wed, 1 Apr 2015 08:39:18 -0400
Date: Wed, 1 Apr 2015 14:39:13 +0200
From: Ingo Molnar
To: Chris J Arges
Cc: Linus Torvalds, Rafael David Tinoco, Peter Anvin, Jiang Liu,
	Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
	the arch/x86 maintainers
Subject: Re: [debug PATCHes] Re: smp_call_function_single lockups
Message-ID: <20150401123913.GA12841@gmail.com>
References: <20150331031536.GA9303@canonical.com>
	<20150331105656.GA25180@gmail.com>
	<20150331223800.GB12512@canonical.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150331223800.GB12512@canonical.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4197
Lines: 76

* Chris J Arges wrote:

> This was only tested only on the L1, so I can put this on the L0 host and run
> this as well.
> The results:
>
> [ 124.897002] apic: vector c1, new-domain move in progress
> [ 124.954827] apic: vector d1, sent cleanup vector, move completed
> [ 163.477270] apic: vector d1, new-domain move in progress
> [ 164.041938] apic: vector e1, sent cleanup vector, move completed
> [ 213.466971] apic: vector e1, new-domain move in progress
> [ 213.775639] apic: vector 22, sent cleanup vector, move completed
> [ 365.996747] apic: vector 22, new-domain move in progress
> [ 366.011136] apic: vector 42, sent cleanup vector, move completed
> [ 393.836032] apic: vector 42, new-domain move in progress
> [ 393.837727] apic: vector 52, sent cleanup vector, move completed
> [ 454.977514] apic: vector 52, new-domain move in progress
> [ 454.978880] apic: vector 62, sent cleanup vector, move completed
> [ 467.055730] apic: vector 62, new-domain move in progress
> [ 467.058129] apic: vector 72, sent cleanup vector, move completed
> [ 545.280125] apic: vector 72, new-domain move in progress
> [ 545.282801] apic: vector 82, sent cleanup vector, move completed
> [ 567.631652] apic: vector 82, new-domain move in progress
> [ 567.632207] apic: vector 92, sent cleanup vector, move completed
> [ 628.940638] apic: vector 92, new-domain move in progress
> [ 628.965274] apic: vector a2, sent cleanup vector, move completed
> [ 635.187433] apic: vector a2, new-domain move in progress
> [ 635.191643] apic: vector b2, sent cleanup vector, move completed
> [ 673.548020] apic: vector b2, new-domain move in progress
> [ 673.553843] apic: vector c2, sent cleanup vector, move completed
> [ 688.221906] apic: vector c2, new-domain move in progress
> [ 688.229487] apic: vector d2, sent cleanup vector, move completed
> [ 723.818916] apic: vector d2, new-domain move in progress
> [ 723.828970] apic: vector e2, sent cleanup vector, move completed
> [ 733.485435] apic: vector e2, new-domain move in progress
> [ 733.615007] apic: vector 23, sent cleanup vector, move completed
> [ 824.092036] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ksmd:26]

Are these all the messages? Looks like Linus's warnings went away, or
did you filter them out?

But ... the affinity setting message does not appear to trigger, and
that's the only real race I can see in the code. Also, the frequency
of these messages appears to be low, while the race window is narrow.
So I'm not sure the problem is related to the irq-move mechanism.

One thing that appears to be weird: why is there irq-movement activity
to begin with? Is something changing irq-affinities?

Could you put a dump_stack() into the call? Something like the patch
below, in addition to all patches so far. (if it conflicts with the
previous debugging patches then just add the code manually to after
the debug printout.)

Thanks,

	Ingo

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..79d6de6fdf0a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -144,6 +144,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 			cfg->move_in_progress =
 			   cpumask_intersects(cfg->old_domain, cpu_online_mask);
 			cpumask_and(cfg->domain, cfg->domain, tmp_mask);
+			if (cfg->move_in_progress)
+				dump_stack();
 			break;
 		}
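
For anyone wanting to put numbers on how often the moves happen and how long each one stays pending, here is a minimal sketch that pairs up the "move in progress" and "move completed" lines from the quoted debug output. It is not part of the patches in this thread; the names move_windows, LINE_RE and sample are hypothetical, and the line format is assumed to match the output quoted above. The gap it reports is only the time between the two printouts, which is not necessarily the same thing as the race window in the code.

```python
import re

# Matches the debug lines quoted in this thread, e.g.
# "[ 124.897002] apic: vector c1, new-domain move in progress"
# (format assumed from the quoted output; a leading "> " is tolerated).
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+apic: vector (?P<vec>[0-9a-f]+), (?P<msg>.+)"
)

def move_windows(lines):
    """Yield (old_vector, new_vector, seconds) for each completed move."""
    start = None  # (timestamp, vector) of the pending "move in progress" line
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip unrelated lines such as the watchdog report
        ts, vec, msg = float(m["ts"]), m["vec"], m["msg"].strip()
        if msg == "new-domain move in progress":
            start = (ts, vec)
        elif msg == "sent cleanup vector, move completed" and start:
            yield start[1], vec, ts - start[0]
            start = None

# First two move pairs, copied from the quoted log above.
sample = [
    "> [ 124.897002] apic: vector c1, new-domain move in progress",
    "> [ 124.954827] apic: vector d1, sent cleanup vector, move completed",
    "> [ 163.477270] apic: vector d1, new-domain move in progress",
    "> [ 164.041938] apic: vector e1, sent cleanup vector, move completed",
]

if __name__ == "__main__":
    for old, new, secs in move_windows(sample):
        print(f"{old} -> {new}: {secs * 1000:.1f} ms")
```

Feeding it the full dmesg output instead of the two sample pairs gives one line per move, which makes it easy to see both the spacing between moves and how long each move stayed in progress.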