Date: Wed, 1 Apr 2015 14:43:36 +0200
From: Ingo Molnar
To: Chris J Arges
Cc: Linus Torvalds, Rafael David Tinoco, Peter Anvin, Jiang Liu,
    Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
    the arch/x86 maintainers
Subject: Re: smp_call_function_single lockups
Message-ID: <20150401124336.GB12841@gmail.com>
References: <20150331031536.GA9303@canonical.com> <20150331222327.GA12512@canonical.com>
In-Reply-To: <20150331222327.GA12512@canonical.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Chris J Arges wrote:

> Linus,
>
> I had a few runs with your patch plus modifications, and got the following
> results (modified patch inlined below):
>
> [   14.423916] ack_APIC_irq: vector = d1, irq = ffffffff
> [  176.060005] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:1630]
>
> [   17.995298] ack_APIC_irq: vector = d1, irq = ffffffff
> [  182.993828] ack_APIC_irq: vector = e1, irq = ffffffff
> [  202.919691] ack_APIC_irq: vector = 22, irq = ffffffff
> [  484.132006] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:1586]
>
> [   15.592032] ack_APIC_irq: vector = d1, irq = ffffffff
> [  304.993490] ack_APIC_irq: vector = e1, irq = ffffffff
> [  315.174755] ack_APIC_irq: vector = 22, irq = ffffffff
> [  360.108007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ksmd:26]
>
> [   15.026077] ack_APIC_irq: vector = b1, irq = ffffffff
> [  374.828531] ack_APIC_irq: vector = c1, irq = ffffffff
> [  402.965942] ack_APIC_irq: vector = d1, irq = ffffffff
> [  434.540814] ack_APIC_irq: vector = e1, irq = ffffffff
> [  461.820768] ack_APIC_irq: vector = 22, irq = ffffffff
> [  536.120027] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:4243]
>
> [   17.889334] ack_APIC_irq: vector = d1, irq = ffffffff
> [  291.888784] ack_APIC_irq: vector = e1, irq = ffffffff
> [  297.824627] ack_APIC_irq: vector = 22, irq = ffffffff
> [  336.960594] ack_APIC_irq: vector = 42, irq = ffffffff
> [  367.012706] ack_APIC_irq: vector = 52, irq = ffffffff
> [  377.025090] ack_APIC_irq: vector = 62, irq = ffffffff
> [  417.088773] ack_APIC_irq: vector = 72, irq = ffffffff
> [  447.136788] ack_APIC_irq: vector = 82, irq = ffffffff
> -- stopped it since it wasn't reproducing / I was impatient --
>
> So I'm seeing irq == VECTOR_UNDEFINED in all of these cases. Making
> (vector >= 0) didn't seem to expose any additional vectors.

So, these vectors do seem to be lining up with the pattern of how new
irq vectors get assigned and how we slowly rotate through all
available ones.

The VECTOR_UNDEFINED might correspond to the fact that we already
'freed' that vector, as part of the irq-move mechanism - but it was
likely in use shortly before.

So the irq-move code is not off the hook, to the contrary.

Have you already tested whether the hang goes away if you remove
irq-affinity fiddling daemons from the system? Do you have irqbalance
installed or similar mechanisms?

Thanks,

	Ingo
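
For anyone wanting to eyeball the rotation pattern in the quoted output, here is a minimal sketch that parses the ack_APIC_irq lines from the fifth run and prints the step between consecutive vectors. The helper names (parse_vectors, vector_steps) are made up for illustration, and the reading of the result (a step of 0x10 within a pass, with the low nibble changing after a wrap) is only an interpretation of the numbers in the log, not a statement about the kernel's allocator.

```python
import re

# Log lines copied from the fifth run quoted in the mail above.
LOG = """\
[   17.889334] ack_APIC_irq: vector = d1, irq = ffffffff
[  291.888784] ack_APIC_irq: vector = e1, irq = ffffffff
[  297.824627] ack_APIC_irq: vector = 22, irq = ffffffff
[  336.960594] ack_APIC_irq: vector = 42, irq = ffffffff
[  367.012706] ack_APIC_irq: vector = 52, irq = ffffffff
[  377.025090] ack_APIC_irq: vector = 62, irq = ffffffff
[  417.088773] ack_APIC_irq: vector = 72, irq = ffffffff
[  447.136788] ack_APIC_irq: vector = 82, irq = ffffffff
"""

# Matches "[ <timestamp>] ack_APIC_irq: vector = <hex>, irq = <hex>".
LINE_RE = re.compile(
    r"\[\s*([\d.]+)\]\s+ack_APIC_irq: vector = ([0-9a-f]+), irq = [0-9a-f]+"
)


def parse_vectors(log):
    """Return a list of (timestamp, vector) pairs, vector parsed as hex."""
    return [(float(m.group(1)), int(m.group(2), 16)) for m in LINE_RE.finditer(log)]


def vector_steps(vectors):
    """Return the difference between each vector and the one before it."""
    return [b - a for a, b in zip(vectors, vectors[1:])]


if __name__ == "__main__":
    entries = parse_vectors(LOG)
    vectors = [vec for _, vec in entries]
    steps = [None] + vector_steps(vectors)
    for (ts, vec), step in zip(entries, steps):
        step_text = "" if step is None else f"  step {step:+#x}"
        # The low nibble is printed separately so a change after a wrap is visible.
        print(f"{ts:10.6f}  vector {vec:#04x}  low nibble {vec & 0xf:x}{step_text}")
```

Running the same parser over the other runs only requires replacing LOG with the relevant lines; the NMI watchdog lines are ignored because they do not match the pattern.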