Date: Wed, 1 Apr 2015 09:32:36 -0500
From: Chris J Arges
To: Linus Torvalds
Cc: Rafael David Tinoco, Ingo Molnar, Peter Anvin, Jiang Liu, Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez, the arch/x86 maintainers
Subject: Re: smp_call_function_single lockups
Message-ID: <20150401143236.GB12730@canonical.com>
References: <20150331031536.GA9303@canonical.com> <20150331222327.GA12512@canonical.com>

On Tue, Mar 31, 2015 at 04:07:32PM -0700, Linus Torvalds wrote:
> On Tue, Mar 31, 2015 at 3:23 PM, Chris J Arges wrote:
> >
> > I had a few runs with your patch plus modifications, and got the following
> > results (modified patch inlined below):
>
> Ok, thanks.
>
> > [   14.423916] ack_APIC_irq: vector = d1, irq = ffffffff
> > [  176.060005] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:1630]
> >
> > [   17.995298] ack_APIC_irq: vector = d1, irq = ffffffff
> > [  182.993828] ack_APIC_irq: vector = e1, irq = ffffffff
> > [  202.919691] ack_APIC_irq: vector = 22, irq = ffffffff
> > [  484.132006] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:1586]
> >
> > [   15.592032] ack_APIC_irq: vector = d1, irq = ffffffff
> > [  304.993490] ack_APIC_irq: vector = e1, irq = ffffffff
> > [  315.174755] ack_APIC_irq: vector = 22, irq = ffffffff
> > [  360.108007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ksmd:26]
> .. snip snip ..
>
> So yeah, that's VECTOR_UNDEFINED, and while it could happen as part of
> irq setup, I'm not seeing that being something that your load should
> trigger.
>
> It could also obviously just be the vector being somehow corrupted,
> either due to crazy hardware or software.
>
> But quite frankly, the most likely reason is that whole irq vector movement.
>
> Especially since it sounds from your other email that when you apply
> Ingo's patches, the ack_APIC_irq warnings go away. Is that correct? Or
> did you just grep for "move" in the messages?
>
> If you do get both movement messages (from Ingo's patch) _and_ the
> ack_APIC_irq warnings (from mine), it would be interesting to see if
> the vectors line up somehow..
>
>                      Linus
>

Linus,

I included the full patch in reply to Ingo's email, and when running with that
I no longer get the ack_APIC_irq WARNs.

My next homework assignments are:
- Testing with irqbalance disabled
- Testing w/ the appropriate dump_stack() in Ingo's patch
- L0 testing

Thanks,
--chris