Date: Tue, 19 Jun 2007 11:00:03 -0700
From: "Siddha, Suresh B"
To: "Eric W. Biederman"
Cc: "Darrick J. Wong", "Siddha, Suresh B", linux-kernel@vger.kernel.org
Subject: Re: Device hang when offlining a CPU due to IRQ misrouting
Message-ID: <20070619180003.GE7160@linux-os.sc.intel.com>

On Tue, Jun 19, 2007 at 11:54:45AM -0600, Eric W. Biederman wrote:
> "Darrick J. Wong" writes:
>
> > On Mon, Jun 18, 2007 at 04:54:34PM -0700, Siddha, Suresh B wrote:
> >
> >> >
> >> > [  256.298787] irq=4341 affinity=d
> >> >
> >>
> >> And just to make sure, at this point, your MSI irq 4341 affinity
> >> (/proc/irq/4341/smp_affinity) still points to '2'?
> >
> > Actually, it's 0xD.  From the kernel's perspective the mask has been
> > updated (and I even stuck a printk into set_msi_irq_affinity to verify
> > that the writes are happening), but the hardware doesn't seem to
> > reflect this.  I also tried putting read_msi_msg right afterwards to
> > compare contents, though it complained about all the MSIs _except_ for
> > 4341.  (Of course, I could just be way off on the effectiveness of
> > that.)
>
> The fact that MSI interrupts are having problems is odd.  It is
> possible that we still have a bug in there somewhere, but MSI
> interrupts should be safe to migrate outside of irq context (no known
> hardware bugs), as we can actually synchronize with the irq source and
> eliminate all of the migration races.
>
> The non-MSI case requires hitting a hardware race that is rare enough
> that you should not normally have problems.

Yep.  But Darrick's report seems to say the problem happens
consistently.

Anyhow, Darrick, there is a general bug in this area; can you try the
patch below and see if it helps?
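For concreteness, the read-back check Darrick mentions could look
roughly like the sketch below, against the 2.6.22-era read_msi_msg()
interface; the helper name msi_affinity_debug() and the printk format
are illustrative only, not what he actually ran:

/*
 * Hypothetical debugging helper -- NOT part of the kernel or of
 * Darrick's test: after an affinity write, re-read the MSI message
 * the device was programmed with, so it can be compared against
 * /proc/irq/<irq>/smp_affinity.
 */
#include <linux/irq.h>
#include <linux/msi.h>

static void msi_affinity_debug(unsigned int irq)
{
	struct msi_msg msg;

	read_msi_msg(irq, &msg);
	/*
	 * On x86, bits 12-19 of the MSI address carry the APIC
	 * destination ID; this is the field that should change when
	 * smp_affinity is rewritten.
	 */
	printk(KERN_DEBUG "irq %u: msi address_lo=0x%08x data=0x%08x\n",
	       irq, msg.address_lo, msg.data);
}

Calling something like this from the tail of set_msi_irq_affinity()
would show whether the new destination actually reached the device.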
diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 3eaceac..a0e11c9 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -144,17 +144,35 @@ void fixup_irqs(cpumask_t map)
 
 	for (irq = 0; irq < NR_IRQS; irq++) {
 		cpumask_t mask;
+		int break_affinity = 0;
+		int set_affinity = 1;
+
 		if (irq == 2)
 			continue;
 
+		/* irq's are disabled at this point */
+		spin_lock(&irq_desc[irq].lock);
+
 		cpus_and(mask, irq_desc[irq].affinity, map);
 		if (any_online_cpu(mask) == NR_CPUS) {
-			printk("Breaking affinity for irq %i\n", irq);
+			break_affinity = 1;
 			mask = map;
 		}
+
+		irq_desc[irq].chip->mask(irq);
+
 		if (irq_desc[irq].chip->set_affinity)
 			irq_desc[irq].chip->set_affinity(irq, mask);
 		else if (irq_desc[irq].action && !(warned++))
+			set_affinity = 0;
+
+		irq_desc[irq].chip->unmask(irq);
+
+		spin_unlock(&irq_desc[irq].lock);
+
+		if (break_affinity && set_affinity)
+			printk("Broke affinity for irq %i\n", irq);
+		else if (!set_affinity)
 			printk("Cannot set affinity for irq %i\n", irq);
 	}
 
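For context, fixup_irqs() runs from the CPU-offline path, with the
dying CPU already removed from the online map and interrupts disabled
on it -- roughly as in this trimmed, paraphrased sketch of the
2.6.22-era x86_64 __cpu_disable() (the real function also tears down
the NMI watchdog, the local APIC and the sibling maps):

/*
 * Trimmed, paraphrased sketch of the caller
 * (arch/x86_64/kernel/smpboot.c, 2.6.22 era) -- not the full function.
 */
int __cpu_disable(void)
{
	int cpu = smp_processor_id();

	if (cpu == 0)
		return -EBUSY;		/* the boot CPU must stay online */

	local_irq_disable();		/* fixup_irqs() relies on this */
	cpu_clear(cpu, cpu_online_map);	/* we are no longer a valid target */
	fixup_irqs(cpu_online_map);	/* retarget every irq away from us */
	return 0;
}

This is why the patch can take irq_desc[irq].lock with a plain
spin_lock(): interrupts are already off on the CPU running
fixup_irqs(), so the lock serializes against set_affinity callers on
other CPUs, while masking the chip keeps the device from raising the
interrupt in the middle of the rerouting.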