Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756228AbZFDQjM (ORCPT ); Thu, 4 Jun 2009 12:39:12 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753298AbZFDQi6 (ORCPT ); Thu, 4 Jun 2009 12:38:58 -0400 Received: from e34.co.us.ibm.com ([32.97.110.152]:53947 "EHLO e34.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753151AbZFDQi5 (ORCPT ); Thu, 4 Jun 2009 12:38:57 -0400 Date: Thu, 4 Jun 2009 09:38:40 -0700 From: Gary Hade To: "Eric W. Biederman" Cc: Gary Hade , mingo@elte.hu, mingo@redhat.com, linux-kernel@vger.kernel.org, tglx@linutronix.de, hpa@zytor.com, x86@kernel.org, yinghai@kernel.org, lcm@us.ibm.com Subject: Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "inactive" device IRQ interrruption Message-ID: <20090604163840.GB7233@us.ibm.com> References: <20090602193202.GB7282@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3586 Lines: 82 On Wed, Jun 03, 2009 at 04:55:26AM -0700, Eric W. Biederman wrote: > Gary Hade writes: > > > Impact: Eliminates a race that can leave the system in an > > unusable state > > > > During rapid offlining of multiple CPUs there is a chance > > that an IRQ affinity move destination CPU will be offlined > > before the IRQ affinity move initiated during the offlining > > of a previous CPU completes. This can happen when the device > > is not very active and thus fails to generate the IRQ that is > > needed to complete the IRQ affinity move before the move > > destination CPU is offlined. When this happens there is an > > -EBUSY return from __assign_irq_vector() during the offlining > > of the IRQ move destination CPU which prevents initiation of > > a new IRQ affinity move operation to an online CPU. This > > leaves the IRQ affinity set to an offlined CPU. > > > > I have been able to reproduce the problem on some of our > > systems using the following script. When the system is idle > > the problem often reproduces during the first CPU offlining > > sequence. > > Nacked-by: "Eric W. Biederman" > > fixup_irqs() is broken for allowing such a thing. When fixup_irqs() calls the set_affinity function: ... if (desc->chip->set_affinity) desc->chip->set_affinity(irq, affinity); ... it receives no feedback so it obviously expects the set_affinity function or it's called functions to do the right thing by preventing or correctly handling any problems that should arise. In the case of this bug there is obviously a problem happening during the set_affinity function call that needs to be resolved and/or properly handled. When you made your "x86_64 irq: Safely cleanup an irq after moving it." changes (re: http://lkml.org/lkml/2007/2/23/92) you added the check to __assign_irq_vector() that causes it to return -EBUSY if the migration of the IRQ is still in progress: + if ((cfg->move_in_progress) || cfg->move_cleanup_count) + return -EBUSY; + However, you did not add any code to other functions on the call stack to properly deal with this error. When doing this you may have assumed (as I may have also assumed) that the underlying code was solid enough that the handling was not needed. Unfortunately, you apparently did not anticipate the case where an idle or relatively idle device may not generate the IRQ needed to complete the move before the CPU that is still handling that IRQ is offlined. My fix only addresses the issue that caused the -EBUSY return and subsequent mess. It does not address the omitted handling for this error condition. If we were to add the handling to fixup_irq() and the arch and non-arch specific functions above it on the call stack as you may be suggesting, it would be quite involved because of all the things that would need to be undone. I am not certain that my fix plugs the very last hole that could cause the -EBUSY return from __assign_irq_vector() so maybe we should at least add a warning or BUG_ON to make the unhandled error more obvious in the future. I would be happy to provide this via a separate patch. Thanks, Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/