Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966695AbWKTUxf (ORCPT ); Mon, 20 Nov 2006 15:53:35 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S966698AbWKTUxf (ORCPT ); Mon, 20 Nov 2006 15:53:35 -0500 Received: from e5.ny.us.ibm.com ([32.97.182.145]:38843 "EHLO e5.ny.us.ibm.com") by vger.kernel.org with ESMTP id S966695AbWKTUxe (ORCPT ); Mon, 20 Nov 2006 15:53:34 -0500 Date: Mon, 20 Nov 2006 15:52:56 -0500 From: Vivek Goyal To: Pavel Emelianov Cc: Pavel Emelianov , Andrew Morton , mingo@redhat.com, Kirill Korotaev , linux kernel mailing list Subject: Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks Message-ID: <20061120205256.GB6141@in.ibm.com> Reply-To: vgoyal@in.ibm.com References: <455484E4.1020100@openvz.org> <20061120192335.GA11879@in.ibm.com> <20061120195652.GA6141@in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20061120195652.GA6141@in.ibm.com> User-Agent: Mutt/1.5.11 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2550 Lines: 69 On Mon, Nov 20, 2006 at 02:56:52PM -0500, Vivek Goyal wrote: > On Mon, Nov 20, 2006 at 02:23:35PM -0500, Vivek Goyal wrote: > > On Fri, Nov 10, 2006 at 04:55:48PM +0300, Pavel Emelianov wrote: > > > As the second lock on booth CPUs is taken before checking that > > > this irq is being handled in another processor this may cause > > > a deadlock. This issue is only theoretical. > > > > > > I propose the attached patch to fix booth problems: when trying > > > to handle misrouted IRQ active desc->lock may be unlocked. > > > > > > Please comment. > > > > > --- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300 > > > +++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300 > > > @@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st > > > if (unlikely(irqfixup)) { > > > /* Don't punish working computers */ > > > if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) { > > > - int ok = misrouted_irq(irq); > > > + int ok; > > > + > > > + spin_unlock(&desc->lock); > > > + ok = misrouted_irq(irq); > > > + spin_lock(&desc->lock); > > > > Hi Pavel, > > > > Till -rc5, I was able to boot a kernel with irqpoll option and in -rc6 I > > can't. It simply hangs. I suspect it is realted to this change. I have yet > > to confirm that. But before that one observation. > > > > Hi Pavel, > > If I backout your changes, everything works fine. So it looks like that > the problem I am facing is because of your patch but I don't have a logical > explanation yet that why the problem is there. Just realasing a lock > which is not currently acquired should not hang the system? > Some more data regarding this issue. For my system it gets locked in following sequence. handle_level_irq() { spin_unlock(desc->lock) ...... note_interrupt() { /* Called without desc->lock held */ spin_unlock(desc->lock) misrouted_irq() spin_lock(desc->lock) } spin_lock(desc->lock) /* Lockup */ } So basically problems seems to be due to calling conventions of note_interrupt(). In few cases we call it with desc->lock held and in other cases we call it with desc->lock not held. I guess, note_interrupt() should restore the desc->lock status back to the state of the lock when function was entered so that caller does not lockup. Thanks Vivek - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/