Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933968AbXJRPbp (ORCPT ); Thu, 18 Oct 2007 11:31:45 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1760672AbXJRPb3 (ORCPT ); Thu, 18 Oct 2007 11:31:29 -0400 Received: from cerber.ds.pg.gda.pl ([153.19.208.18]:44709 "EHLO cerber.ds.pg.gda.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757010AbXJRPb1 (ORCPT ); Thu, 18 Oct 2007 11:31:27 -0400 Date: Thu, 18 Oct 2007 16:31:01 +0100 (BST) From: "Maciej W. Rozycki" To: Jarek Poplawski cc: Andy Fleming , Andrew Morton , Jeff Garzik , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes In-Reply-To: <20071018143752.GA3630@ff.dom.local> Message-ID: References: <20071015125301.GC3015@ff.dom.local> <20071016062108.GB1000@ff.dom.local> <20071017085809.GA1658@ff.dom.local> <20071018143752.GA3630@ff.dom.local> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3952 Lines: 88 On Thu, 18 Oct 2007, Jarek Poplawski wrote: > Technically until free_irq returns a handler should respond to calls > and with proper hardware it should have no problem with checking if > it's its interrupt, even after disabling this hardware, because of > possible races. Well, the hardirq handler can check it, no problem -- it is just it is so slow, the latency would be unacceptable. The problem with the softirq handler is we do really not want it to be called after the driver has already been shut down and its structures freed. It used to happen before this flush/cancel call was added with the usual effect (oops) as by then they may well have been stamped on already. > But with a scenario like this: > > - disable_irq() > - disable_my_hadrware_irq() > - clear_my_hardware_irq() > - free_irq() > - enable_irq() > > it seems the handler should respond even after free_irq because there > could be still interrupts to resend, originally generated by its > hardware, so such behavior looks very suspicious, at least with some > type of interrupts. These are softirqs, not hardware interrupts, so they are as such not related to *_irq() infrastructure. The flaw is the depth count of IRQ lines is not maintained consistently. This is, according to comments around the code in question, to cover up bugs elsewhere. Not a brillant idea, I am afraid -- there should be no need to reset the depth upon request_irq() and likewise with free_irq(), but there you go. I would be happy to investigate a possible solution and rewrite the necessary bits, but right now I am committed to other stuff, overdue already, sorry. The view could change if we supported hot-pluggable interrupt controllers, but it is not the case at the moment right now, so the interrupt lines are there to stay for the duration of the system lifespan and could be manipulated as necessary. > So, I think, the idea of DEBUG_SHIRQ is generally right and very > useful - but, of course, there could be exceptions, which btw. could > try some hacks under DEBUG_SHIRQ too. And my opinion about > 'properness' was very general (not about phy) too, just like my > 'knowledge' of drivers. The idea is right, no question, but I am not quite sure it has been properly architected into our current design. Actually I am almost sure of the reverse. This is why I was (and still am) interested in feedback on it. > Right! But then some warning could be useful, I presume. Of course; adding one to phy_error() should be straightforward. > > > 4) phy_interrupt() should check return value from schedule_work() and > > > enable irq on 0. > > > > No -- the work already pending will do that. > > How? If schedule_work won't succeed because there is a pending one, > we did disable_irq_nosync 2x, so we would stay at least 1x too deep! Correct and my note is misleading, sorry. The thing is we shouldn't have come here the second time in the first place (which is I think why the check is not there) as handlers for the same line are not allowed to run in parallel (cf. irq_desc->lock and IRQ_INPROGRESS). Perhaps BUG_ON(!schedule_work()) would be appropriate, though I am not sure if we should handle every "impossible" condition we can imagine. Quite a lot of hardirq handlers assume two instances will not run in parallel, so if it ever happened, then a serious breakage would unleash. > But, I've enough of other concerns too, so nothing urgent here... The problem is at the moment I am still probably the only user of this code ;-) -- `grep' through the sources to see how many drivers request an IRQ for their PHY through phylib; unless something has changed very recently. Maciej - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/