LinuxLists.cc - [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Fri, 27 Nov 2009, Uwe Kleine-K?nig wrote:

> IRQF_DISABLED is not guaranteed on shared irqs. There is already a
> warning in place for irqs registered with request_irq (introduced in
> 470c66239ef03). Move it to __setup_irq, this way it triggers for both
> request_irq and setup_irq.
>
> One irq that is now warned about is the timer tick on at91 (ARCH=arm).

And how does that help ? The interrupt is shared between the timer and
the debug port. There is nothing you can do about that.

The interupt handlers are called in order of setup. The AT91 timer
irq is set up first and if that's not the case then it needs to be
fixed and the only way to catch it is in the affected interrupt
handler.

Applying your patch does not change the hardware and will just result
in useless, annoying and confusing dmesg warnings.

> Cc: Rusty Russell <[email protected]>
> Cc: Remy Bohmer <[email protected]>,
> Cc: Hugh Dickins <[email protected]>,
> Cc: Andrea Gallo <[email protected]>,

Nice that you added me (and others) to that long list, but ...

> Cc: Thomas Gleixner <[email protected]>,
------------------------------------------^

Thanks,

tglx

2009-11-28 20:03:44

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Fri, Nov 27, 2009 at 11:18:00PM +0100, Thomas Gleixner wrote:
> On Fri, 27 Nov 2009, Uwe Kleine-K?nig wrote:
>
> > IRQF_DISABLED is not guaranteed on shared irqs. There is already a
> > warning in place for irqs registered with request_irq (introduced in
> > 470c66239ef03). Move it to __setup_irq, this way it triggers for both
> > request_irq and setup_irq.
> >
> > One irq that is now warned about is the timer tick on at91 (ARCH=arm).
>
> And how does that help ? The interrupt is shared between the timer and
> the debug port. There is nothing you can do about that.
>
> The interupt handlers are called in order of setup. The AT91 timer
> irq is set up first and if that's not the case then it needs to be
> fixed and the only way to catch it is in the affected interrupt
> handler.
Russell already suggests to save (and restore) irqs in the handler
before (and after resp.) calling the clockevent functions.

Should it use the raw_ variants?

> Applying your patch does not change the hardware and will just result
> in useless, annoying and confusing dmesg warnings.
My patch wasn't mainly to help in the at91 case. I just thought that a
warning that is triggered with request_irq and request_threaded_irq
shouldn't be skipped when using setup_irq.

Maybe the warning should only be printed if there's a mismatch between
different actions of the same irq regarding IRQF_DISABLED?!

I will prepare a patch for both (i.e. fixing the at91 ISRs and the
warning about IRQF_DISABLED | IRQF_SHARED) but probably only on Monday.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-11-28 21:51:01

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Sat, 28 Nov 2009, Uwe Kleine-K?nig wrote:

> On Fri, Nov 27, 2009 at 11:18:00PM +0100, Thomas Gleixner wrote:
> > On Fri, 27 Nov 2009, Uwe Kleine-K?nig wrote:
> >
> > > IRQF_DISABLED is not guaranteed on shared irqs. There is already a
> > > warning in place for irqs registered with request_irq (introduced in
> > > 470c66239ef03). Move it to __setup_irq, this way it triggers for both
> > > request_irq and setup_irq.
> > >
> > > One irq that is now warned about is the timer tick on at91 (ARCH=arm).
> >
> > And how does that help ? The interrupt is shared between the timer and
> > the debug port. There is nothing you can do about that.
> >
> > The interupt handlers are called in order of setup. The AT91 timer
> > irq is set up first and if that's not the case then it needs to be
> > fixed and the only way to catch it is in the affected interrupt
> > handler.
>
> Russell already suggests to save (and restore) irqs in the handler
> before (and after resp.) calling the clockevent functions.

What about analysing the code and verifying that the setup order is
correct ?

Adding save/restore_irq just because you have no clue what the code
does is utter nonsense.

Thanks,

tglx

2009-11-28 22:09:09

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Saturday 28 November 2009, Uwe Kleine-K?nig wrote:
> ?I just thought that a
> warning that is triggered with request_irq and request_threaded_irq
> shouldn't be skipped when using setup_irq.

Seems fair to me.

2009-11-28 22:19:49

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Saturday 28 November 2009, Thomas Gleixner wrote:
> What about analysing the code and verifying that the setup order is
> correct ?

Better to not care too much about setup order. Sometimes it's
unavoidable -- X initializes before Y "or else..." -- but the
patch I saw was just reporting goofage better.

> Adding save/restore_irq just because you have no clue what the code
> does is utter nonsense.

Absolutely. If that helps, find and fix the real bug.

2009-11-29 02:32:00

by Jamie Lokier

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

Thomas Gleixner wrote:
> What about analysing the code and verifying that the setup order is
> correct ?
>
> Adding save/restore_irq just because you have no clue what the code
> does is utter nonsense.

Wouldn't it be quite a lot nicer if generic setup moved the
IRQF_DISABLED handler to be first in the list, if that actually works
in a useful way rather than simply being a quirk that irqs are
disabled for the first one?

-- Jamie

2009-11-29 10:26:55

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

Hello,

On Sun, Nov 29, 2009 at 02:31:18AM +0000, Jamie Lokier wrote:
> Thomas Gleixner wrote:
> > What about analysing the code and verifying that the setup order is
> > correct ?
> >
> > Adding save/restore_irq just because you have no clue what the code
> > does is utter nonsense.
>
> Wouldn't it be quite a lot nicer if generic setup moved the
> IRQF_DISABLED handler to be first in the list, if that actually works
> in a useful way rather than simply being a quirk that irqs are
> disabled for the first one?
Hmm, what happens if an ISR runs with irqs disabled even though it
doesn't expect it? I wouldn't bet that nothing breaks.

IMHO the best is if a warning is printed or registering fails if shared
irq actions don't agree about wanting IRQF_DISABLED.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-11-29 15:19:01

by Jamie Lokier

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

Uwe Kleine-K?nig wrote:
> Hello,
>
> On Sun, Nov 29, 2009 at 02:31:18AM +0000, Jamie Lokier wrote:
> > Thomas Gleixner wrote:
> > > What about analysing the code and verifying that the setup order is
> > > correct ?
> > >
> > > Adding save/restore_irq just because you have no clue what the code
> > > does is utter nonsense.
> >
> > Wouldn't it be quite a lot nicer if generic setup moved the
> > IRQF_DISABLED handler to be first in the list, if that actually works
> > in a useful way rather than simply being a quirk that irqs are
> > disabled for the first one?
> Hmm, what happens if an ISR runs with irqs disabled even though it
> doesn't expect it? I wouldn't bet that nothing breaks.

Moving the IRQF_DISABLED handler to be first will run an ISR with
interrupts disabled which *does* expect it, so that's good.

According to this thread, at the moment when you have multiple
IRQF_DISABLED|IRQF_SHARED ISRs, only the first one is run with
interrupts disabled.

In fact I don't see why the kernel cannot put _all_ of the
IRQ_DISABLED handlers at the beginning of the list, traverse those
with interrupts disabled, then enable interrupts them for the
remaining handlers.

> IMHO the best is if a warning is printed or registering fails if shared
> irq actions don't agree about wanting IRQF_DISABLED.

On the hardware in question, the debug and timer interrupts share a
line, and I'm guessing only the timer interrupt should have IRQF_DISABLED.

Or we could do away with this silliness and just switch everything to
threaded interrupts with RT-priorities ;-)

-- Jamie

2009-11-29 15:29:10

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Sun, Nov 29, 2009 at 03:18:40PM +0000, Jamie Lokier wrote:
> Or we could do away with this silliness and just switch everything to
> threaded interrupts with RT-priorities ;-)

... thereby needlessly increasing the latency of all interrupt handling
and probably breaking some devices.

Some devices (eg, SMC91x, serial ports, PIO audio - AACI, MMCI) are
_extremely_ sensitive to interrupt latency due to lack of FIFO depth.

MMCI already sees overruns on ARM platforms if you try and clock the
cards at anything over about 500kHz. Adding any more latency means
reducing the maximum clock rate there, and therefore crippling
throughput even more than it is already.

2009-11-30 09:29:09

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

Hello Jamie,

On Sun, Nov 29, 2009 at 03:18:40PM +0000, Jamie Lokier wrote:
> According to this thread, at the moment when you have multiple
> IRQF_DISABLED|IRQF_SHARED ISRs, only the first one is run with
> interrupts disabled.
No. When you have multiple IRQF_SHARED ISRs *all* are run with irqs
disabled if the first has IRQF_DISABLED.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-11-30 09:55:47

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

On Sun, 29 Nov 2009, Jamie Lokier wrote:
> > > Wouldn't it be quite a lot nicer if generic setup moved the
> > > IRQF_DISABLED handler to be first in the list, if that actually works
> > > in a useful way rather than simply being a quirk that irqs are
> > > disabled for the first one?
> > Hmm, what happens if an ISR runs with irqs disabled even though it
> > doesn't expect it? I wouldn't bet that nothing breaks.
>
> Moving the IRQF_DISABLED handler to be first will run an ISR with
> interrupts disabled which *does* expect it, so that's good.
>
> According to this thread, at the moment when you have multiple
> IRQF_DISABLED|IRQF_SHARED ISRs, only the first one is run with
> interrupts disabled.

Wrong. The flags of the first action are checked to decide whether
interrupts are enabled _before_ calling the action handler.

So for shared interrupts we have the following combinations:

action1 action2
A - -
B IRQF_DISABLED -
C - IRQF_DISABLED
D IRQF_DISABLED IRQF_DISABLED

A) Correct behaviour for action1 and action2

B) Correct behaviour for action1, but action2 is called with
interrupts disabled

C) Correct behaviour for action1, but action2 is called with
interrupts enabled

D) Correct behaviour for action1 and action2 in theory

> In fact I don't see why the kernel cannot put _all_ of the
> IRQ_DISABLED handlers at the beginning of the list, traverse those
> with interrupts disabled, then enable interrupts them for the
> remaining handlers.

That does not work reliable, because IRQF_DISABLED is just describing
the calling convention of the interrupt.

It does not say, that the handler is not allowed to enable
interrupts. So for D) there is no guarantee that the action1 handler
keeps interrupts disabled. If it enables interrupts in the handler
then the assumptions of action2 are already violated.

> > IMHO the best is if a warning is printed or registering fails if shared
> > irq actions don't agree about wanting IRQF_DISABLED.

No, we print a warning when IRQF_SHARED is used with IRQF_DISABLED as
there is no point about agreeing on something which can not be
guaranteed at all.

Thanks,

tglx

2009-11-30 10:48:14

[permalink] [raw]

Subject: [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED at the right place

For shared irqs IRQF_DISABLED is only guaranteed for the first handler.
So only warn starting at the second registration.

The warning is moved to __setup_irq having the additional benefit of
catching actions registered using setup_irq not only register_irq.

This doesn't fix the cases where setup order is wrong but it should
report the broken cases more reliably.

Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: David Brownell <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: Eric Miao <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Remy Bohmer <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Gallo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jamie Lokier <[email protected]>
Cc: [email protected]
---
kernel/irq/manage.c | 23 ++++++++++-------------
1 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index bde4c66..d8f7415 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -702,6 +702,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
goto mismatch;
#endif

+ /*
+ * handle_IRQ_event() only honors IRQF_DISABLED for the _first_
+ * irqaction. As the first handler might reenable irqs all bets
+ * are off for the later handlers even if the first one has
+ * IRQF_DISABLED, too.
+ */
+ if (new->flags & IRQF_DISABLED)
+ pr_warning("IRQ %d/%s: IRQF_DISABLED is not guaranteed "
+ "on shared IRQs\n", irq, new->name);
+
/* add new interrupt at end of irq queue */
do {
old_ptr = &old->next;
@@ -1008,19 +1018,6 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
struct irq_desc *desc;
int retval;

- /*
- * handle_IRQ_event() always ignores IRQF_DISABLED except for
- * the _first_ irqaction (sigh). That can cause oopsing, but
- * the behavior is classified as "will not fix" so we need to
- * start nudging drivers away from using that idiom.
- */
- if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
- (IRQF_SHARED|IRQF_DISABLED)) {
- pr_warning(
- "IRQ %d/%s: IRQF_DISABLED is not guaranteed on shared IRQs\n",
- irq, devname);
- }
-
#ifdef CONFIG_LOCKDEP
/*
* Lockdep wants atomic interrupt handlers:
--
1.6.5.2

2009-11-30 13:56:09

[permalink] [raw]

Subject: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Uwe Kleine-K?nig wrote:
> For shared irqs IRQF_DISABLED is only guaranteed for the first handler.
> So only warn starting at the second registration.
>
> The warning is moved to __setup_irq having the additional benefit of
> catching actions registered using setup_irq not only register_irq.
>
> This doesn't fix the cases where setup order is wrong but it should
> report the broken cases more reliably.

The whole IRQF_DISABLED trickery is questionable and I'm pretty
unhappy about the warning in general.

While it is true that there is no guarantee of IRQF_DISABLED on shared
interrupts (at least not for the secondary handlers) we really need to
think about the reason why we want to run interrupt handlers with
interrupts enabled at all.

The separation of interrupt handlers which run with interrupts
disabled/enabled goes all the way back to Linux 1.0, which had two
interrupt handling modes:

1) handlers installed with SA_INTERRUPT ran atomically with interrupts
disabled.

2) handlers installed without SA_INTERRUPT ran with interrupts enabled
as they did more complex stuff like signal handling in the kernel.

The interrupt which was always run with interrupts disabled was the
timer interrupt because some of the "slower" interrupt handlers were
relying on jiffies being updated, which is only possible when they run
with interrupts enabled and no such handler can interrupt the timer
interrupt.

In the 2.1.x timeframe the discussion about shared interrupt handlers
and the treatment of SA_INTERRUPT (today IRQF_DISABLED) was resolved
by changing the code to what we have right now. If you read back in
the archives you will find the same arguments as we have seen in this
thread and a boatload of different solutions to that.

The real question is why we want to run an interrupt handler with
interrupts enabled at all. There are two reaons AFAICT:

1) interrupt handler relies on jiffies being updated:

I don't think that this is the case anymore and if we still have
code which does it is probably historic crap which is unused for
quite a time.

2) interrupt handler runs a long time:

I'm sure we still have some of those especially in the
archaelogical corners of drivers/* and in the creative space of the
embedded "oh, I don't know why but it works" departement. That's
code which needs to be fixed anyway.

The correct solution IMNSHO is to get rid of IRQF_DISABLED and run
interrupt handlers always with interrupts disabled and require them
not to reenable interrupts themself.

Thoughts ?

tglx

2009-11-30 14:03:37

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 2009-11-30 at 14:54 +0100, Thomas Gleixner wrote:
> On Mon, 30 Nov 2009, Uwe Kleine-König wrote:
> > For shared irqs IRQF_DISABLED is only guaranteed for the first handler.
> > So only warn starting at the second registration.
> >
> > The warning is moved to __setup_irq having the additional benefit of
> > catching actions registered using setup_irq not only register_irq.
> >
> > This doesn't fix the cases where setup order is wrong but it should
> > report the broken cases more reliably.
>
> The whole IRQF_DISABLED trickery is questionable and I'm pretty
> unhappy about the warning in general.
>
> While it is true that there is no guarantee of IRQF_DISABLED on shared
> interrupts (at least not for the secondary handlers) we really need to
> think about the reason why we want to run interrupt handlers with
> interrupts enabled at all.
>
> The separation of interrupt handlers which run with interrupts
> disabled/enabled goes all the way back to Linux 1.0, which had two
> interrupt handling modes:
>
> 1) handlers installed with SA_INTERRUPT ran atomically with interrupts
> disabled.
>
> 2) handlers installed without SA_INTERRUPT ran with interrupts enabled
> as they did more complex stuff like signal handling in the kernel.
>
> The interrupt which was always run with interrupts disabled was the
> timer interrupt because some of the "slower" interrupt handlers were
> relying on jiffies being updated, which is only possible when they run
> with interrupts enabled and no such handler can interrupt the timer
> interrupt.
>
> In the 2.1.x timeframe the discussion about shared interrupt handlers
> and the treatment of SA_INTERRUPT (today IRQF_DISABLED) was resolved
> by changing the code to what we have right now. If you read back in
> the archives you will find the same arguments as we have seen in this
> thread and a boatload of different solutions to that.
>
> The real question is why we want to run an interrupt handler with
> interrupts enabled at all. There are two reaons AFAICT:
>
> 1) interrupt handler relies on jiffies being updated:
>
> I don't think that this is the case anymore and if we still have
> code which does it is probably historic crap which is unused for
> quite a time.
>
> 2) interrupt handler runs a long time:
>
> I'm sure we still have some of those especially in the
> archaelogical corners of drivers/* and in the creative space of the
> embedded "oh, I don't know why but it works" departement. That's
> code which needs to be fixed anyway.
>
> The correct solution IMNSHO is to get rid of IRQF_DISABLED and run
> interrupt handlers always with interrupts disabled and require them
> not to reenable interrupts themself.
>
> Thoughts ?

I'm all for removing that brain damage:

http://lkml.org/lkml/2009/3/2/33

We should convert the broken hardware PIO and 3com interrupt things to
threaded interrupts, and simply mandate all IRQ handlers run short and
with IRQs disabled.

Except I guess that will upset some of the IRQ priority folks, like
power, where they (iirc) have a stack per irq prio level.

But its not like the core kernel knows about these nesting rules and can
actually track any of that muck.

2009-11-30 14:26:18

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Peter Zijlstra wrote:
> On Mon, 2009-11-30 at 14:54 +0100, Thomas Gleixner wrote:
> > The correct solution IMNSHO is to get rid of IRQF_DISABLED and run
> > interrupt handlers always with interrupts disabled and require them
> > not to reenable interrupts themself.
> >
> > Thoughts ?
>
> I'm all for removing that brain damage:
>
> http://lkml.org/lkml/2009/3/2/33

Darn, I knew that we discussed that before, but my memory tricked me
into believing that it was years ago :)

> We should convert the broken hardware PIO and 3com interrupt things to
> threaded interrupts, and simply mandate all IRQ handlers run short and
> with IRQs disabled.

The ide stuff seems to be the only one which uses
local_irq_enable_in_hardirq() so that at least it tripped over lockdep
already.

What's the 3com wreckage about ?

> Except I guess that will upset some of the IRQ priority folks, like
> power, where they (iirc) have a stack per irq prio level.

Is that actually used ? Ben ?

> But its not like the core kernel knows about these nesting rules and can
> actually track any of that muck.

True.

tglx

2009-11-30 14:38:44

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, Nov 30, 2009 at 02:54:54PM +0100, Thomas Gleixner wrote:
> The correct solution IMNSHO is to get rid of IRQF_DISABLED and run
> interrupt handlers always with interrupts disabled and require them
> not to reenable interrupts themself.

I think Linus advocated at one point making the default be "irqs disabled"
and only only if a flag was passed would handlers run with IRQs enabled.
However, that might have been a step towards having all handlers running
with IRQs disabled.

I'm all in favour of it. There are a large number of relatively simple
interrupt handlers out there which don't care about whether they're
called with IRQs disabled or not.

However, I think we still have a number of corner cases. The SMC91x
driver comes to mind, with its stupidly small FIFOs, where the majority
of implementations have to have the packets loaded via PIO - and this
seems to generally happen from IRQ context.

The upshot of that is switching the SMC91x interrupt handler to run with
IRQs disabled will mean that serial can suffer with overruns, especially
if the serial port FIFO is also small.

The alternative is to push the "expensive" packet-loading parts of SMC91x
support into a tasklet, but that's probably going to impact performance
of the driver.

What I'm saying is that I think it's a good idea, but we should be
cautious about forcing a blanket change - to do so I believe risks creating
performance regressions.

Maybe a "safer" way forward is to go and find all those request_irq()
sites and add IRQF_DISABLED to them all, wait for regression reports and
selectively remove the IRQF_DISABLED flags? We would then be able to
build up a picture of the problematical drivers that need to be reworked,
and whether the "run everything with irqs disabled" is even a practical
proposition.

Now, at the risk of covering old ground, how about we have two separate
irqaction lists, one for handlers to be called with irqs disabled and
one for handlers with irqs enabled. We run the irqs-disabled list
first, naturally with irqs disabled. If, at the end of that run (or
maybe after each handler), IRQs have ended being enabled, print some
diagnostics. (We're going to need something like this to ensure that
drivers interrupt handlers don't enable IRQs themselves.) Then enable
IRQs and run the irqs-enabled chain.

2009-11-30 14:40:50

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, Nov 30, 2009 at 02:37:03PM +0000, Russell King - ARM Linux wrote:
> Now, at the risk of covering old ground, how about we have two separate
> irqaction lists, one for handlers to be called with irqs disabled and
> one for handlers with irqs enabled. We run the irqs-disabled list
> first, naturally with irqs disabled. If, at the end of that run (or
> maybe after each handler), IRQs have ended being enabled, print some
> diagnostics. (We're going to need something like this to ensure that
> drivers interrupt handlers don't enable IRQs themselves.) Then enable
> IRQs and run the irqs-enabled chain.

Oh, and the other interesting thing to do may be to have a way of
measuring how much time irq handlers run for, so that handlers taking
an excessive time (more than 0.5ms or so - thinking about the 1000Hz
timer rate found on some arches) can be targetted.

2009-11-30 14:48:28

by Alan

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009 15:24:40 +0100 (CET)
Thomas Gleixner <[email protected]> wrote:

> On Mon, 30 Nov 2009, Peter Zijlstra wrote:
> > On Mon, 2009-11-30 at 14:54 +0100, Thomas Gleixner wrote:
> > > The correct solution IMNSHO is to get rid of IRQF_DISABLED and run
> > > interrupt handlers always with interrupts disabled and require them
> > > not to reenable interrupts themself.
> > >
> > > Thoughts ?
> >
> > I'm all for removing that brain damage:
> >
> > http://lkml.org/lkml/2009/3/2/33
>
> Darn, I knew that we discussed that before, but my memory tricked me
> into believing that it was years ago :)

Well the patch listed there is utterly bogus and will cause hangs at
startup. The problem case is IRQF_SHARED|IRQF_DISABLED. The patch messes
up all the unshared cases too - and lots of non sharable IRQ hardware just
jams the IRQ line high until you beat it into sense (8530's are notorious
for getting into that kind of state at init for example). You can't simply
remove the disabled from those drivers, you need to be able to allocate a
non-shared IRQ, and then enable it or do major driver restructuring of
obscure old driver code.

SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
then return -EINVAL. And with any luck that'll prove 6 months later that
most of the offenders are not used and we can delete them wholesale.

DISABLE without SHARED is fine, and saves waking the dead.

2009-11-30 14:51:03

by Alan

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

> However, I think we still have a number of corner cases. The SMC91x
> driver comes to mind, with its stupidly small FIFOs, where the majority
> of implementations have to have the packets loaded via PIO - and this
> seems to generally happen from IRQ context.

Everything 8390 based is in the same boat. It relies on being able to
use disable_irq_nosync/enable_irq and knows all about the joys of
interrupt bus asynchronicity internally. That does however allow it to
get sane results by using the irq controller to mask the potentially
shared IRQ at source.

2009-11-30 15:06:46

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, Nov 30, 2009 at 02:47:02PM +0000, Alan Cox wrote:
> SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> then return -EINVAL.

That is an impossibility. There is hardware out there (AT91) where
the timer interrupt is shared with other peripherals, and you end
up with a mixture of irqs-disabled and irqs-enabled handlers sharing
the same interrupt.

Luckily, the timer interrupt is the first to claim, and so is the first
to be run. However, there was a problem reported a while back of the
clock event code being called on AT91 with IRQs enabled - unfortunately
the original reporters stopped responding so it was never worked out
what was going on.

My point is that if we outlaw irqs-disabled shared interrupts, it puts
Atmel AT91 support into immediate difficulties.

2009-11-30 15:33:06

by Alan

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009 15:01:00 +0000
Russell King - ARM Linux <[email protected]> wrote:

> On Mon, Nov 30, 2009 at 02:47:02PM +0000, Alan Cox wrote:
> > SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> > then return -EINVAL.
>
> That is an impossibility. There is hardware out there (AT91) where
> the timer interrupt is shared with other peripherals, and you end
> up with a mixture of irqs-disabled and irqs-enabled handlers sharing
> the same interrupt.

Well that will encourage people to fix it.

> My point is that if we outlaw irqs-disabled shared interrupts, it puts
> Atmel AT91 support into immediate difficulties.

If a driver disables the timer irq across a tick aren't you already in
trouble ?

2009-11-30 15:38:22

by Nicolas Pitre

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Russell King - ARM Linux wrote:

> However, I think we still have a number of corner cases. The SMC91x
> driver comes to mind, with its stupidly small FIFOs, where the majority
> of implementations have to have the packets loaded via PIO - and this
> seems to generally happen from IRQ context.

That is indeed the case for the RX path in order to avoid packet drop as
much as possible. The TX path is always run outside IRQ context though.

> The upshot of that is switching the SMC91x interrupt handler to run with
> IRQs disabled will mean that serial can suffer with overruns, especially
> if the serial port FIFO is also small.

That can happen now already anyway, regardless of whether IRQ handlers
are run with IRQs enabled or not. Suffice to have serial and smc91x
interrupts asserted more or less at the same time, i.e. before the
pending interrupt sources are actually determined and interrupts enabled
again, in which case the IRQ handlers are serialized usually according
to their IRQ number (most target don't use sophisticated IRQ
priorities).

And then, the serial interrupt isn't currently registered with
IRQF_DISABLED, meaning that its handler can be interrupted by any other
interrupt coming along, including the heavier smc91x RX code. That
isn't much different from having all interrupt handlers run with IRQs
disabled and the serial interrupt having to wait after the smc91x one.

In other words, I don't think having all IRQ handlers run with IRQs
disabled would change much wrt average IRQ latency in practice.
Without real hardware based IRQ priority management (or thread based IRQ
handlers with software managed priorities), The smc91x handler could
adversely affect the serial handler in either cases.

Nicolas

2009-11-30 15:48:15

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, Nov 30, 2009 at 03:32:23PM +0000, Alan Cox wrote:
> On Mon, 30 Nov 2009 15:01:00 +0000
> Russell King - ARM Linux <[email protected]> wrote:
> > On Mon, Nov 30, 2009 at 02:47:02PM +0000, Alan Cox wrote:
> > > SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> > > then return -EINVAL.
> >
> > That is an impossibility. There is hardware out there (AT91) where
> > the timer interrupt is shared with other peripherals, and you end
> > up with a mixture of irqs-disabled and irqs-enabled handlers sharing
> > the same interrupt.
>
> Well that will encourage people to fix it.
>
> > My point is that if we outlaw irqs-disabled shared interrupts, it puts
> > Atmel AT91 support into immediate difficulties.
>
> If a driver disables the timer irq across a tick aren't you already in
> trouble ?

Not with this clockevents code - you can miss timer interrupts and the core
time keeping code catches up automatically.

2009-11-30 17:47:59

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Russell King - ARM Linux wrote:
> On Mon, Nov 30, 2009 at 02:54:54PM +0100, Thomas Gleixner wrote:
> Maybe a "safer" way forward is to go and find all those request_irq()
> sites and add IRQF_DISABLED to them all, wait for regression reports and
> selectively remove the IRQF_DISABLED flags? We would then be able to
> build up a picture of the problematical drivers that need to be reworked,
> and whether the "run everything with irqs disabled" is even a practical
> proposition.

Well, that's basically the same as removing IRQF_DISABLED, setting the
default to run disabled and provide a new flag IRQF_ENABLE_IRQS which
can be added to drivers when regressions show up. That way you just
have to grep for IRQF_ENABLE_IRQS instead of doing a search for
everything which has it removed. We can keep IRQF_DISABLED around by
setting it to 0 for the transition, so we don't have to touch the
world in the first place.

> Now, at the risk of covering old ground, how about we have two separate
> irqaction lists, one for handlers to be called with irqs disabled and
> one for handlers with irqs enabled. We run the irqs-disabled list
> first, naturally with irqs disabled. If, at the end of that run (or
> maybe after each handler), IRQs have ended being enabled, print some
> diagnostics. (We're going to need something like this to ensure that
> drivers interrupt handlers don't enable IRQs themselves.) Then enable
> IRQs and run the irqs-enabled chain.

Pretty old ground. :) That was discussed 10 years ago and never found
much love, but yes we could do that. Either two list or sorting the
entries. That might avoid the obvious pitfall, but I doubt it'd help
us to fix the drivers which think that they need to do the irqs
enabled dance.

Thanks,

tglx

2009-11-30 17:49:47

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Russell King - ARM Linux wrote:

> On Mon, Nov 30, 2009 at 02:37:03PM +0000, Russell King - ARM Linux wrote:
> > Now, at the risk of covering old ground, how about we have two separate
> > irqaction lists, one for handlers to be called with irqs disabled and
> > one for handlers with irqs enabled. We run the irqs-disabled list
> > first, naturally with irqs disabled. If, at the end of that run (or
> > maybe after each handler), IRQs have ended being enabled, print some
> > diagnostics. (We're going to need something like this to ensure that
> > drivers interrupt handlers don't enable IRQs themselves.) Then enable
> > IRQs and run the irqs-enabled chain.
>
> Oh, and the other interesting thing to do may be to have a way of
> measuring how much time irq handlers run for, so that handlers taking
> an excessive time (more than 0.5ms or so - thinking about the 1000Hz
> timer rate found on some arches) can be targetted.

.33 will have trace points for this, so the infrastructure is there or
do you think about something permanent which does not depend on the
tracer ?

Thanks,

tglx

2009-11-30 19:51:30

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

Hello,

On Mon, Nov 30, 2009 at 02:54:54PM +0100, Thomas Gleixner wrote:
> On Mon, 30 Nov 2009, Uwe Kleine-K?nig wrote:
> > For shared irqs IRQF_DISABLED is only guaranteed for the first handler.
> > So only warn starting at the second registration.
> >
> > The warning is moved to __setup_irq having the additional benefit of
> > catching actions registered using setup_irq not only register_irq.
> >
> > This doesn't fix the cases where setup order is wrong but it should
> > report the broken cases more reliably.
>
> The whole IRQF_DISABLED trickery is questionable and I'm pretty
> unhappy about the warning in general.
>
> While it is true that there is no guarantee of IRQF_DISABLED on shared
> interrupts (at least not for the secondary handlers) we really need to
> think about the reason why we want to run interrupt handlers with
> interrupts enabled at all.
>
> The separation of interrupt handlers which run with interrupts
> disabled/enabled goes all the way back to Linux 1.0, which had two
> interrupt handling modes:
>
> 1) handlers installed with SA_INTERRUPT ran atomically with interrupts
> disabled.
>
> 2) handlers installed without SA_INTERRUPT ran with interrupts enabled
> as they did more complex stuff like signal handling in the kernel.

> The interrupt which was always run with interrupts disabled was the
> timer interrupt because some of the "slower" interrupt handlers were
> relying on jiffies being updated, which is only possible when they run
> with interrupts enabled and no such handler can interrupt the timer
> interrupt.
>
> In the 2.1.x timeframe the discussion about shared interrupt handlers
> and the treatment of SA_INTERRUPT (today IRQF_DISABLED) was resolved
> by changing the code to what we have right now. If you read back in
> the archives you will find the same arguments as we have seen in this
> thread and a boatload of different solutions to that.
>
> The real question is why we want to run an interrupt handler with
> interrupts enabled at all. There are two reaons AFAICT:
>
> 1) interrupt handler relies on jiffies being updated:
>
> I don't think that this is the case anymore and if we still have
> code which does it is probably historic crap which is unused for
> quite a time.
>
> 2) interrupt handler runs a long time:
>
> I'm sure we still have some of those especially in the
> archaelogical corners of drivers/* and in the creative space of the
> embedded "oh, I don't know why but it works" departement. That's
> code which needs to be fixed anyway.

I think there is

3) you can only benefit from decent priority hardware if irqs are
processed while irqs are enabled.

I think

git grep handle_fasteoi_irq

gives an overview here: some hits in arch/powerpc, arch/sparc and
arch/x86/kernel/apic/io_apic.c. (There is handle_prio_irq in
arch/arm/mach-ns9xxx, but the priodecoder is crappy and actually it
should use handle_level_irq IIRC.)

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-11-30 20:05:14

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 2009-11-30 at 15:24 +0100, Thomas Gleixner wrote:
>
> > Except I guess that will upset some of the IRQ priority folks, like
> > power, where they (iirc) have a stack per irq prio level.
>
> Is that actually used ? Ben ?

Nah, we have 2 dedicated stacks.. one for external interrupts and one
for softirq. On embedded we also have a separate stack for the critical
interrupts and machine checks but both are special (critical interrupts
are aking to FIQ on ARM).

> > But its not like the core kernel knows about these nesting rules and
> can actually track any of that muck.
>
> True.

Well, thing is, in cases where we have a sane PIC, the PIC itself is
perfectly good at keeping the source and anything of the same priority
or lower masked while we handle an irq.

So disabling local CPU IRQs will basically add an unnecessary blocking
of both timer interrupts and perfmon interrupts while doing so.

Yes, all driver interrupt handlers -should- be only running short amount
of code in their handlers but you know how it is. The drift introduced
on timer and perfmon events can be a problem, the later might even make
it difficult to figure out what an -interrupt- is taking more time than
it should.

I would suggest we timestamp the handlers in the core btw and warn if
they take too long so we get a chance to track down the bad guys.

Cheers,
Ben.

2009-11-30 20:15:34

by Andrew Victor

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

hi,

> There is hardware out there (AT91) where
> the timer interrupt is shared with other peripherals, and you end
> up with a mixture of irqs-disabled and irqs-enabled handlers sharing
> the same interrupt.

For the AT91 case I don't think this shouldn't matter.

The AT91's have a priority-level interrupt controller, so:
1. a lower-priority interrupt won't interrupt a higher-priority
2. shared interrupts cannot interrupt each other until irq_finish()
is called (a write to AIC_EOICR)

Since the Timer, DBGU serial port (and other system peripherals) are
on the same priority level they cannot interrupt each other.
(ie, basically as-if always irqs-disabled).

The case of irqs-enabled does means that a higher-priority interrupt
could interrupt [*], but it's not a shared-IRQ in that case.

([*] The system peripherals have the highest priority by default, so
the user would need to override the defaults for this to occur)

Regards,
Andrew Victor

2009-11-30 20:28:07

[permalink] [raw]

Subject: Re: [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED at the right place

On Monday 30 November 2009, Uwe Kleine-K?nig wrote:
> +???????????????if (new->flags & IRQF_DISABLED)
> +???????????????????????pr_warning("IRQ %d/%s: IRQF_DISABLED is not guaranteed "
> +???????????????????????????????????????"on shared IRQs\n", irq, new->name);

This should have copied the original test ... this way,
it's dropping the SHARED constraint, and trying to morph
into a generic "IRQF_DISABLED is eeebil!" test.

If it just moved the original test, I'd have no problem
with the patch.

... although it'd still not address the general mess in
this area, it'd at least not introduce false warnings.

2009-11-30 20:27:23

[permalink] [raw]

Subject: Re: [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED at the right place

On Mon, Nov 30, 2009 at 12:21:30PM -0800, David Brownell wrote:
> On Monday 30 November 2009, Uwe Kleine-K?nig wrote:
> > +???????????????if (new->flags & IRQF_DISABLED)
> > +???????????????????????pr_warning("IRQ %d/%s: IRQF_DISABLED is not guaranteed "
> > +???????????????????????????????????????"on shared IRQs\n", irq, new->name);
>
> This should have copied the original test ... this way,
> it's dropping the SHARED constraint, and trying to morph
> into a generic "IRQF_DISABLED is eeebil!" test.
No, it doesn't. The inserted code is in an if block:

old_ptr = &desc->action;
old = *old_ptr;
if (old) {
/* ... */
if (!((old->flags & new->flags) & IRQF_SHARED) ||
((old->flags ^ new->flags) & IRQF_TRIGGER_MASK)) {
old_name = old->name;
goto mismatch;
}

...

+ if (new->flags & IRQF_DISABLED)
+ pr_warning("...");

and the mismatch label is further below. So the warning is still only
hit if a shared irq is registered.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-11-30 21:00:06

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Monday 30 November 2009, Alan Cox wrote:
> SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> then return -EINVAL. And with any luck that'll prove 6 months later that
> most of the offenders are not used and we can delete them wholesale.

So ... merge an updated version of the original patch, to
get full WARN coverage?

We've had that warning for a long time now. The original
patch just covered non-request_irq() cases. So by your
timetable we're ready for the "return -EINVAL" stage of
the migration... at least, for request_irq() callers.

2009-11-30 20:39:50

by Jamie Lokier

[permalink] [raw]

Subject: Re: [PATCH] warn about shared irqs requesting IRQF_DISABLED registered with setup_irq

Russell King - ARM Linux wrote:
> On Sun, Nov 29, 2009 at 03:18:40PM +0000, Jamie Lokier wrote:
> > Or we could do away with this silliness and just switch everything to
> > threaded interrupts with RT-priorities ;-)
>
> ... thereby needlessly increasing the latency of all interrupt handling
> and probably breaking some devices.

It's not that cut and dried. RT gives you priorities where you might
not have had them before, so in some cases can reduce worst-case
latency for critical devices. IRQF_DISABLED is a cheap two-level
priority scheme; RT threaded interrupts extend it.

The extra processing increases latency, yes, but that is in a sense a
scheduling fast-path problem; it's not _intrinsic_ to threaded
interrupts that they have to have higher latency (you don't have to
use the normal calculation or task switch to get equivalent
behaviour), but undoubtedly trying to optimise that leads to very
twisty code and state representations, and I don't see it happening in
Linux any time soon.

-- Jamie

2009-11-30 21:00:09

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Monday 30 November 2009, Russell King - ARM Linux wrote:
> On Mon, Nov 30, 2009 at 02:47:02PM +0000, Alan Cox wrote:
> > SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> > then return -EINVAL.
>
> That is an impossibility. There is hardware out there (AT91) where
> the timer interrupt is shared with other peripherals, and you end
> up with a mixture of irqs-disabled and irqs-enabled handlers sharing
> the same interrupt.

For the record: AT91 isn't restricted to the system timers hooked
up on irq 0 ... there's also drivers/clocksource/tcb_clksrc.c (not
at the same hardware priority).

But to concur, this is indeed messy. Both the system timer and
the serial console generally share the same IRQ; both are very
timing-sensitive. I've seen console character dropouts after
tweaking timer IRQ handling. And I've never convinced myself
that Linux handles the hardware IRQ priority on those chips as
well as it could.

> My point is that if we outlaw irqs-disabled shared interrupts, it puts
> Atmel AT91 support into immediate difficulties.

ISTR that those TCB modules don't share IRQs with other peripherals.

Also, that Linux doesn't use them for much else. I've yet to see a
three-phase motor driver using the TCB's PWM capabilities, for example.

- Dave

2009-11-30 21:25:21

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Uwe Kleine-K?nig wrote:
> I think there is
>
> 3) you can only benefit from decent priority hardware if irqs are
> processed while irqs are enabled.
>
> I think
>
> git grep handle_fasteoi_irq
>
> gives an overview here: some hits in arch/powerpc, arch/sparc and
> arch/x86/kernel/apic/io_apic.c. (There is handle_prio_irq in

No. That handler is not an indicator for prio hardware actively used
in the sense of allowing higher prio interrupts to interrupt a current
running lower priority one. It can be used when the irq controller
does not fire the interrupt again before the eoi acknowledge has been
done.

Thanks,

tglx

2009-11-30 21:32:39

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Tue, 1 Dec 2009, Benjamin Herrenschmidt wrote:
> Well, thing is, in cases where we have a sane PIC, the PIC itself is
> perfectly good at keeping the source and anything of the same priority
> or lower masked while we handle an irq.

Unfortunately the majority of PICs does not fall into that category.

> So disabling local CPU IRQs will basically add an unnecessary blocking
> of both timer interrupts and perfmon interrupts while doing so.
>
> Yes, all driver interrupt handlers -should- be only running short amount
> of code in their handlers but you know how it is. The drift introduced
> on timer and perfmon events can be a problem, the later might even make
> it difficult to figure out what an -interrupt- is taking more time than
> it should.

The timer problem only affects the old style tick/jiffies driven
hardware where you have no continous clock source for keeping track of
time. Even x86 managed to do something about that recently :)

Are the perf events on power generally coming through the standard irq
handler code path and/or sensitive to local_irq_disable() ?

> I would suggest we timestamp the handlers in the core btw and warn if
> they take too long so we get a chance to track down the bad guys.

The hassle is to find a time which we think is appropriate as a
threshold which is of course depending on the cpu power of a
system. Also I wonder whether we'd need to make such a warning thing
aware of irq nesting.

Thanks,

tglx

2009-11-30 21:47:13

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 2009-11-30 at 22:31 +0100, Thomas Gleixner wrote:

> Are the perf events on power generally coming through the standard irq
> handler code path and/or sensitive to local_irq_disable() ?

They are in HW yes. On ppc64, we do soft-disabling, which mean that we
can still get the perf events within a local_irq_disable() region
provided we don't get another interrupt within that region that forces
us to hard disable so it would make the problem less bad I suppose.

> > I would suggest we timestamp the handlers in the core btw and warn
> if
> > they take too long so we get a chance to track down the bad guys.
>
> The hassle is to find a time which we think is appropriate as a
> threshold which is of course depending on the cpu power of a
> system. Also I wonder whether we'd need to make such a warning thing
> aware of irq nesting.

But if we always disable interrupts while running the handlers, we don't
nest right ?

Cheers,
Ben.

2009-11-30 21:55:48

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Tue, 1 Dec 2009, Benjamin Herrenschmidt wrote:
> On Mon, 2009-11-30 at 22:31 +0100, Thomas Gleixner wrote:
>
> > Are the perf events on power generally coming through the standard irq
> > handler code path and/or sensitive to local_irq_disable() ?
>
> They are in HW yes. On ppc64, we do soft-disabling, which mean that we
> can still get the perf events within a local_irq_disable() region
> provided we don't get another interrupt within that region that forces
> us to hard disable so it would make the problem less bad I suppose.
>
> > > I would suggest we timestamp the handlers in the core btw and warn
> > if
> > > they take too long so we get a chance to track down the bad guys.
> >
> > The hassle is to find a time which we think is appropriate as a
> > threshold which is of course depending on the cpu power of a
> > system. Also I wonder whether we'd need to make such a warning thing
> > aware of irq nesting.
>
> But if we always disable interrupts while running the handlers, we don't
> nest right ?

Right, in that case we do not and it's easy to instrument.

Thanks,

tglx

2009-11-30 22:01:32

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 30 Nov 2009, Alan Cox wrote:

> > However, I think we still have a number of corner cases. The SMC91x
> > driver comes to mind, with its stupidly small FIFOs, where the majority
> > of implementations have to have the packets loaded via PIO - and this
> > seems to generally happen from IRQ context.
>
> Everything 8390 based is in the same boat. It relies on being able to
> use disable_irq_nosync/enable_irq and knows all about the joys of
> interrupt bus asynchronicity internally. That does however allow it to
> get sane results by using the irq controller to mask the potentially
> shared IRQ at source.

So that would be a known candidate for IRQF_NEEDS_IRQS_ENABLED, right?

Either that or we decide to push such beasts into the threaded irq
space to keep them working until the last card hits the trashcan. I
know that this would still need to disable the interrupt on the PIC
level, but we have already mechanisms for that in the threaded code.

Thanks,

tglx

2009-11-30 23:30:21

by Alan

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

> Either that or we decide to push such beasts into the threaded irq
> space to keep them working until the last card hits the trashcan. I
> know that this would still need to disable the interrupt on the PIC
> level, but we have already mechanisms for that in the threaded code.

The 8390 is essentially a single thread device so treating interrupts as
events indicating work is to be done might make sense

Unfortunately you cannot check the interrupt flags on the chip without
switching to page 0, which will cause any parallel tx to crap itself and
potentially hang the box. It's a design from single CPU days and the
programming model is solely around 'stack the register window selected,
do stuff in irq, put it back', so any parallel execution ends in tears,
even peeking to see if the IRQ is ours.

These chips still keep popping up in old boxes although the rtl8139 seems
to have exterminated them at last in all the ultra-cheap devices.

Pushing them into threaded IRQ space with PIC masking seems to make
complete sense. The wonderously gothic IRQ magic becomes a mutex, the IRQ
handler may sleep blocking the IRQ during a transmit and the transmit
path may block during an IRQ thread execution. Reset works as a mutex and
all the crap and magic goes away.

2009-12-01 01:44:07

by Andy Walls

[permalink] [raw]

Subject: Re: Get rid of IRQF_DISABLED - (was [PATCH] genirq: warn about IRQF_SHARED|IRQF_DISABLED)

On Mon, 2009-11-30 at 12:38 -0800, David Brownell wrote:
> On Monday 30 November 2009, Alan Cox wrote:
> > SHARED|DISABLED ought to WARN_ON() and if that doesn't motivate people
> > then return -EINVAL. And with any luck that'll prove 6 months later that
> > most of the offenders are not used and we can delete them wholesale.
>
> So ... merge an updated version of the original patch, to
> get full WARN coverage?
>
> We've had that warning for a long time now. The original
> patch just covered non-request_irq() cases. So by your
> timetable we're ready for the "return -EINVAL" stage of
> the migration... at least, for request_irq() callers.

OK, I'm motivated. I haven't followed the discussion closely though.

Can someone give me a clue as to the preferred way to correct this:

/* Register IRQ */
retval = request_irq(cx->pci_dev->irq, cx18_irq_handler,
IRQF_SHARED | IRQF_DISABLED,
cx->v4l2_dev.name, (void *)cx);

?
The top half handler performs as little work as it possibly can and
schedules the long duration activites on a workqueue already.

The device is always on a plug-in PCI card.

Regards,
Andy

2010-01-12 15:42:15