Hello,
I use 2.6.23.1-rt5 on the Atmel AT91 series.
Interrupt threading on Preempt-RT and ARM works fine, except for
(edge-triggered) GPIO interrupts. There is a problem when a new
interrupt arives while the interrupt thread is handling the previous
interrupt. If this occurs the interrupt handling stalls forever.
This is caused by a unbalanced interrupt mask/unmask problem in the kernel.
The attached patch fixes this. More information about this problem is
documented inside the patch itself.
This patch is meant for Preempt-RT only.
Kind Regards,
Remy Bohmer
Attached the same patch, but it also cleans the manage.c code a bit,
because the IRQ types 'simple IRQ', 'level-IRQ' and 'FastEOI' were
handled differently while they should be handled the same.
Kind Regards,
Remy
2007/11/26, Remy Bohmer <[email protected]>:
> Hello,
>
> I use 2.6.23.1-rt5 on the Atmel AT91 series.
> Interrupt threading on Preempt-RT and ARM works fine, except for
> (edge-triggered) GPIO interrupts. There is a problem when a new
> interrupt arives while the interrupt thread is handling the previous
> interrupt. If this occurs the interrupt handling stalls forever.
>
> This is caused by a unbalanced interrupt mask/unmask problem in the kernel.
> The attached patch fixes this. More information about this problem is
> documented inside the patch itself.
>
> This patch is meant for Preempt-RT only.
>
> Kind Regards,
>
> Remy Bohmer
>
>
Thomas,
Can you ACK or NACK this patch. I know you play with a bunch of
hardware that this patch may affect.
Thanks,
-- Steve
On Mon, 2007-11-26 at 14:45 +0100, Remy Bohmer wrote:
> Attached the same patch, but it also cleans the manage.c code a bit,
> because the IRQ types 'simple IRQ', 'level-IRQ' and 'FastEOI' were
> handled differently while they should be handled the same.
>
> Kind Regards,
>
> Remy
>
> 2007/11/26, Remy Bohmer <[email protected]>:
> > Hello,
> >
> > I use 2.6.23.1-rt5 on the Atmel AT91 series.
> > Interrupt threading on Preempt-RT and ARM works fine, except for
> > (edge-triggered) GPIO interrupts. There is a problem when a new
> > interrupt arives while the interrupt thread is handling the previous
> > interrupt. If this occurs the interrupt handling stalls forever.
> >
> > This is caused by a unbalanced interrupt mask/unmask problem in the kernel.
> > The attached patch fixes this. More information about this problem is
> > documented inside the patch itself.
> >
> > This patch is meant for Preempt-RT only.
> >
> > Kind Regards,
> >
> > Remy Bohmer
> >
> >
On ARM there is a problem where the interrupt handler stalls when they are
coming faster than the kernel can handle.
The problem occurs when the routine handle_simple_irq() masks the interrupt
when an IRQ-thread is handling the interrupt at the same time. (IRQ_INPROGRESS
is set). The interrupt thread, however does **never** a
desc->chip->unmask(), so the interrupt becomes disabled forever.
IRQ_DISABLED is usually not set for this interrupt
This is in kernel/irq/chip.c, where the irq is masked when a IRQ-thread is
running:
--------------------------------------------------------------------------
void fastcall
handle_simple_irq(unsigned int irq, struct irq_desc *desc)
{
....
....
if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
IRQ_DISABLED)))) {
(!!)-> if (desc->chip->mask)
(!!)-> desc->chip->mask(irq);
desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
desc->status |= IRQ_PENDING;
goto out_unlock;
}
....
....
}
--------------------------------------------------------------------------
Masking the interrupt seems valid, because the interrupt handler thread is
still running, so it can handle the new pending interrupt. But, it has to be
umasked somewhere. The logical place is to do this in kernel/irq/manage.c,
because this situation is also handled for the thread_level_irq() and
thread_fasteoi_irq(), but not for thread_simple_irq(). This patch adds this
for these kind of interrupts also.
Signed-off-by: Remy Bohmer <[email protected]>
---
kernel/irq/manage.c | 29 +++--------------------------
1 file changed, 3 insertions(+), 26 deletions(-)
Index: linux-2.6.23/kernel/irq/manage.c
===================================================================
--- linux-2.6.23.orig/kernel/irq/manage.c 2007-11-26 13:46:58.000000000 +0100
+++ linux-2.6.23/kernel/irq/manage.c 2007-11-26 14:43:32.000000000 +0100
@@ -646,28 +646,7 @@ static void thread_simple_irq(irq_desc_t
note_interrupt(irq, desc, action_ret);
}
desc->status &= ~IRQ_INPROGRESS;
-}
-
-/*
- * threaded level type irq handler
- */
-static void thread_level_irq(irq_desc_t *desc)
-{
- unsigned int irq = desc - irq_desc;
-
- thread_simple_irq(desc);
- if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
- desc->chip->unmask(irq);
-}
-
-/*
- * threaded fasteoi type irq handler
- */
-static void thread_fasteoi_irq(irq_desc_t *desc)
-{
- unsigned int irq = desc - irq_desc;
- thread_simple_irq(desc);
if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
desc->chip->unmask(irq);
}
@@ -747,12 +726,10 @@ static void do_hardirq(struct irq_desc *
if (!(desc->status & IRQ_INPROGRESS))
goto out;
- if (desc->handle_irq == handle_simple_irq)
+ if ((desc->handle_irq == handle_simple_irq) ||
+ (desc->handle_irq == handle_level_irq) ||
+ (desc->handle_irq == handle_fasteoi_irq))
thread_simple_irq(desc);
- else if (desc->handle_irq == handle_level_irq)
- thread_level_irq(desc);
- else if (desc->handle_irq == handle_fasteoi_irq)
- thread_fasteoi_irq(desc);
else if (desc->handle_irq == handle_edge_irq)
thread_edge_irq(desc);
else
On Tue, 2007-11-27 at 10:11 -0500, Steven Rostedt wrote:
> Thomas,
>
> Can you ACK or NACK this patch. I know you play with a bunch of
> hardware that this patch may affect.
>
My two cents, I think it's needed (or something like it).. It looks like
handle_simple_irq normally expects a custom cascade type handler to
unmask on return. When you switch to a threaded handler the cascade
handler is gone and can't unmask..
Daniel
On Tue, 2007-11-27 at 07:25 -0800, Daniel Walker wrote:
> On Tue, 2007-11-27 at 10:11 -0500, Steven Rostedt wrote:
> > Thomas,
> >
> > Can you ACK or NACK this patch. I know you play with a bunch of
> > hardware that this patch may affect.
> >
>
> My two cents, I think it's needed (or something like it).. It looks like
> handle_simple_irq normally expects a custom cascade type handler to
> unmask on return. When you switch to a threaded handler the cascade
> handler is gone and can't unmask..
I take this back .. The comment at the top of handle_simple_irq() is,
* Note: The caller is expected to handle the ack, clear, mask and
* unmask issues if necessary.
So we shouldn't need any flow control unless there is some other
factors..
Additionally, we have a patch in the real time tree called
"irq-mask-fix.patch" which adds an "unmask" to handle_simple_irq, but as
the note says we don't need flow control..
Daniel
Hello Daniel,
> * Note: The caller is expected to handle the ack, clear, mask and
> * unmask issues if necessary.
> So we shouldn't need any flow control unless there is some other
> factors..
This comment can be misinterpreted, I think. Who is assumed to be the
caller in this context? The 2 other routines in the driver that
actually do the unmasking stuff besides only calling this routine? Is
it allowed to call it directly or should it always be done through a
wrapper that does all these special things?
Either way, only masking interrupts, and never unmasking it, is a bug.
If interrupts come and go slow enough you never run into this problem,
and if this type is not used often, nobody will notice it.
Usually interrupts needs clearence of the source before the hardware
can generate a new one. GPIO interrupts are different, they are
generated whenever a IO-level changes, there is no acknowledge or
clearing of the interupt needed. These types of interrupts are never
'pending' from hardware point of view. So, with these type of
interrupts, a new one can occur while the interrupt handler has not
handled the previous one yet, and therefor these interrupt-types will
show this bug.
>
> Additionally, we have a patch in the real time tree called
> "irq-mask-fix.patch" which adds an "unmask" to handle_simple_irq, but as
> the note says we don't need flow control..
You mean the Montavista real time tree?
Kind Regards,
Remy
Hello Kevin,
Just copied your mail to the list, maybe your solution is also worth looking at.
Remy
> I had a similar issue when using the chained GPIO interrupts on OMAP
> under PREEMPT_RT.
>
> I believe the chained handler itself is supposed to be doing the
> ack/unmask instead of the simple_handler.
>
> However, currently there is no way for the chained handler to know
> when the threaded handler has acutally run. So the way I fixed this
> was to have the simple handler call the chip->end hook so that the
> chained handler could do the ack/unmask after the threaded handler has
> actually run.
>
> I've been using this against the -rt patch since 2.6.21, and submitted
> and RFC a while back, but got no comments.
>
> This patch is against 2.6.24-rc2-rt1.
>
> Signed-off-by: Kevin Hilman <[email protected]>
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index b35d209..2f9e09e 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -661,6 +661,8 @@ static void thread_simple_irq(irq_desc_t *desc)
> note_interrupt(irq, desc, action_ret);
> }
> desc->status &= ~IRQ_INPROGRESS;
> + if (desc->chip->end)
> + desc->chip->end(irq);
> }
>
> /*
>
>
>
>
On Wed, 2007-11-28 at 15:38 +0100, Remy Bohmer wrote:
> Hello Daniel,
>
> > * Note: The caller is expected to handle the ack, clear, mask and
> > * unmask issues if necessary.
> > So we shouldn't need any flow control unless there is some other
> > factors..
>
> This comment can be misinterpreted, I think. Who is assumed to be the
> caller in this context? The 2 other routines in the driver that
> actually do the unmasking stuff besides only calling this routine? Is
> it allowed to call it directly or should it always be done through a
> wrapper that does all these special things?
The later I think ..
> Either way, only masking interrupts, and never unmasking it, is a bug.
> If interrupts come and go slow enough you never run into this problem,
> and if this type is not used often, nobody will notice it.
> Usually interrupts needs clearence of the source before the hardware
> can generate a new one. GPIO interrupts are different, they are
> generated whenever a IO-level changes, there is no acknowledge or
> clearing of the interupt needed. These types of interrupts are never
> 'pending' from hardware point of view. So, with these type of
> interrupts, a new one can occur while the interrupt handler has not
> handled the previous one yet, and therefor these interrupt-types will
> show this bug.
Yeah, it's clear there needs to be an unmask for this special case..
I've attached a patch which only handles the special case.. Could you
test/review it..
> >
> > Additionally, we have a patch in the real time tree called
> > "irq-mask-fix.patch" which adds an "unmask" to handle_simple_irq, but as
> > the note says we don't need flow control..
>
> You mean the Montavista real time tree?
No .. I wouldn't comment about an company specific tree. I was talking
about the broken out real time patches.
Daniel
--------
Remove the IRQ_PENDING flag if it's asserted, and unmask the irq. Also loop
around to account for the pending interrupt.
Signed-Off-By: Daniel Walker <[email protected]>
---
kernel/irq/manage.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
Index: linux-2.6.23/kernel/irq/manage.c
===================================================================
--- linux-2.6.23.orig/kernel/irq/manage.c
+++ linux-2.6.23/kernel/irq/manage.c
@@ -646,7 +646,7 @@ __setup("hardirq-preempt=", hardirq_pree
/*
* threaded simple handler
*/
-static void thread_simple_irq(irq_desc_t *desc)
+static void thread_core_irq(irq_desc_t *desc)
{
struct irqaction *action = desc->action;
unsigned int irq = desc - irq_desc;
@@ -664,13 +664,35 @@ static void thread_simple_irq(irq_desc_t
}
/*
+ * threaded fasteoi type irq handler
+ */
+static void thread_simple_irq(irq_desc_t *desc)
+{
+ unsigned int irq = desc - irq_desc;
+
+ do {
+ /*
+ * When another irq arrived while we were handling
+ * one, we could have masked the irq.
+ * Renable it, if it was not disabled in meantime.
+ */
+ if (unlikely(desc->status & IRQ_PENDING)) {
+ desc->status &= ~IRQ_PENDING;
+ desc->chip->unmask(irq);
+ }
+ thread_core_irq(desc);
+ } while ((desc->status & (IRQ_PENDING | IRQ_INPROGRESS)));
+
+}
+
+/*
* threaded level type irq handler
*/
static void thread_level_irq(irq_desc_t *desc)
{
unsigned int irq = desc - irq_desc;
- thread_simple_irq(desc);
+ thread_core_irq(desc);
if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
desc->chip->unmask(irq);
}
@@ -682,7 +704,7 @@ static void thread_fasteoi_irq(irq_desc_
{
unsigned int irq = desc - irq_desc;
- thread_simple_irq(desc);
+ thread_core_irq(desc);
if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
desc->chip->unmask(irq);
}
On Wed, Nov 28, 2007 at 03:38:11PM +0100, Remy Bohmer wrote:
> Hello Daniel,
>
> > * Note: The caller is expected to handle the ack, clear, mask and
> > * unmask issues if necessary.
> > So we shouldn't need any flow control unless there is some other
> > factors..
>
> This comment can be misinterpreted, I think. Who is assumed to be the
> caller in this context? The 2 other routines in the driver that
> actually do the unmasking stuff besides only calling this routine? Is
> it allowed to call it directly or should it always be done through a
> wrapper that does all these special things?
The whole point of this simple handler is to accomodate interrupts such
as those found on the Neponset board.
There, you have a status register in a CPLD but no enable/disable
registers. The status register tells you whether the SA1111, ethernet
or 'USAR' chip asserted its interrupt.
However, as there is no way to disable the sources, this situation has
to be handled carefully - the function decoding the interrupt source
needs to mask and unmask the _parent_ interrupt itself, and it's
exactly that which the comment is directed towards.
See neponset_irq_handler().
The simple IRQ handler is not meant for anything other than that really
simple implementation. If people have been using it with interrupts
which can be individually masked and unmasked, that's where the bug is.
They should be using one of the other handlers.
On Wed, 28 Nov 2007, Russell King - ARM Linux wrote:
> On Wed, Nov 28, 2007 at 03:38:11PM +0100, Remy Bohmer wrote:
> > Hello Daniel,
> >
> > > * Note: The caller is expected to handle the ack, clear, mask and
> > > * unmask issues if necessary.
> > > So we shouldn't need any flow control unless there is some other
> > > factors..
> >
> > This comment can be misinterpreted, I think. Who is assumed to be the
> > caller in this context? The 2 other routines in the driver that
> > actually do the unmasking stuff besides only calling this routine? Is
> > it allowed to call it directly or should it always be done through a
> > wrapper that does all these special things?
>
> The whole point of this simple handler is to accomodate interrupts such
> as those found on the Neponset board.
>
> There, you have a status register in a CPLD but no enable/disable
> registers. The status register tells you whether the SA1111, ethernet
> or 'USAR' chip asserted its interrupt.
>
> However, as there is no way to disable the sources, this situation has
> to be handled carefully - the function decoding the interrupt source
> needs to mask and unmask the _parent_ interrupt itself, and it's
> exactly that which the comment is directed towards.
>
> See neponset_irq_handler().
>
> The simple IRQ handler is not meant for anything other than that really
> simple implementation. If people have been using it with interrupts
> which can be individually masked and unmasked, that's where the bug is.
> They should be using one of the other handlers.
>
Russell,
Thanks for the reply and this nice explanation.
I'm taking this as a NACK.
Daniel or Remy, could you find the offending users and make send patches
to fix them.
-- Steve
On Wed, Nov 28, 2007 at 02:04:41PM -0500, Steven Rostedt wrote:
> Thanks for the reply and this nice explanation.
>
> I'm taking this as a NACK.
>
> Daniel or Remy, could you find the offending users and make send patches
> to fix them.
Note that I'm not acking nor nacking the patch; I'm not involved with
the RT stuff and I've never looked at the code, so I don't know what
the implications of the patch itself are.
I've merely explained the point of the simple irq handler.
Maybe the simple irq handler should never have been something that got
sucked into the generic IRQ stuff, but kept as something specific to
Neponset.
On Wed, 28 Nov 2007, Russell King - ARM Linux wrote:
> On Wed, Nov 28, 2007 at 02:04:41PM -0500, Steven Rostedt wrote:
> > Thanks for the reply and this nice explanation.
> >
> > I'm taking this as a NACK.
> >
> > Daniel or Remy, could you find the offending users and make send patches
> > to fix them.
>
> Note that I'm not acking nor nacking the patch; I'm not involved with
> the RT stuff and I've never looked at the code, so I don't know what
> the implications of the patch itself are.
Understood. But I didn't know to pull it in or not. So I used your
explanation to NACK it myself. I don't understand all the intricacies of
the arm architecture. So while Thomas is out, I'm not pulling this in.
If he comes back and gives his ACK, I'll simply NACK my NACK ;-)
>
> I've merely explained the point of the simple irq handler.
>
> Maybe the simple irq handler should never have been something that got
> sucked into the generic IRQ stuff, but kept as something specific to
> Neponset.
>
This could also simply be unique to the interrupt threads (only in RT). So
perhaps the patch is OK.
Remy, sorry about this round-a-bout. But I don't have any of the hardware
that this affects, and I'm just being cautious.
Thanks,
-- Steve
On Wed, 2007-11-28 at 15:16 -0500, Steven Rostedt wrote:
>
> This could also simply be unique to the interrupt threads (only in RT). So
> perhaps the patch is OK.
>
> Remy, sorry about this round-a-bout. But I don't have any of the hardware
> that this affects, and I'm just being cautious.
Ignoring the ARM side of things for a sec, handle_simple_irq() will
mask() the interrupt in the special case that an interrupt is already in
the processes of being handled.. handle_simple_irq() also unmasks when
it finishes handling an interrupt (something real time adds for some
reason) ..
In terms of threading the irq everything is the same except there is no
unmask() call when the thread finishes ..
Daniel
On Wed, 28 Nov 2007, Daniel Walker wrote:
>
> Ignoring the ARM side of things for a sec, handle_simple_irq() will
> mask() the interrupt in the special case that an interrupt is already in
> the processes of being handled.. handle_simple_irq() also unmasks when
> it finishes handling an interrupt (something real time adds for some
> reason) ..
>
> In terms of threading the irq everything is the same except there is no
> unmask() call when the thread finishes ..
>
OK, to be honest, I never fully understood the concept of this
"simple_irq". I figured it was because of the ARM architecture.
Your arguments seem reasonable and you are probably correct. But I didn't
write this code, nor do I understand it, and before I go ahead and change
it, I'll wait to hear input from Thomas. Hopefully, he'll be back soon.
Perhaps my confusion about the simple_irq part is from the bug you are
trying to fix. I've been confused by why it was different ;-)
-- Steve
On Wed, Nov 28, 2007 at 04:13:07PM -0500, Steven Rostedt wrote:
>
>
> On Wed, 28 Nov 2007, Daniel Walker wrote:
>
> >
> > Ignoring the ARM side of things for a sec, handle_simple_irq() will
> > mask() the interrupt in the special case that an interrupt is already in
> > the processes of being handled.. handle_simple_irq() also unmasks when
> > it finishes handling an interrupt (something real time adds for some
> > reason) ..
> >
> > In terms of threading the irq everything is the same except there is no
> > unmask() call when the thread finishes ..
> >
>
> OK, to be honest, I never fully understood the concept of this
> "simple_irq". I figured it was because of the ARM architecture.
If you read what I said compared with what Daniel said, you'll see that
adding the mask/unmask is _pointless_ because for the case where the
simple handler should be used, there is _no_ hardware masking available
except via the parent interrupt signal.
So actually Daniel's argument misses the basic point - that using
handle_simple_irq for non-simple IRQs is just WRONG.
On Wed, 2007-11-28 at 23:03 +0000, Russell King - ARM Linux wrote:
> On Wed, Nov 28, 2007 at 04:13:07PM -0500, Steven Rostedt wrote:
> >
> >
> > On Wed, 28 Nov 2007, Daniel Walker wrote:
> >
> > >
> > > Ignoring the ARM side of things for a sec, handle_simple_irq() will
> > > mask() the interrupt in the special case that an interrupt is already in
> > > the processes of being handled.. handle_simple_irq() also unmasks when
> > > it finishes handling an interrupt (something real time adds for some
> > > reason) ..
> > >
> > > In terms of threading the irq everything is the same except there is no
> > > unmask() call when the thread finishes ..
> > >
> >
> > OK, to be honest, I never fully understood the concept of this
> > "simple_irq". I figured it was because of the ARM architecture.
>
> If you read what I said compared with what Daniel said, you'll see that
> adding the mask/unmask is _pointless_ because for the case where the
> simple handler should be used, there is _no_ hardware masking available
> except via the parent interrupt signal.
>
> So actually Daniel's argument misses the basic point - that using
> handle_simple_irq for non-simple IRQs is just WRONG.
Well we've got at least two ARM boards which need the additional unmask
to work correctly with interrupt threading .. So there must be at least
two ARM boards using handle_simple_irq incorrectly .. It sounds like you
would prefer we send patches to move those handle_simple_irq users into
another method ?
Daniel
On Wed, Nov 28, 2007 at 03:19:14PM -0800, Daniel Walker wrote:
> On Wed, 2007-11-28 at 23:03 +0000, Russell King - ARM Linux wrote:
> > On Wed, Nov 28, 2007 at 04:13:07PM -0500, Steven Rostedt wrote:
> > >
> > >
> > > On Wed, 28 Nov 2007, Daniel Walker wrote:
> > >
> > > >
> > > > Ignoring the ARM side of things for a sec, handle_simple_irq() will
> > > > mask() the interrupt in the special case that an interrupt is already in
> > > > the processes of being handled.. handle_simple_irq() also unmasks when
> > > > it finishes handling an interrupt (something real time adds for some
> > > > reason) ..
> > > >
> > > > In terms of threading the irq everything is the same except there is no
> > > > unmask() call when the thread finishes ..
> > > >
> > >
> > > OK, to be honest, I never fully understood the concept of this
> > > "simple_irq". I figured it was because of the ARM architecture.
> >
> > If you read what I said compared with what Daniel said, you'll see that
> > adding the mask/unmask is _pointless_ because for the case where the
> > simple handler should be used, there is _no_ hardware masking available
> > except via the parent interrupt signal.
> >
> > So actually Daniel's argument misses the basic point - that using
> > handle_simple_irq for non-simple IRQs is just WRONG.
>
> Well we've got at least two ARM boards which need the additional unmask
> to work correctly with interrupt threading .. So there must be at least
> two ARM boards using handle_simple_irq incorrectly .. It sounds like you
> would prefer we send patches to move those handle_simple_irq users into
> another method ?
If you read my explaination of the purpose of handle_simple_irq() you'll
see that for the _correct_ case where this handler should be used, the
mask and unmask can only be empty. To repeat for the third time - it's
for the case where the hardware supplies NO way to mask the interrupt.
So adding calls to the mask/unmask methods would be pointless for the
correct use case.
If you do have the ability to mask, then you should be using the level or
edge handlers.
Hello Steven,
> Remy, sorry about this round-a-bout. But I don't have any of the hardware
> that this affects, and I'm just being cautious.
No problem, I expected this discussion when I submitted the patch. It
is logical that everybody is cautious on this subject. But still,
there is a bug here. The question is how to figure out the best
solution. (with the least chance of regression).
I do not think Russell is right here with assuming that the wrong
interrupt handler type is used. Looking at the behaviour of the
mainline kernel (non-RT), the implementation is quite different: On
mainline the handle_simple_irq() in chip.c is not re-entrant, the
masking is **only** done in case of errors, and therefor never
unmasked again, of course.
On RT the masking is done when the next interrupt arrives, while the
1st interrupt is in progress by the interrupt thread, so it is masked
under normal valid conditions, and never unmasked. So, this is where
the real bug is.
So, I see 2 solutions:
* Never mask the interrupt in the first place. (should also work for
this type of interrupt)
* Add an unmask to the interrupt, once it is handled. (I chose this
route with my patch)
It is definitely only related to interrupt threading, and thus only RT related.
Kind Regards,
Remy
On Thu, Nov 29, 2007 at 11:14:30AM +0100, Remy Bohmer wrote:
> I do not think Russell is right here with assuming that the wrong
> interrupt handler type is used. Looking at the behaviour of the
> mainline kernel (non-RT), the implementation is quite different: On
> mainline the handle_simple_irq() in chip.c is not re-entrant, the
> masking is **only** done in case of errors, and therefor never
> unmasked again, of course.
The issue comes down to a very simple question:
Are you using handle_simple_irq() with an interrupt which can be masked?
If the answer to that question is 'yes' then you're using the wrong handler.
If 'no' then it's the right handler and the mask/unmask methods associated
with the interrupt will be no-ops.
Therefore, if you have mask/unmask methods for a simple IRQ which actually
do something, clearly the first answer is the one which applies. You're
using the wrong handler. Use the level or edge handlers instead.
Hello Russell,
> If 'no' then it's the right handler and the mask/unmask methods associated
> with the interrupt will be no-ops.
I completely understand what you keep on saying, but that would imply
that the following piece of code in chip.c is completely bogus anyway!
(snip from mainline 2.6.23)
handle_simple_irq(unsigned int irq, struct irq_desc *desc)
...
if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
if (desc->chip->mask)
desc->chip->mask(irq);
...
}
Why trying to mask the interrupt if that is illegal use of the interrupt type?
This piece of code assumes that there CAN be a handler for masking
interrupts for the simple_irq type. This is in contradiction what you
are trying to explain.
If what you say is (completely) true the code is better to be:
....
if (desc->chip->mask)
BUG();
....
IMO, interrupts can still be simple_irq types if the interrupt source
does **not have to** be disabled to be able to handle them. These
(GPIO) interrupts signal an event, that is already gone when the
interrupt handler starts, they are **never** really pending, waiting
for some device driver to handle the event. The device does not really
care if the interrupt is really handled, it just generates a new
interrupt on the next event.
And yes, it is possible that these interrupts can be seperately
masked/unmasked, but it is not necessary, and therefor others chose
apparantly for the simple_irq() types.
So, I do not think that the argument of **can** masked/unmasked is
valid, but more the **need** to be masked/unmasked is valid.
I think we should not argue if this is correct use, or misuse.
Fact is now that there is a bug in RT, that this type of 'misuse' does
not work on RT and breaks the interrupt handling. There are even more
patches floating around to fix this.
According to what Daniel also mentions, there are already several
architectures that use this type of interrupt with a masking/unmasking
implementation, apparantly all without the **need** to masking them.
So, the code must prevent this type of 'misuse' (and thus not
supporting it as it is doing now) or fully support it on RT, just like
on mainline.
So, in short: IMO, only RT is broken, not mainline -> thus fix RT,
otherwise we will have regression compared to mainline.
Kind Regards,
Remy
2007/11/29, Russell King - ARM Linux <[email protected]>:
> On Thu, Nov 29, 2007 at 11:14:30AM +0100, Remy Bohmer wrote:
> > I do not think Russell is right here with assuming that the wrong
> > interrupt handler type is used. Looking at the behaviour of the
> > mainline kernel (non-RT), the implementation is quite different: On
> > mainline the handle_simple_irq() in chip.c is not re-entrant, the
> > masking is **only** done in case of errors, and therefor never
> > unmasked again, of course.
>
> The issue comes down to a very simple question:
>
> Are you using handle_simple_irq() with an interrupt which can be masked?
>
> If the answer to that question is 'yes' then you're using the wrong handler.
> If 'no' then it's the right handler and the mask/unmask methods associated
> with the interrupt will be no-ops.
>
> Therefore, if you have mask/unmask methods for a simple IRQ which actually
> do something, clearly the first answer is the one which applies. You're
> using the wrong handler. Use the level or edge handlers instead.
>
On Thu, Nov 29, 2007 at 12:27:30PM +0100, Remy Bohmer wrote:
> Hello Russell,
>
> > If 'no' then it's the right handler and the mask/unmask methods associated
> > with the interrupt will be no-ops.
>
> I completely understand what you keep on saying, but that would imply
> that the following piece of code in chip.c is completely bogus anyway!
> (snip from mainline 2.6.23)
It _is_ bogus.
> handle_simple_irq(unsigned int irq, struct irq_desc *desc)
> ...
> if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
> if (desc->chip->mask)
> desc->chip->mask(irq);
> ...
> }
>
> Why trying to mask the interrupt if that is illegal use of the interrupt
> type?
No idea, but that code is wrong. Take a look at the original code
which patch 3692/1 removed. You'll notice that it never touches
the mask/unmask handlers.
> This piece of code assumes that there CAN be a handler for masking
> interrupts for the simple_irq type. This is in contradiction what you
> are trying to explain.
Shrug. I'm explaining how _I_ designed it, and the purpose of it being
there.
If people insist on adding the mask/unmask crap to it, the function
might as well be deleted and be an alias for handle_level_IRQ. Because
that's _precisely_ what you lot are turning it into.
Ah, and looking at the changes to the file, the addition of the mask
and unmask was done by someone who didn't understand what this was
trying to do. So that change should be backed out.
On Thu, 29 Nov 2007, Russell King - ARM Linux wrote:
> On Thu, Nov 29, 2007 at 12:27:30PM +0100, Remy Bohmer wrote:
>
> Ah, and looking at the changes to the file, the addition of the mask
> and unmask was done by someone who didn't understand what this was
> trying to do. So that change should be backed out.
>
Perhaps only part of the change should be backed out. The part that masks
the irq in the handle_simple_irq code.
That's from commit 76d2160147f43f982dfe881404cfde9fd0a9da21 which is to
not disable an irq line when disable_irq is called. A form of lazy
disable irq.
This speeds up code that uses disable_irq, since the line is only masked
when an interrupt actually arrives. Using disable_irq / enable_irq does no
IRQ chip modifications if an interrupt from the IRQ line does not arrive
between the two.
Now the question is, can something that uses handle_simple_irq call
disable_irq? If there is no mask function, I would assume that this would
be a noop in such a case. If this is true, then we could remove the mask
from handle_simple_irq. But then we might want to add a BUG() in
disable_irq for simple_irqs.
-- Steve
Hello Russell,
> If people insist on adding the mask/unmask crap to it, the function
> might as well be deleted and be an alias for handle_level_IRQ. Because
> that's _precisely_ what you lot are turning it into.
First, I want to make clear that I am just debugging a problem on RT
that does not exist on mainline, and I am trying to find a way to get
it solved nicely _on RT_, and preferable in a way that it works for
everybody with the least chance for regression.
I already mentioned that RT is doing masking in this code during
normal use, where the mainline kernel does not do this, **except** in
an error situation.
I also mentioned that there are 2 ways of solving it on RT:
1) do not do masking at all, (just as on mainline), and only mask it
when there is an error (just as on mainline)
2) Fix it by adding an unmask, which I proposed in my first patch, and
which others also did in their patches. (not knowing your opinion, as
I do know)
Still, I believe, that the fact if a interrupt **can** be masked is
not a reason to forbid the simple_irq type(), and surely does not make
it automatically a level type interrupt.
The interrupt type I talk about is actually edge triggered (the
interrupt triggers on _BOTH_ edges of the input-line), but there is no
way of 'acknowledging' the interrupt, which makes the edge type
handler unsuitable, and much too heavy.
As mentioned, this type of irq is never pending, and unmasking it will
never lead to get a interrupt request immediately; the interrupt that
occurs during the masked time, is just lost.
So, as far as I can tell , the type really used on at91 for the GPIO
stuff _is_ a simple_irq as you describe, but one that can be
masked/unmasked **in case of errors**. It should never be masked
during normal use.
So, I propose option 1 to solve it on RT, and thus to trigger Steven
to NACK my first patch. I will try and see if I can make it work
_without_ masking on RT (except in case of errors, just as in
mainline).
...and probably add some clear description about the purpose of
simple_irq, especially related to masking...
Do you agree on this Russell?
> Ah, and looking at the changes to the file, the addition of the mask
> and unmask was done by someone who didn't understand what this was
> trying to do. So that change should be backed out.
Maybe he was trying to mask the irq during an error situation?
Kind regards,
Remy
On Thu, Nov 29, 2007 at 03:18:04PM +0100, Remy Bohmer wrote:
> Hello Russell,
>
> > If people insist on adding the mask/unmask crap to it, the function
> > might as well be deleted and be an alias for handle_level_IRQ. Because
> > that's _precisely_ what you lot are turning it into.
>
> First, I want to make clear that I am just debugging a problem on RT
> that does not exist on mainline, and I am trying to find a way to get
> it solved nicely _on RT_, and preferable in a way that it works for
> everybody with the least chance for regression.
While I realise that, I'm telling you that the _problem_ is being
caused by the wrong handler being used.
> I already mentioned that RT is doing masking in this code during
> normal use, where the mainline kernel does not do this, **except** in
> an error situation.
AT91 has edge based GPIO interrupts which need special handling so
edges aren't missed. That's what the edge handler is there for - to
ensure that edges aren't missed while the interrupt is soft-disabled.
SA1100 and PXA have exactly the same setup. They use the correct
handler. Why is AT91 special?
> So, as far as I can tell , the type really used on at91 for the GPIO
> stuff _is_ a simple_irq as you describe, but one that can be
> masked/unmasked **in case of errors**. It should never be masked
> during normal use.
>
> So, I propose option 1 to solve it on RT, and thus to trigger Steven
> to NACK my first patch. I will try and see if I can make it work
> _without_ masking on RT (except in case of errors, just as in
> mainline).
> ...and probably add some clear description about the purpose of
> simple_irq, especially related to masking...
>
> Do you agree on this Russell?
Clearly no, I disagree.
> > Ah, and looking at the changes to the file, the addition of the mask
> > and unmask was done by someone who didn't understand what this was
> > trying to do. So that change should be backed out.
>
> Maybe he was trying to mask the irq during an error situation?
Who knows. The patch says nothing about that change. The change was
never copied to me, so I've never reviewed it (and had it been, I
would've indicated that the change was wrong.)
Hello Russell,
> While I realise that, I'm telling you that the _problem_ is being
> caused by the wrong handler being used.
Okay, I believe you... (someone told me once, Russell is right, and if
you do not believe him, he is still right ;-)
> SA1100 and PXA have exactly the same setup. They use the correct
> handler. Why is AT91 special?
This remark is what convinced me :-))
I changed the interrupt handler from the simple_irq to the edge_irq,
and it works...!!
(I added a noop routine for that .ack part, because there is no ack)
I believe I was too focussed on the masking bug in the RT kernel on
the simple_irq() that I did not see that for the AT91 series the edge
type interrupt handler also works... (even better...) What I thought
was 1 single bug in the RT-kernel turned out to be a number of things
together that aren't correct, even for mainline.
So, to come to a conclusion: The masking bug in RT is still there in
the simple_irq path, and masking has to be removed from the simple_irq
code. Also for mainline. AT91 can live without simple_irq.
I think we are in sync again...
I will post a patch for the AT91 later on, after some more testing.
Kind Regards,
Remy
Hello All,
I tested some more with the edge_triggered interrupt handler on AT91,
and I had already a long time a problem with the AT91SAM9261-EK kit,
that the DM9000 Ethernet controller did not work _at all_ on RT. I
just tried if the edge triggered interrupt handler works on that board
also. And now that board also works properly on RT. (my first patch
did not solve this problem either, so apparantly that one still misses
interrupts.)
This proves again that the simple_irq usage on AT91 is just crap...
Russell, Thanks again for the latest (and greatest) hint. This saves
me a lot of debug time on that board.
Kind Regards,
Remy
2007/11/29, Remy Bohmer <[email protected]>:
> Hello Russell,
>
> > While I realise that, I'm telling you that the _problem_ is being
> > caused by the wrong handler being used.
>
> Okay, I believe you... (someone told me once, Russell is right, and if
> you do not believe him, he is still right ;-)
>
> > SA1100 and PXA have exactly the same setup. They use the correct
> > handler. Why is AT91 special?
>
> This remark is what convinced me :-))
>
> I changed the interrupt handler from the simple_irq to the edge_irq,
> and it works...!!
> (I added a noop routine for that .ack part, because there is no ack)
>
> I believe I was too focussed on the masking bug in the RT kernel on
> the simple_irq() that I did not see that for the AT91 series the edge
> type interrupt handler also works... (even better...) What I thought
> was 1 single bug in the RT-kernel turned out to be a number of things
> together that aren't correct, even for mainline.
>
> So, to come to a conclusion: The masking bug in RT is still there in
> the simple_irq path, and masking has to be removed from the simple_irq
> code. Also for mainline. AT91 can live without simple_irq.
> I think we are in sync again...
>
> I will post a patch for the AT91 later on, after some more testing.
>
>
> Kind Regards,
>
> Remy
>
On Thu, 29 Nov 2007, Remy Bohmer wrote:
> I changed the interrupt handler from the simple_irq to the edge_irq,
> and it works...!!
> (I added a noop routine for that .ack part, because there is no ack)
>
> I believe I was too focussed on the masking bug in the RT kernel on
> the simple_irq() that I did not see that for the AT91 series the edge
> type interrupt handler also works... (even better...) What I thought
> was 1 single bug in the RT-kernel turned out to be a number of things
> together that aren't correct, even for mainline.
>
> So, to come to a conclusion: The masking bug in RT is still there in
> the simple_irq path, and masking has to be removed from the simple_irq
> code. Also for mainline. AT91 can live without simple_irq.
> I think we are in sync again...
>
> I will post a patch for the AT91 later on, after some more testing.
Remy,
Thanks a lot for figuring this out!! Makes me feel better for the NACK ;-)
If you also want to send me a patch to remove the masking in the
simple_irq code, please do.
Thanks,
-- Steve
On Thu, 29 Nov 2007, Russell King - ARM Linux wrote:
> If people insist on adding the mask/unmask crap to it, the function
> might as well be deleted and be an alias for handle_level_IRQ. Because
> that's _precisely_ what you lot are turning it into.
>
> Ah, and looking at the changes to the file, the addition of the mask
> and unmask was done by someone who didn't understand what this was
> trying to do. So that change should be backed out.
It looks like a mechanical "fix them all" thingy. I never intended to
use the simple handler for anything else than a fast demux handler.
You are right, it needs to be backed out.
Thanks,
tglx
In commit 76d2160147f43f982dfe881404cfde9fd0a9da21 lazy irq disabling
was implemented, and the simple irq handler had a masking set to it.
Remy Bohmer discovered that some devices in the ARM architecture
would trigger the mask, but never unmask it. His patch to do the
unmasking was questioned by Russell King about masking simple irqs
to begin with. Looking further, it was discovered that the problems
Remy was seeing was due to improper use of the simple handler by
devices, and he later submitted patches to fix those. But the issue
that was uncovered was that the simple handler should never mask.
This patch reverts the masking in the simple handler.
[Note: This version is for the RT patch, and the IRQ_PENDING is needed
for threaded IRQs]
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/irq/chip.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Index: linux-2.6.23.9-rt13/kernel/irq/chip.c
===================================================================
--- linux-2.6.23.9-rt13.orig/kernel/irq/chip.c
+++ linux-2.6.23.9-rt13/kernel/irq/chip.c
@@ -302,10 +302,9 @@ handle_simple_irq(unsigned int irq, stru
action = desc->action;
if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
IRQ_DISABLED)))) {
- if (desc->chip->mask)
- desc->chip->mask(irq);
desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
- desc->status |= IRQ_PENDING;
+ if (action && (desc->status & IRQ_INPROGRESS))
+ desc->status |= IRQ_PENDING;
goto out_unlock;
}
On Wed, Dec 12, 2007 at 02:40:09PM -0500, Steven Rostedt wrote:
>
> In commit 76d2160147f43f982dfe881404cfde9fd0a9da21 lazy irq disabling
> was implemented, and the simple irq handler had a masking set to it.
>
> Remy Bohmer discovered that some devices in the ARM architecture
> would trigger the mask, but never unmask it. His patch to do the
> unmasking was questioned by Russell King about masking simple irqs
> to begin with. Looking further, it was discovered that the problems
> Remy was seeing was due to improper use of the simple handler by
> devices, and he later submitted patches to fix those. But the issue
> that was uncovered was that the simple handler should never mask.
>
> This patch reverts the masking in the simple handler.
>
> [Note: This version is for the RT patch, and the IRQ_PENDING is needed
> for threaded IRQs]
>
> Signed-off-by: Steven Rostedt <[email protected]>
Acked-by: Russell King <[email protected]>
Thanks for submitting this.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
Hello Steven,
>
> In commit 76d2160147f43f982dfe881404cfde9fd0a9da21 lazy irq disabling
> was implemented, and the simple irq handler had a masking set to it.
>
> Remy Bohmer discovered that some devices in the ARM architecture
> would trigger the mask, but never unmask it. His patch to do the
> unmasking was questioned by Russell King about masking simple irqs
> to begin with. Looking further, it was discovered that the problems
> Remy was seeing was due to improper use of the simple handler by
> devices, and he later submitted patches to fix those. But the issue
> that was uncovered was that the simple handler should never mask.
>
> This patch reverts the masking in the simple handler.
Also:
Acked-by: Remy Bohmer <[email protected]>
Thanks for the effort also, I still had it on my todo list, but that
is needed anymore...
Remy
On Wed, 12 Dec 2007, Remy Bohmer wrote:
> Also:
> Acked-by: Remy Bohmer <[email protected]>
>
> Thanks for the effort also, I still had it on my todo list, but that
> is needed anymore...
No problem. Could you also ACK the one I sent for mainline.
Thanks,
-- Steve
> No problem. Could you also ACK the one I sent for mainline.
I will test it first tomorrow morning.
Remy