2013-03-07 13:53:48

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

Till reported that he spurious interrupt detection of threaded
interrupts is broken in two ways:

- note_interrupt() is called for each action thread of a shared
interrupt line. That's wrong as we are only interested whether none
of the device drivers felt responsible for the interrupt, but by
calling multiple times for a single interrupt line we account
IRQ_NONE even if one of the drivers felt responsible.

- note_interrupt() when called from the thread handler is not
serialized. That leaves the members of irq_desc which are used for
the spurious detection unprotected.

To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.

If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return.

If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.

If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.

Reported-by: Till Straumann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
include/linux/irqdesc.h | 4 +++
kernel/irq/internals.h | 4 +++
kernel/irq/manage.c | 4 +--
kernel/irq/spurious.c | 57 ++++++++++++++++++++++++++++++++++++++++++++----
4 files changed, 63 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/irqdesc.h
===================================================================
--- linux-2.6.orig/include/linux/irqdesc.h
+++ linux-2.6/include/linux/irqdesc.h
@@ -27,6 +27,8 @@ struct irq_desc;
* @irq_count: stats field to detect stalled irqs
* @last_unhandled: aging timer for unhandled count
* @irqs_unhandled: stats field for spurious unhandled interrupts
+ * @threads_handled: stats field for deferred spurious detection of threaded handlers
+ * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
* @lock: locking for SMP
* @affinity_hint: hint to user space for preferred irq affinity
* @affinity_notify: context for notification of affinity changes
@@ -52,6 +54,8 @@ struct irq_desc {
unsigned int irq_count; /* For detecting broken IRQs */
unsigned long last_unhandled; /* Aging timer for unhandled count */
unsigned int irqs_unhandled;
+ atomic_t threads_handled;
+ int threads_handled_last;
raw_spinlock_t lock;
struct cpumask *percpu_enabled;
#ifdef CONFIG_SMP
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -55,6 +55,10 @@ enum {
IRQS_SUSPENDED = 0x00000800,
};

+/* Bits for spurious detection of threaded handlers */
+#define SPURIOUS_MASK 0x7fffffff
+#define SPURIOUS_DEFERRED 0x80000000
+
#include "debug.h"
#include "settings.h"

Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -867,8 +867,8 @@ static int irq_thread(void *data)
irq_thread_check_affinity(desc, action);

action_ret = handler_fn(desc, action);
- if (!noirqdebug)
- note_interrupt(action->irq, desc, action_ret);
+ if (!noirqdebug && action_ret == IRQ_HANDLED)
+ atomic_inc(&desc->threads_handled);

wake_threads_waitq(desc);
}
Index: linux-2.6/kernel/irq/spurious.c
===================================================================
--- linux-2.6.orig/kernel/irq/spurious.c
+++ linux-2.6/kernel/irq/spurious.c
@@ -271,15 +271,64 @@ void note_interrupt(unsigned int irq, st
if (desc->istate & IRQS_POLL_INPROGRESS)
return;

- /* we get here again via the threaded handler */
- if (action_ret == IRQ_WAKE_THREAD)
- return;
-
if (bad_action_ret(action_ret)) {
report_bad_irq(irq, desc, action_ret);
return;
}

+ /*
+ * We cannot call note_interrupt from the threaded handler. So
+ * the threaded handlers store whether they sucessfully
+ * handled an interrupt. Here is the right place to take care
+ * of this.
+ */
+ if (action_ret & IRQ_WAKE_THREAD) {
+ /*
+ * There is a thread woken. Check whether one of the
+ * shared primary handlers returned IRQ_HANDLED. If
+ * not we defer the spurious detection to the next
+ * interrupt.
+ */
+ if (action_ret == IRQ_WAKE_THREAD) {
+ int handled;
+
+ /*
+ * We use bit 31 of thread_handled_last to
+ * note the deferred spurious detection
+ * active. No locking necessary as
+ * thread_handled_last is only accessed here
+ * and we have the guarantee that hard
+ * interrupts are not reentrant.
+ */
+ if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
+ desc->threads_handled_last |= SPURIOUS_DEFERRED;
+ return;
+ }
+ /*
+ * Check whether one of the threaded handlers
+ * returned IRQ_HANDLED since the last
+ * interrupt happened. Or the spurious
+ * deferred flag for ease of comparison and
+ * preservation in the writeback.
+ */
+ handled = atomic_read(&desc->threads_handled);
+ handled |= SPURIOUS_DEFERRED;
+ if (handled != desc->threads_handled_last) {
+ action_ret = IRQ_HANDLED;
+ desc->threads_handled_last = handled;
+ } else {
+ action_ret = IRQ_NONE;
+ }
+ } else {
+ /*
+ * One of the primary handlers returned
+ * IRQ_HANDLED. So we don't care about the
+ * threaded handlers on the same line.
+ */
+ desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
+ }
+ }
+
if (unlikely(action_ret == IRQ_NONE)) {
/*
* If we are seeing only the odd spurious IRQ caused by


2013-03-08 15:05:43

by Till Straumann

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

Thomas.

Thanks for your quick reply. I had a look at the patch and I have a few
comments.

1) I'm not sure adding the SPURIOUS_DEFERRED flag into
threads_handled_last is OK
- what happens if the atomic_t counter can hold more than 31 bits?
In this case,
when thread handlers increment the counter there is interference
with the flag.
If this is not harmful then it is at least ugly.

I'm not as familiar with the code as you are but wouldn't it be
simpler to always
defer spurious detection thus avoiding to have to keep track of the
state (deferral
active/inactive)? I.e., if any primary handler returns IRQ_HANDLED then
we simply increment the counter. note_interrupt() could then always
compare the
previous count to the current count and if they are equal conclude
that the interrupt
was not handled:

handle_irq_event_percpu()
{
...
if (!noirqdebug)
note_interrupt(irq, desc, retval);

if ( (retval & IRQ_HANDLED) )
atomic_inc(&desc->threads_handled);
}

and in 'note_interrupt()'

handled = atomic_read(&desc->threads_handled);
if ( desc->threads_handled_last == handled ) {
action_ret = IRQ_NONE;
} else {
action_ret = IRQ_HANDLED;
desc->threads_handled_last = handled;
}

Either way - I'm not sure what deferral does to the part of the
algorithm
in note_interrupt() which deals with misrouted interrupts since the
'action_ret' that goes into try_misrouted_irq() is delayed by one
interrupt
cycle.


2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and
I believe
this routine would also need to increment the 'threads_handled'
counter rather
than calling note_interrupt.

Cheers
- Till

On 03/07/2013 02:53 PM, Thomas Gleixner wrote:
> Till reported that he spurious interrupt detection of threaded
> interrupts is broken in two ways:
>
> - note_interrupt() is called for each action thread of a shared
> interrupt line. That's wrong as we are only interested whether none
> of the device drivers felt responsible for the interrupt, but by
> calling multiple times for a single interrupt line we account
> IRQ_NONE even if one of the drivers felt responsible.
>
> - note_interrupt() when called from the thread handler is not
> serialized. That leaves the members of irq_desc which are used for
> the spurious detection unprotected.
>
> To solve this we need to defer the spurious detection of a threaded
> interrupt to the next hardware interrupt context where we have
> implicit serialization.
>
> If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
> check whether the previous interrupt requested a deferred check. If
> not, we request a deferred check for the next hardware interrupt and
> return.
>
> If set, we check whether one of the interrupt threads signaled
> success. Depending on this information we feed the result into the
> spurious detector.
>
> If one primary handler of a shared interrupt returns IRQ_HANDLED we
> disable the deferred check of irq threads on the same line, as we have
> found at least one device driver who cared.
>
> Reported-by: Till Straumann <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> include/linux/irqdesc.h | 4 +++
> kernel/irq/internals.h | 4 +++
> kernel/irq/manage.c | 4 +--
> kernel/irq/spurious.c | 57 ++++++++++++++++++++++++++++++++++++++++++++----
> 4 files changed, 63 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/include/linux/irqdesc.h
> ===================================================================
> --- linux-2.6.orig/include/linux/irqdesc.h
> +++ linux-2.6/include/linux/irqdesc.h
> @@ -27,6 +27,8 @@ struct irq_desc;
> * @irq_count: stats field to detect stalled irqs
> * @last_unhandled: aging timer for unhandled count
> * @irqs_unhandled: stats field for spurious unhandled interrupts
> + * @threads_handled: stats field for deferred spurious detection of threaded handlers
> + * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
> * @lock: locking for SMP
> * @affinity_hint: hint to user space for preferred irq affinity
> * @affinity_notify: context for notification of affinity changes
> @@ -52,6 +54,8 @@ struct irq_desc {
> unsigned int irq_count; /* For detecting broken IRQs */
> unsigned long last_unhandled; /* Aging timer for unhandled count */
> unsigned int irqs_unhandled;
> + atomic_t threads_handled;
> + int threads_handled_last;
> raw_spinlock_t lock;
> struct cpumask *percpu_enabled;
> #ifdef CONFIG_SMP
> Index: linux-2.6/kernel/irq/internals.h
> ===================================================================
> --- linux-2.6.orig/kernel/irq/internals.h
> +++ linux-2.6/kernel/irq/internals.h
> @@ -55,6 +55,10 @@ enum {
> IRQS_SUSPENDED = 0x00000800,
> };
>
> +/* Bits for spurious detection of threaded handlers */
> +#define SPURIOUS_MASK 0x7fffffff
> +#define SPURIOUS_DEFERRED 0x80000000
> +
> #include "debug.h"
> #include "settings.h"
>
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -867,8 +867,8 @@ static int irq_thread(void *data)
> irq_thread_check_affinity(desc, action);
>
> action_ret = handler_fn(desc, action);
> - if (!noirqdebug)
> - note_interrupt(action->irq, desc, action_ret);
> + if (!noirqdebug && action_ret == IRQ_HANDLED)
> + atomic_inc(&desc->threads_handled);
>
> wake_threads_waitq(desc);
> }
> Index: linux-2.6/kernel/irq/spurious.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/spurious.c
> +++ linux-2.6/kernel/irq/spurious.c
> @@ -271,15 +271,64 @@ void note_interrupt(unsigned int irq, st
> if (desc->istate & IRQS_POLL_INPROGRESS)
> return;
>
> - /* we get here again via the threaded handler */
> - if (action_ret == IRQ_WAKE_THREAD)
> - return;
> -
> if (bad_action_ret(action_ret)) {
> report_bad_irq(irq, desc, action_ret);
> return;
> }
>
> + /*
> + * We cannot call note_interrupt from the threaded handler. So
> + * the threaded handlers store whether they sucessfully
> + * handled an interrupt. Here is the right place to take care
> + * of this.
> + */
> + if (action_ret & IRQ_WAKE_THREAD) {
> + /*
> + * There is a thread woken. Check whether one of the
> + * shared primary handlers returned IRQ_HANDLED. If
> + * not we defer the spurious detection to the next
> + * interrupt.
> + */
> + if (action_ret == IRQ_WAKE_THREAD) {
> + int handled;
> +
> + /*
> + * We use bit 31 of thread_handled_last to
> + * note the deferred spurious detection
> + * active. No locking necessary as
> + * thread_handled_last is only accessed here
> + * and we have the guarantee that hard
> + * interrupts are not reentrant.
> + */
> + if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
> + desc->threads_handled_last |= SPURIOUS_DEFERRED;
> + return;
> + }
> + /*
> + * Check whether one of the threaded handlers
> + * returned IRQ_HANDLED since the last
> + * interrupt happened. Or the spurious
> + * deferred flag for ease of comparison and
> + * preservation in the writeback.
> + */
> + handled = atomic_read(&desc->threads_handled);
> + handled |= SPURIOUS_DEFERRED;
> + if (handled != desc->threads_handled_last) {
> + action_ret = IRQ_HANDLED;
> + desc->threads_handled_last = handled;
> + } else {
> + action_ret = IRQ_NONE;
> + }
> + } else {
> + /*
> + * One of the primary handlers returned
> + * IRQ_HANDLED. So we don't care about the
> + * threaded handlers on the same line.
> + */
> + desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
> + }
> + }
> +
> if (unlikely(action_ret == IRQ_NONE)) {
> /*
> * If we are seeing only the odd spurious IRQ caused by

2013-03-08 16:12:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Fri, 8 Mar 2013, Till Straumann wrote:

> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
> threads_handled_last is OK - what happens if the atomic_t counter
> can hold more than 31 bits? In this case, when thread handlers
> increment the counter there is interference with the flag. If
> this is not harmful then it is at least ugly.

atomic_t is going to stay 32 bit otherwise we'll have more horrible
problems than that one.

> I'm not as familiar with the code as you are but wouldn't it be
> simpler to always defer spurious detection thus avoiding to have to
> keep track of the state (deferral active/inactive)? I.e., if any
> primary handler returns IRQ_HANDLED then we simply increment the
> counter. note_interrupt() could then always compare the previous
> count to the current count and if they are equal conclude that the
> interrupt was not handled:

Yeah, we could do it that way. Would probably be simpler.

> handle_irq_event_percpu()
> {
> ...
> if (!noirqdebug)
> note_interrupt(irq, desc, retval);
>
> if ( (retval & IRQ_HANDLED) )
> atomic_inc(&desc->threads_handled);
> }
>
> and in 'note_interrupt()'
>
> handled = atomic_read(&desc->threads_handled);
> if ( desc->threads_handled_last == handled ) {
> action_ret = IRQ_NONE;
> } else {
> action_ret = IRQ_HANDLED;
> desc->threads_handled_last = handled;
> }
>
> Either way - I'm not sure what deferral does to the part of the algorithm
> in note_interrupt() which deals with misrouted interrupts since the
> 'action_ret' that goes into try_misrouted_irq() is delayed by one interrupt
> cycle.

That should not matter much methinks, but I'll try what explodes on
one of my affected machines.

>
> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
> believe
> this routine would also need to increment the 'threads_handled' counter
> rather
> than calling note_interrupt.

That's a different issue. The nested_irq handler is for interrupts
which are demultiplexed by a primary threaded handler. That interrupt
is never handled in hard interrupt context. It's always called from
the context of the demultiplxing thread.

Thanks,

tglx

2013-03-08 17:19:09

by Till Straumann

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
> On Fri, 8 Mar 2013, Till Straumann wrote:
>
>> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
>> threads_handled_last is OK - what happens if the atomic_t counter
>> can hold more than 31 bits? In this case, when thread handlers
>> increment the counter there is interference with the flag. If
>> this is not harmful then it is at least ugly.
> atomic_t is going to stay 32 bit otherwise we'll have more horrible
> problems than that one.
I know. But this means that when the counter overflows 31 bits (2^31 - 1)
then it spills into the SPURIOUS_DEFERRED flag, right?
>
>> I'm not as familiar with the code as you are but wouldn't it be
>> simpler to always defer spurious detection thus avoiding to have to
>> keep track of the state (deferral active/inactive)? I.e., if any
>> primary handler returns IRQ_HANDLED then we simply increment the
>> counter. note_interrupt() could then always compare the previous
>> count to the current count and if they are equal conclude that the
>> interrupt was not handled:
> Yeah, we could do it that way. Would probably be simpler.
>
>> handle_irq_event_percpu()
>> {
>> ...
>> if (!noirqdebug)
>> note_interrupt(irq, desc, retval);
>>
>> if ( (retval & IRQ_HANDLED) )
>> atomic_inc(&desc->threads_handled);
>> }
>>
>> and in 'note_interrupt()'
>>
>> handled = atomic_read(&desc->threads_handled);
>> if ( desc->threads_handled_last == handled ) {
>> action_ret = IRQ_NONE;
>> } else {
>> action_ret = IRQ_HANDLED;
>> desc->threads_handled_last = handled;
>> }
>>
>> Either way - I'm not sure what deferral does to the part of the algorithm
>> in note_interrupt() which deals with misrouted interrupts since the
>> 'action_ret' that goes into try_misrouted_irq() is delayed by one interrupt
>> cycle.
> That should not matter much methinks, but I'll try what explodes on
> one of my affected machines.
>
>> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
>> believe
>> this routine would also need to increment the 'threads_handled' counter
>> rather
>> than calling note_interrupt.
> That's a different issue. The nested_irq handler is for interrupts
> which are demultiplexed by a primary threaded handler. That interrupt
> is never handled in hard interrupt context. It's always called from
> the context of the demultiplxing thread.
So you are saying that there 'handle_nested_irq()' can never be executed
from more than one thread for a single interrupt?

I find, however, that e.g., the gpio-sx150x.c driver calls

request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
handle_nested_irq(). It would thus be possible that multiple drivers
could share an interrupt and each driver would call handle_nested_irq()
which in-turn executes note_interrupt(). This would again raise the
issues we already discussed (note_interrupt() not serialized and thinking
that an interrupt was not handled because it was handled by a different
thread).

Probably I'm missing something regarding the use of nested interrupts
- I would really appreciate if you could help me understand why
it should be OK for handle_nested_irq() to call note_interrupt().

Cheers
- Till
>
> Thanks,
>
> tglx
>
>

2013-03-08 19:41:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Fri, 8 Mar 2013, Till Straumann wrote:
> On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
> > On Fri, 8 Mar 2013, Till Straumann wrote:
> >
> > > 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
> > > threads_handled_last is OK - what happens if the atomic_t counter
> > > can hold more than 31 bits? In this case, when thread handlers
> > > increment the counter there is interference with the flag. If
> > > this is not harmful then it is at least ugly.
> > atomic_t is going to stay 32 bit otherwise we'll have more horrible
> > problems than that one.
> I know. But this means that when the counter overflows 31 bits (2^31 - 1)
> then it spills into the SPURIOUS_DEFERRED flag, right?

Gah, yes. /me should stop doing overoptimizations :)

> > > 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
> > > believe
> > > this routine would also need to increment the 'threads_handled'
> > > counter
> > > rather
> > > than calling note_interrupt.
> > That's a different issue. The nested_irq handler is for interrupts
> > which are demultiplexed by a primary threaded handler. That interrupt
> > is never handled in hard interrupt context. It's always called from
> > the context of the demultiplxing thread.
> So you are saying that there 'handle_nested_irq()' can never be executed
> from more than one thread for a single interrupt?
>
> I find, however, that e.g., the gpio-sx150x.c driver calls
>
> request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
> handle_nested_irq(). It would thus be possible that multiple drivers
> could share an interrupt and each driver would call handle_nested_irq()
> which in-turn executes note_interrupt(). This would again raise the
> issues we already discussed (note_interrupt() not serialized and thinking
> that an interrupt was not handled because it was handled by a different
> thread).
>
> Probably I'm missing something regarding the use of nested interrupts
> - I would really appreciate if you could help me understand why
> it should be OK for handle_nested_irq() to call note_interrupt().

The thing about nested irqs is:

main irq is threaded (requested by the driver for stuff like i2c)

The handler of this irq reads a pending irq register in the chip and
then invokes handle_nested_irq() for each of the pending bits.

Those interrupts cannot be shared even if the driver request them as
shared:

irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
raw_spin_unlock_irq(&desc->lock);

action_ret = action->thread_fn(action->irq, action->dev_id);
if (!noirqdebug)
note_interrupt(irq, desc, action_ret);

raw_spin_lock_irq(&desc->lock);
irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);

So there is no loop over action->next. And even if that code would
loop over action next, then it still would be serialized

main irq is raised

-> wake thread

thread runs

read pending reg()

for each pending bit {

handle_nested_irq();
action = desc->action;

while (action) {
action->thread_fn()
action = action->next)
}
note_interrupt();
}

thread done

Hope that helps. Thanks,

tglx




2013-03-12 13:22:43

by Till Straumann

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

OK. Thanks for the explanation.

However - even if not strictly necessary - wouldn't it
be simpler and the code easier to understand if
spurious interrupt detection just always would use the
deferred algorithm?
- after handling by whatever scheme (hard, threaded, nested)
the counter is incremented
- the next hard IRQ calls note_interrupt() which checks
if the counter has changed since last time.

Just my 2 cents...

Thanks again
- Till

On 03/08/2013 08:41 PM, Thomas Gleixner wrote:
> On Fri, 8 Mar 2013, Till Straumann wrote:
>> On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
>>> On Fri, 8 Mar 2013, Till Straumann wrote:
>>>
>>>> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
>>>> threads_handled_last is OK - what happens if the atomic_t counter
>>>> can hold more than 31 bits? In this case, when thread handlers
>>>> increment the counter there is interference with the flag. If
>>>> this is not harmful then it is at least ugly.
>>> atomic_t is going to stay 32 bit otherwise we'll have more horrible
>>> problems than that one.
>> I know. But this means that when the counter overflows 31 bits (2^31 - 1)
>> then it spills into the SPURIOUS_DEFERRED flag, right?
> Gah, yes. /me should stop doing overoptimizations :)
>
>>>> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
>>>> believe
>>>> this routine would also need to increment the 'threads_handled'
>>>> counter
>>>> rather
>>>> than calling note_interrupt.
>>> That's a different issue. The nested_irq handler is for interrupts
>>> which are demultiplexed by a primary threaded handler. That interrupt
>>> is never handled in hard interrupt context. It's always called from
>>> the context of the demultiplxing thread.
>> So you are saying that there 'handle_nested_irq()' can never be executed
>> from more than one thread for a single interrupt?
>>
>> I find, however, that e.g., the gpio-sx150x.c driver calls
>>
>> request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
>> handle_nested_irq(). It would thus be possible that multiple drivers
>> could share an interrupt and each driver would call handle_nested_irq()
>> which in-turn executes note_interrupt(). This would again raise the
>> issues we already discussed (note_interrupt() not serialized and thinking
>> that an interrupt was not handled because it was handled by a different
>> thread).
>>
>> Probably I'm missing something regarding the use of nested interrupts
>> - I would really appreciate if you could help me understand why
>> it should be OK for handle_nested_irq() to call note_interrupt().
> The thing about nested irqs is:
>
> main irq is threaded (requested by the driver for stuff like i2c)
>
> The handler of this irq reads a pending irq register in the chip and
> then invokes handle_nested_irq() for each of the pending bits.
>
> Those interrupts cannot be shared even if the driver request them as
> shared:
>
> irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
> raw_spin_unlock_irq(&desc->lock);
>
> action_ret = action->thread_fn(action->irq, action->dev_id);
> if (!noirqdebug)
> note_interrupt(irq, desc, action_ret);
>
> raw_spin_lock_irq(&desc->lock);
> irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
>
> So there is no loop over action->next. And even if that code would
> loop over action next, then it still would be serialized
>
> main irq is raised
>
> -> wake thread
>
> thread runs
>
> read pending reg()
>
> for each pending bit {
>
> handle_nested_irq();
> action = desc->action;
>
> while (action) {
> action->thread_fn()
> action = action->next)
> }
> note_interrupt();
> }
>
> thread done
>
> Hope that helps. Thanks,
>
> tglx
>
>
>
>
>

2014-04-07 18:39:01

by Austin Schuh

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

Hi Thomas,

Did anything come of this patch? Both Oliver and I have found that it
fixes real problems. I have multiple machines which have been running
with the patch since December with no ill effects.

Thanks,
Austin

On Mon, Jan 6, 2014 at 5:32 AM, Oliver Hartkopp <[email protected]> wrote:
> Hi Thomas,
>
> I just wanted to add my
>
> Tested-by: Oliver Hartkopp <[email protected]>
>
> In my setup with Core i7 and 20 CAN busses SJA1000 PCIe the problem
> disappeared with the discussed patch with the -rt kernel.
>
> The system was running at full CAN bus load over the weekend more than 72
> hours of operation without problems:
>
> CPU0 CPU1 CPU2 CPU3
> 0: 40 0 0 0 IO-APIC-edge timer
> 1: 1 0 0 0 IO-APIC-edge i8042
> 8: 0 0 1 0 IO-APIC-edge rtc0
> 9: 42 45 45 42 IO-APIC-fasteoi acpi
> 16: 9 8 8 8 IO-APIC-fasteoi ahci, ehci_hcd:usb1, can4, can5, can6, can7
> 17: 441468642 443275488 443609061 441436145 IO-APIC-fasteoi can8, can10, can11, can9
> 18: 441975412 438811422 437317802 441209092 IO-APIC-fasteoi can12, can13, can14, can15
> 19: 427310388 428661677 429813687 428095739 IO-APIC-fasteoi can0, can1, can2, can3, can16, can17, can18, can19
> (..)
>
> Before the having the patch, it lasted 1 minutes to 1.5 hours (usually ~3
> minutes) until the irq was killed due to the spurious detection using Linux
> 3.10.11-rt (Debian linux-image-3.10-0.bpo.3-rt-686-pae).
>
> I also tested the patch on different latest 3.13-rc5+ (non-rt) kernels for two
> weeks now without problems.
>
> If you want me to test an improved version (as Austin suggested below) please
> send a patch.
>
> Best regards,
> Oliver
>
> On 23.12.2013 20:25, Austin Schuh wrote:
>> Hi Thomas,
>>
>> Did anything happen with your patch to note_interrupt, originally
>> posted on May 8th of 2013? (https://lkml.org/lkml/2013/3/7/222)
>>
>> I am seeing an issue on a machine right now running a
>> config-preempt-rt kernel and a SJA1000 CAN card from PEAK. It works
>> for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
>> message. I posted on the Linux CAN mailing list, and Oliver Hartkopp
>> was able to reproduce the issue only on a realtime kernel. A function
>> trace ending when the IRQ was disabled shows that note_interrupt is
>> being called regularly from the IRQ handler threads, and one of the
>> threads is doing work (and therefore calling note_interrupt with
>> IRQ_HANDLED).
>>
>> Oliver Hartkopp and I ran tests over the weekend on numerous machines
>> and verified that the patch that you proposed fixes the problem. We
>> think that the race condition that Till reported is causing the
>> problem here.
>>
>> In reply to the comment about using the upper bit of
>> threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
>> that may still be an over-optimization, the code should still work.
>> All comparisons are done with the bit set, which just makes it a 31
>> bit counter. It will take 8 more days for the counter to overflow on
>> my machine, so I won't know for certain until then.
>>
>> My only concern is that there may still be a small race condition with
>> this new code. If the interrupt handler thread is running at a
>> realtime priority, but lower than another task, it may not get run
>> until a large number of IRQs get triggered, and then process them
>> quickly. With your new handler code, this would be counted as one
>> single handled interrupt. With the current constants, this is only a
>> problem if more than 1000 calls to the handler happen between IRQs. I
>> starved my card's irq threads by running 4 tasks at a higher realtime
>> priority than the handler threads, and saw the number of unhandled
>> IRQs jump from 1/100000 to 3/100000, so that problem may not show up
>> in practice.
>>
>> Austin Schuh
>>
>> Tested-by: Austin Schuh <[email protected]>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-07 18:40:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, 7 Apr 2014, Austin Schuh wrote:

> Hi Thomas,
>
> Did anything come of this patch? Both Oliver and I have found that it
> fixes real problems. I have multiple machines which have been running
> with the patch since December with no ill effects.

No, sorry. It fell through the cracks. Care to resend ?

Thanks,

tglx

2014-04-07 20:05:16

by Austin Schuh

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, Apr 7, 2014 at 11:41 AM, Thomas Gleixner <[email protected]> wrote:
> On Mon, 7 Apr 2014, Austin Schuh wrote:
>
>> Hi Thomas,
>>
>> Did anything come of this patch? Both Oliver and I have found that it
>> fixes real problems. I have multiple machines which have been running
>> with the patch since December with no ill effects.
>
> No, sorry. It fell through the cracks. Care to resend ?
>
> Thanks,
>
> tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

You originally sent the patch out. I could send your patch out back
to you, but that feels a bit weird ;)

Austin

2014-04-07 20:07:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, 7 Apr 2014, Austin Schuh wrote:

> On Mon, Apr 7, 2014 at 11:41 AM, Thomas Gleixner <[email protected]> wrote:
> > On Mon, 7 Apr 2014, Austin Schuh wrote:
> >
> >> Hi Thomas,
> >>
> >> Did anything come of this patch? Both Oliver and I have found that it
> >> fixes real problems. I have multiple machines which have been running
> >> with the patch since December with no ill effects.
> >
> > No, sorry. It fell through the cracks. Care to resend ?
> >
> > Thanks,
> >
> > tglx
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-can" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> You originally sent the patch out. I could send your patch out back
> to you, but that feels a bit weird ;)

Wheee. Let me dig in my archives ....

2014-04-07 20:08:39

by Austin Schuh

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <[email protected]> wrote:
> On Mon, 7 Apr 2014, Austin Schuh wrote:
>> You originally sent the patch out. I could send your patch out back
>> to you, but that feels a bit weird ;)
>
> Wheee. Let me dig in my archives ....

https://lkml.org/lkml/2013/3/7/222 in case that helps.

Austin

2014-04-28 20:21:01

by Austin Schuh

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, Apr 7, 2014 at 1:08 PM, Austin Schuh <[email protected]> wrote:
> On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <[email protected]> wrote:
>> On Mon, 7 Apr 2014, Austin Schuh wrote:
>>> You originally sent the patch out. I could send your patch out back
>>> to you, but that feels a bit weird ;)
>>
>> Wheee. Let me dig in my archives ....
>
> https://lkml.org/lkml/2013/3/7/222 in case that helps.

Did you find the patch? I didn't see anything go by (but I'm not on
the main mailing list and didn't find anything with a quick Google
search.) It would be nice to not need to run a custom kernel to keep
my machine running. I have what is probably a year split between 2
machines of runtime with the patch applied, and I haven't seen any
problems with it.

Austin

2014-04-28 20:43:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs

On Mon, 28 Apr 2014, Austin Schuh wrote:
> On Mon, Apr 7, 2014 at 1:08 PM, Austin Schuh <[email protected]> wrote:
> > On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <[email protected]> wrote:
> >> On Mon, 7 Apr 2014, Austin Schuh wrote:
> >>> You originally sent the patch out. I could send your patch out back
> >>> to you, but that feels a bit weird ;)
> >>
> >> Wheee. Let me dig in my archives ....
> >
> > https://lkml.org/lkml/2013/3/7/222 in case that helps.
>
> Did you find the patch? I didn't see anything go by (but I'm not on
> the main mailing list and didn't find anything with a quick Google
> search.) It would be nice to not need to run a custom kernel to keep
> my machine running. I have what is probably a year split between 2
> machines of runtime with the patch applied, and I haven't seen any
> problems with it.

It's on my list, but Easter and traveling did not really help :)

There are a few issues with the patch I need to think through, but
I'll get to it in the next few days.