2015-12-08 11:02:54

by Sekhar Nori

[permalink] [raw]
Subject: [PATCH v2] irqchip: omap-intc: add support for spurious irq handling

Under some conditions, irq sorting procedure used
by INTC can go wrong resulting in a spurious irq
getting reported.

If this condition is not handled, it results in
endless stream of:

unexpected IRQ trap at vector 00

messages from ack_bad_irq()

Handle the spurious interrupt condition in omap-intc
driver to prevent this.

Signed-off-by: Sekhar Nori <[email protected]>
---
v2: increment error irq counter, use pr_err_once,
add a comment on tips to debug spurious irq
condition.

This patch results in a checkpatch warning about
extern definition of irq_err_count, but looks like
thats the prevalent method of accessing that counter.

drivers/irqchip/irq-omap-intc.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
index 8587d0f8d8c0..639708de5529 100644
--- a/drivers/irqchip/irq-omap-intc.c
+++ b/drivers/irqchip/irq-omap-intc.c
@@ -47,6 +47,7 @@
#define INTC_ILR0 0x0100

#define ACTIVEIRQ_MASK 0x7f /* omap2/3 active interrupt bits */
+#define SPURIOUSIRQ_MASK (0x1ffffff << 7)
#define INTCPS_NR_ILR_REGS 128
#define INTCPS_NR_MIR_REGS 4

@@ -330,11 +331,35 @@ static int __init omap_init_irq(u32 base, struct device_node *node)
static asmlinkage void __exception_irq_entry
omap_intc_handle_irq(struct pt_regs *regs)
{
+ extern unsigned long irq_err_count;
u32 irqnr;

irqnr = intc_readl(INTC_SIR);
+
+ /*
+ * A spurious IRQ can result if interrupt that triggered the
+ * sorting is no longer active during the sorting (10 INTC
+ * functional clock cycles after interrupt assertion). Or a
+ * change in interrupt mask affected the result during sorting
+ * time. There is no special handling required except ignoring
+ * the SIR register value just read and retrying.
+ * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
+ *
+ * Many a times, a spurious interrupt situation has been fixed
+ * by adding a flush for the posted write acking the IRQ in
+ * the device driver. Typically, this is going be the device
+ * driver whose interrupt was handled just before the spurious
+ * IRQ occurred. Pay attention to those device drivers if you
+ * run into hitting the spurious IRQ condition below.
+ */
+ if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
+ pr_err_once("%s: spurious irq!\n", __func__);
+ irq_err_count++;
+ omap_ack_irq(NULL);
+ return;
+ }
+
irqnr &= ACTIVEIRQ_MASK;
- WARN_ONCE(!irqnr, "Spurious IRQ ?\n");
handle_domain_irq(domain, irqnr, regs);
}

--
2.6.3


2015-12-08 13:46:13

by Felipe Balbi

[permalink] [raw]
Subject: Re: [PATCH v2] irqchip: omap-intc: add support for spurious irq handling


Hi,

Sekhar Nori <[email protected]> writes:
> Under some conditions, irq sorting procedure used
> by INTC can go wrong resulting in a spurious irq
> getting reported.
>
> If this condition is not handled, it results in
> endless stream of:
>
> unexpected IRQ trap at vector 00
>
> messages from ack_bad_irq()
>
> Handle the spurious interrupt condition in omap-intc
> driver to prevent this.
>
> Signed-off-by: Sekhar Nori <[email protected]>
> ---
> v2: increment error irq counter, use pr_err_once,
> add a comment on tips to debug spurious irq
> condition.
>
> This patch results in a checkpatch warning about
> extern definition of irq_err_count, but looks like
> thats the prevalent method of accessing that counter.
>
> drivers/irqchip/irq-omap-intc.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
> index 8587d0f8d8c0..639708de5529 100644
> --- a/drivers/irqchip/irq-omap-intc.c
> +++ b/drivers/irqchip/irq-omap-intc.c
> @@ -47,6 +47,7 @@
> #define INTC_ILR0 0x0100
>
> #define ACTIVEIRQ_MASK 0x7f /* omap2/3 active interrupt bits */
> +#define SPURIOUSIRQ_MASK (0x1ffffff << 7)
> #define INTCPS_NR_ILR_REGS 128
> #define INTCPS_NR_MIR_REGS 4
>
> @@ -330,11 +331,35 @@ static int __init omap_init_irq(u32 base, struct device_node *node)
> static asmlinkage void __exception_irq_entry
> omap_intc_handle_irq(struct pt_regs *regs)
> {
> + extern unsigned long irq_err_count;
> u32 irqnr;
>
> irqnr = intc_readl(INTC_SIR);
> +
> + /*
> + * A spurious IRQ can result if interrupt that triggered the
> + * sorting is no longer active during the sorting (10 INTC
> + * functional clock cycles after interrupt assertion). Or a
> + * change in interrupt mask affected the result during sorting
> + * time. There is no special handling required except ignoring
> + * the SIR register value just read and retrying.
> + * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
> + *
> + * Many a times, a spurious interrupt situation has been fixed
> + * by adding a flush for the posted write acking the IRQ in
> + * the device driver. Typically, this is going be the device
> + * driver whose interrupt was handled just before the spurious
> + * IRQ occurred. Pay attention to those device drivers if you
> + * run into hitting the spurious IRQ condition below.
> + */
> + if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {

sounds like unlikely() wouldn't hurt here.

> + pr_err_once("%s: spurious irq!\n", __func__);
> + irq_err_count++;
> + omap_ack_irq(NULL);
> + return;
> + }
> +
> irqnr &= ACTIVEIRQ_MASK;
> - WARN_ONCE(!irqnr, "Spurious IRQ ?\n");
> handle_domain_irq(domain, irqnr, regs);

care to run kernel function profiler against omap_intc_handle_irq()
before and after this patch ?

--
balbi


Attachments:
signature.asc (818.00 B)

2015-12-10 15:17:12

by Sekhar Nori

[permalink] [raw]
Subject: Re: [PATCH v2] irqchip: omap-intc: add support for spurious irq handling

Hi Felipe,

On Tuesday 08 December 2015 07:15 PM, Felipe Balbi wrote:
> Sekhar Nori <[email protected]> writes:

>> + /*
>> + * A spurious IRQ can result if interrupt that triggered the
>> + * sorting is no longer active during the sorting (10 INTC
>> + * functional clock cycles after interrupt assertion). Or a
>> + * change in interrupt mask affected the result during sorting
>> + * time. There is no special handling required except ignoring
>> + * the SIR register value just read and retrying.
>> + * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
>> + *
>> + * Many a times, a spurious interrupt situation has been fixed
>> + * by adding a flush for the posted write acking the IRQ in
>> + * the device driver. Typically, this is going be the device
>> + * driver whose interrupt was handled just before the spurious
>> + * IRQ occurred. Pay attention to those device drivers if you
>> + * run into hitting the spurious IRQ condition below.
>> + */
>> + if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
>
> sounds like unlikely() wouldn't hurt here.

I can add, but looks like it does not make a big difference. See below.

>
>> + pr_err_once("%s: spurious irq!\n", __func__);
>> + irq_err_count++;
>> + omap_ack_irq(NULL);
>> + return;
>> + }
>> +
>> irqnr &= ACTIVEIRQ_MASK;
>> - WARN_ONCE(!irqnr, "Spurious IRQ ?\n");
>> handle_domain_irq(domain, irqnr, regs);
>
> care to run kernel function profiler against omap_intc_handle_irq()
> before and after this patch ?

Before this patch I see average running time time of 34us. That
increases to 37.8us after this patch. With unlikely() the number I got
was 37.4us. So the benefit with unlikely() is in the noise range.

This was using AM335x EVM at 720 MHz.

Thanks,
Sekhar

2015-12-15 14:27:36

by Sekhar Nori

[permalink] [raw]
Subject: Re: [PATCH v2] irqchip: omap-intc: add support for spurious irq handling

On Thursday 10 December 2015 08:46 PM, Sekhar Nori wrote:
> Hi Felipe,
>
> On Tuesday 08 December 2015 07:15 PM, Felipe Balbi wrote:
>> Sekhar Nori <[email protected]> writes:
>
>>> + /*
>>> + * A spurious IRQ can result if interrupt that triggered the
>>> + * sorting is no longer active during the sorting (10 INTC
>>> + * functional clock cycles after interrupt assertion). Or a
>>> + * change in interrupt mask affected the result during sorting
>>> + * time. There is no special handling required except ignoring
>>> + * the SIR register value just read and retrying.
>>> + * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
>>> + *
>>> + * Many a times, a spurious interrupt situation has been fixed
>>> + * by adding a flush for the posted write acking the IRQ in
>>> + * the device driver. Typically, this is going be the device
>>> + * driver whose interrupt was handled just before the spurious
>>> + * IRQ occurred. Pay attention to those device drivers if you
>>> + * run into hitting the spurious IRQ condition below.
>>> + */
>>> + if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
>>
>> sounds like unlikely() wouldn't hurt here.
>
> I can add, but looks like it does not make a big difference. See below.
>
>>
>>> + pr_err_once("%s: spurious irq!\n", __func__);
>>> + irq_err_count++;
>>> + omap_ack_irq(NULL);
>>> + return;
>>> + }
>>> +
>>> irqnr &= ACTIVEIRQ_MASK;
>>> - WARN_ONCE(!irqnr, "Spurious IRQ ?\n");
>>> handle_domain_irq(domain, irqnr, regs);
>>
>> care to run kernel function profiler against omap_intc_handle_irq()
>> before and after this patch ?
>
> Before this patch I see average running time time of 34us. That
> increases to 37.8us after this patch. With unlikely() the number I got
> was 37.4us. So the benefit with unlikely() is in the noise range.
>
> This was using AM335x EVM at 720 MHz.

Just sent a v3 with unlikely() and profiling information added to commit
message.

Thanks,
Sekhar