2003-05-05 13:11:29

by Shane Shrybman

[permalink] [raw]
Subject: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Hi,

I am getting a lot of these in the logs. This is with the ALSA emu10k1
driver for a SB live card. This is a x86, UP, KT133 system with preempt
enabled. The system seems to be running fine.

handlers:
[<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
irq 7: nobody cared!
Call Trace:
[<c010c5c2>] handle_IRQ_event+0xa2/0x110
[<c010c7c0>] do_IRQ+0xa0/0x130
[<c010b14c>] common_interrupt+0x18/0x20
[<c012048c>] do_softirq+0x3c/0xa0
[<c010c826>] do_IRQ+0x106/0x130
[<c010b14c>] common_interrupt+0x18/0x20
[<c0108884>] default_idle+0x24/0x30
[<c0114e55>] apm_cpu_idle+0x125/0x170
[<c0114d30>] apm_cpu_idle+0x0/0x170
[<c0108860>] default_idle+0x0/0x30
[<c0108902>] cpu_idle+0x32/0x50
[<c0105000>] _stext+0x0/0x60
[<c02c46be>] start_kernel+0x15e/0x170

handlers:
[<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
irq 7: nobody cared!
Call Trace:
[<c010c5c2>] handle_IRQ_event+0xa2/0x110
[<c010c7c0>] do_IRQ+0xa0/0x130
[<c010b14c>] common_interrupt+0x18/0x20
[<c012048c>] do_softirq+0x3c/0xa0
[<c010c826>] do_IRQ+0x106/0x130
[<c010b14c>] common_interrupt+0x18/0x20
[<c01203dd>] current_kernel_time+0xd/0x40
[<c015f4d5>] inode_update_time+0x15/0x90
[<c01fd188>] memcpy_toiovec+0x68/0xb0
[<c0132af0>] generic_file_aio_write_nolock+0x390/0x9c0
[<c01fb9f7>] kfree_skbmem+0x17/0x20
[<c011b0f0>] autoremove_wake_function+0x0/0x40
[<c01f86dc>] sock_recvmsg+0x8c/0xb0
[<c013318f>] generic_file_write_nolock+0x6f/0x90
[<c0119cec>] __wake_up+0x1c/0x40
[<c0134737>] __alloc_pages+0x97/0x3a0
[<c0134a33>] __alloc_pages+0x393/0x3a0
[<c01f97f2>] sys_recvfrom+0xa2/0x100
[<c01f9836>] sys_recvfrom+0xe6/0x100
[<c0134a5a>] __get_free_pages+0x1a/0x50
[<c0134481>] free_hot_cold_page+0x21/0xf0
[<c0133350>] generic_file_writev+0x30/0x50
[<c014933d>] do_readv_writev+0x1bd/0x260
[<c0148e30>] do_sync_write+0x0/0xb0
[<c01f9ecf>] sys_socketcall+0x15f/0x1f0
[<c0149474>] vfs_writev+0x44/0x50
[<c01494e8>] sys_writev+0x28/0x40
[<c010a7df>] syscall_call+0x7/0xb

CPU0
0: 45130454 XT-PIC timer
1: 6730 XT-PIC i8042
2: 0 XT-PIC cascade
5: 278881 XT-PIC uhci-hcd, uhci-hcd
7: 128713 XT-PIC EMU10K1
8: 1 XT-PIC rtc
10: 983196 XT-PIC ide2, ide3, bttv0
11: 3031816 XT-PIC eth0
12: 60 XT-PIC i8042, i8042, i8042, i8042
14: 67179 XT-PIC ide0
15: 520 XT-PIC ide1
NMI: 0
LOC: 45131802
ERR: 182
MIS: 0

BTW: What about the 4 i8042's on irq 12. Is this normal/OK?

Regards,

Shane


2003-05-05 21:21:14

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Shane Shrybman <[email protected]> wrote:
>
> Hi,
>
> I am getting a lot of these in the logs. This is with the ALSA emu10k1
> driver for a SB live card. This is a x86, UP, KT133 system with preempt
> enabled. The system seems to be running fine.
>
> handlers:
> [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> irq 7: nobody cared!

Beats me. Does this fix it up?

diff -puN sound/pci/emu10k1/irq.c~sound-irq-hack sound/pci/emu10k1/irq.c
--- 25/sound/pci/emu10k1/irq.c~sound-irq-hack Mon May 5 14:28:58 2003
+++ 25-akpm/sound/pci/emu10k1/irq.c Mon May 5 14:29:17 2003
@@ -147,5 +147,5 @@ irqreturn_t snd_emu10k1_interrupt(int ir
outl(IPR_FXDSP, emu->port + IPR);
}
}
- return IRQ_RETVAL(handled);
+ return IRQ_HANDLED;
}

_

2003-05-06 10:21:39

by Alan

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

On Llu, 2003-05-05 at 22:30, Andrew Morton wrote:
> Shane Shrybman <[email protected]> wrote:
> >
> > Hi,
> >
> > I am getting a lot of these in the logs. This is with the ALSA emu10k1
> > driver for a SB live card. This is a x86, UP, KT133 system with preempt
> > enabled. The system seems to be running fine.
> >
> > handlers:
> > [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> > irq 7: nobody cared!
>
> Beats me. Does this fix it up?

With APIC at least it doesnt suprise me the least. The IRQ hack seems
extremely racey. Remember on most systems (especially with PIII type
APIC) IRQ delivery is asynchronous to the bus so you get

IRQ arrives
sound card
loop
clean up IRQ
IRQ sent
still more work, do it
done
HANDLED

IRQ arrives
sound card
Umm duh no work for me
NOT HANDLED

Whine

For anything where you get pairs of close IRQ's

2003-05-06 10:38:09

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

On Tue, 6 May 2003, Alan Cox wrote:

> With APIC at least it doesnt suprise me the least. The IRQ hack seems
> extremely racey. Remember on most systems (especially with PIII type
> APIC) IRQ delivery is asynchronous to the bus so you get
>
> IRQ arrives
> sound card
> loop
> clean up IRQ
> IRQ sent
> still more work, do it
> done
> HANDLED
>
> IRQ arrives
> sound card
> Umm duh no work for me
> NOT HANDLED
>
> Whine
>
> For anything where you get pairs of close IRQ's

Shouldn't this also be observed more easily on P4/xAPIC since you can have
a pending vector in the IRR and ISR whilst the core processes one.

Zwane
--
function.linuxpower.ca

2003-05-06 12:03:50

by Alan

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

On Maw, 2003-05-06 at 11:41, Zwane Mwaikambo wrote:
> > For anything where you get pairs of close IRQ's
>
> Shouldn't this also be observed more easily on P4/xAPIC since you can have
> a pending vector in the IRR and ISR whilst the core processes one.

I don't know enough about the pending vector stuff. For the older APIC the
IRQ's go via a suprisingly slow seperate APIC bus (4 wire if I remember
rightly).

2003-05-06 15:03:08

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Alan Cox <[email protected]> wrote:
>
> With APIC at least it doesnt suprise me the least. The IRQ hack seems
> extremely racey.

Good point. How about we do something like "if half of the past 1000
interrupts weren't handled then try to kill the IRQ"?



arch/i386/kernel/irq.c | 55 +++++++++++++++++++++++++++++++++++++++----------
include/linux/irq.h | 2 +
2 files changed, 46 insertions(+), 11 deletions(-)

diff -puN arch/i386/kernel/irq.c~irq-check-rate-limit arch/i386/kernel/irq.c
--- 25/arch/i386/kernel/irq.c~irq-check-rate-limit 2003-05-06 07:54:17.000000000 -0700
+++ 25-akpm/arch/i386/kernel/irq.c 2003-05-06 08:16:15.000000000 -0700
@@ -66,8 +66,12 @@
/*
* Controller mappings for all interrupt sources:
*/
-irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
- { [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned = {
+ [0 ... NR_IRQS-1] = {
+ .handler = &no_irq_type,
+ .lock = SPIN_LOCK_UNLOCKED
+ }
+};

static void register_irq_proc (unsigned int irq);

@@ -209,7 +213,6 @@ int handle_IRQ_event(unsigned int irq,
{
int status = 1; /* Force the "do bottom halves" bit */
int retval = 0;
- struct irqaction *first_action = action;

if (!(action->flags & SA_INTERRUPT))
local_irq_enable();
@@ -222,19 +225,43 @@ int handle_IRQ_event(unsigned int irq,
if (status & SA_SAMPLE_RANDOM)
add_interrupt_randomness(irq);
local_irq_disable();
- if (retval != 1) {
+ return status;
+}
+
+/*
+ * If 500 of the previous 1000 interrupts have not been handled then assume
+ * that the IRQ is stuck in some manner. Drop a diagnostic and try to turn the
+ * IRQ off.
+ *
+ * Called under desc->lock
+ */
+static void note_interrupt(irq_desc_t *desc, int irq, int status)
+{
+ if (status != IRQ_HANDLED)
+ desc->irqs_unhandled++;
+ desc->irq_count++;
+ if (desc->irq_count < 1000)
+ return;
+
+ desc->irq_count = 0;
+ if (desc->irqs_unhandled > 500) {
+ /*
+ * The interrupt is stuck
+ */
static int count = 100;
+ struct irqaction *action;
+
if (count) {
count--;
- if (retval) {
+ if (status) {
printk("irq event %d: bogus retval mask %x\n",
- irq, retval);
+ irq, status);
} else {
printk("irq %d: nobody cared!\n", irq);
}
dump_stack();
printk("handlers:\n");
- action = first_action;
+ action = desc->action;
do {
printk("[<%p>]", action->handler);
print_symbol(" (%s)",
@@ -243,9 +270,13 @@ int handle_IRQ_event(unsigned int irq,
action = action->next;
} while (action);
}
+ /*
+ * Now kill the IRQ
+ */
+ desc->status |= IRQ_DISABLED;
+ desc->handler->disable(irq);
}
-
- return status;
+ desc->irqs_unhandled = 0;
}

/*
@@ -418,10 +449,12 @@ asmlinkage unsigned int do_IRQ(struct pt
* SMP environment.
*/
for (;;) {
+ int status;
+
spin_unlock(&desc->lock);
- handle_IRQ_event(irq, &regs, action);
+ status = handle_IRQ_event(irq, &regs, action);
spin_lock(&desc->lock);
-
+ note_interrupt(desc, irq, status);
if (likely(!(desc->status & IRQ_PENDING)))
break;
desc->status &= ~IRQ_PENDING;
diff -puN include/linux/irq.h~irq-check-rate-limit include/linux/irq.h
--- 25/include/linux/irq.h~irq-check-rate-limit 2003-05-06 07:56:03.000000000 -0700
+++ 25-akpm/include/linux/irq.h 2003-05-06 08:05:17.000000000 -0700
@@ -61,6 +61,8 @@ typedef struct {
hw_irq_controller *handler;
struct irqaction *action; /* IRQ action list */
unsigned int depth; /* nested irq disables */
+ unsigned int irq_count; /* For detecting broken interrupts */
+ unsigned int irqs_unhandled;
spinlock_t lock;
} ____cacheline_aligned irq_desc_t;


_

2003-05-06 15:58:41

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Alan Cox <[email protected]> wrote:
>
> It seems the heuristic is more complicated

Any suggestions?

2003-05-06 15:55:27

by Alan

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

On Maw, 2003-05-06 at 16:17, Andrew Morton wrote:
> Alan Cox <[email protected]> wrote:
> >
> > With APIC at least it doesnt suprise me the least. The IRQ hack seems
> > extremely racey.
>
> Good point. How about we do something like "if half of the past 1000
> interrupts weren't handled then try to kill the IRQ"?

And if its a sound card generating close pairs of IRQs you might still
trip. It seems the heuristic is more complicated

2003-05-06 21:34:27

by Chuck Ebbert

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Andrew Morton wrote:

> Alan Cox <[email protected]> wrote:
> >
> > It seems the heuristic is more complicated
>
> Any suggestions?


Does this pseudocode look like it would work? It should make it
only complain if two or more interrupts in a row go unhandled.


int last_irq_was_dropped[NR_IRQS];

/* call each handler in turn for this irq */

for (each_driver(irq)) {
ret = call_driver();
if (ret == irq_handled) {
if (unlikely(last_irq_was_dropped[irq])
last_irq_was_dropped[irq] = 0;
break;
}
}
if (ret != irq_handled) {
if (unlikely(last_irq_was_dropped[irq]))
complain();
else
last_irq_was_dropped[irq] = 1;
}

2003-05-07 21:58:06

by Shane Shrybman

[permalink] [raw]
Subject: Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!

Hi Andrew & Alan,

Sorry for the delay but the one liner below does seem to have cleared up
the issue. I have been running it for about eight hours, with some sound
on all the time, and haven't seen any 'nobody cared' messages.

BTW: I hand applied this to 2.5.69-mm1. So I am confident that this one
liner did fix it.

On Mon, 2003-05-05 at 17:30, Andrew Morton wrote:
> Shane Shrybman <[email protected]> wrote:
> >
> > Hi,
> >
> > I am getting a lot of these in the logs. This is with the ALSA emu10k1
> > driver for a SB live card. This is a x86, UP, KT133 system with preempt
> > enabled. The system seems to be running fine.
> >
> > handlers:
> > [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> > irq 7: nobody cared!
>
> Beats me. Does this fix it up?
>
> diff -puN sound/pci/emu10k1/irq.c~sound-irq-hack sound/pci/emu10k1/irq.c
> --- 25/sound/pci/emu10k1/irq.c~sound-irq-hack Mon May 5 14:28:58 2003
> +++ 25-akpm/sound/pci/emu10k1/irq.c Mon May 5 14:29:17 2003
> @@ -147,5 +147,5 @@ irqreturn_t snd_emu10k1_interrupt(int ir
> outl(IPR_FXDSP, emu->port + IPR);
> }
> }
> - return IRQ_RETVAL(handled);
> + return IRQ_HANDLED;
> }

Regards,

Shane