2004-10-29 06:28:59

by Kahro Raie

Subject: ERROR: Disabling IRQ #11

Description:
After my system has been up for about 10 minutes, I always get the following two-line error message on every console:
Message from syslogd@etna at Fri Oct 29 08:46:55 2004 ...
etna kernel: Disabling IRQ #11

Keywords:
kernel irq 11 usb_hcd_irq usbcore

Kernel version:
Linux version 2.6.8-1-686 ([email protected]) (gcc version 3.3.4 (Debian 1:3.3.4-13)) #1 Thu Oct 7 03:15:25 EDT 2004

My system:
Debian GNU/Linux (testing)

Other notes:
I think the dmesg output is the most relevant and descriptive, as it contains a block that appears when the error is reported:
irq 11: nobody cared!
[<c010841a>] __report_bad_irq+0x2a/0x90
[<c0108510>] note_interrupt+0x70/0xb0
[<c01087f0>] do_IRQ+0x120/0x130
[<c0106a20>] common_interrupt+0x18/0x20
[<c028007b>] schedule+0x38b/0x4d0
[<c0104053>] default_idle+0x23/0x40
[<c01040e4>] cpu_idle+0x34/0x40
[<c03307b8>] start_kernel+0x1a8/0x1f0
[<c0330380>] unknown_bootoption+0x0/0x160
handlers:
[<e029c770>] (usb_hcd_irq+0x0/0x70 [usbcore])
[<e029c770>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #11

I don't need a reply to my mail; I just want to report that the bug still exists on my system, and I hope my info helps.


Attachments:
proc.cpuinfo (407.00 B)
proc.iomem (661.00 B)
proc.ioports (758.00 B)
proc.modules (2.26 kB)
lspci (7.28 kB)
dmesg (13.41 kB)

2004-10-29 07:01:39

by Brown, Len

Subject: Re: ERROR: Disabling IRQ #11

On Fri, 2004-10-29 at 02:27, Kahro Raie wrote:
> Description:
> After my system has been up for about 10 minutes I always get the
> following 2 line error message on every console:

> irq 11: nobody cared!
...
> Disabling IRQ #11

APIC error on CPU0: 00(01)

Hmmm, how did we take this interrupt with no bits set?
Why do we have bit 0 (send checksum error) set after
we try to clear "errors"?

Did you not see this issue when running a different kernel, or do you
always see this issue?

Is the board over-clocked?

-Len


2004-10-29 11:28:05

by linux-os

Subject: Re: ERROR: Disabling IRQ #11

On Fri, 29 Oct 2004, Kahro Raie wrote:

> Description:
> After my system has been up for about 10 minutes I always get
> the following 2 line error message on every console:
> Message from syslogd@etna at Fri Oct 29 08:46:55 2004 ...
> etna kernel: Disabling IRQ #11
>

Find the driver (module) that is using IRQ 11. That module is
probably not returning the correct value from its ISR. That's
one of the changes in the new kernels: ISRs now have to return
a value that says whether they handled the interrupt.
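
To illustrate why the return value matters, here is a simplified
userspace model of the accounting behind "nobody cared". It is only a
sketch, not kernel code: struct irq_model, deliver_irq(), run_model()
and both thresholds are assumptions made up for the illustration. Only
the IRQ_NONE / IRQ_HANDLED names mirror the 2.6 return convention.

```c
#include <assert.h>
#include <stdio.h>

/* Return codes named after the 2.6 irqreturn_t convention. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_fn)(int irq);

#define MAX_HANDLERS 4

/* Hypothetical per-line bookkeeping; field names are illustrative. */
struct irq_model {
    irq_handler_fn handlers[MAX_HANDLERS];
    int nr_handlers;
    unsigned int irq_count;       /* interrupts seen in this window */
    unsigned int irqs_unhandled;  /* of those, how many nobody claimed */
    int disabled;
};

/* Assumed thresholds for this sketch. */
enum { SAMPLE_WINDOW = 100000, UNHANDLED_LIMIT = 99900 };

/* An ISR that never claims the interrupt, as suspected in this thread. */
static irqreturn_t buggy_isr(int irq) { (void)irq; return IRQ_NONE; }

/* An ISR that services its device and says so. */
static irqreturn_t good_isr(int irq) { (void)irq; return IRQ_HANDLED; }

/* Deliver one interrupt to every handler sharing the line. */
static void deliver_irq(struct irq_model *d, int irq)
{
    int handled = 0;
    int i;

    assert(d->nr_handlers <= MAX_HANDLERS);
    if (d->disabled)
        return;

    for (i = 0; i < d->nr_handlers; i++)
        if (d->handlers[i](irq) == IRQ_HANDLED)
            handled = 1;

    if (!handled)
        d->irqs_unhandled++;

    if (++d->irq_count < SAMPLE_WINDOW)
        return;

    /* End of a window: was the line stuck almost the whole time? */
    d->irq_count = 0;
    if (d->irqs_unhandled > UNHANDLED_LIMIT) {
        printf("irq %d: nobody cared!\n", irq);
        printf("Disabling IRQ #%d\n", irq);
        d->disabled = 1;
    }
    d->irqs_unhandled = 0;
}

/* Fire n interrupts at a single handler; return 1 if the line got disabled. */
static int run_model(irq_handler_fn h, int irq, unsigned int n)
{
    struct irq_model d = { { h }, 1, 0, 0, 0 };
    unsigned int i;

    for (i = 0; i < n; i++)
        deliver_irq(&d, irq);
    return d.disabled;
}
```

Calling run_model(buggy_isr, 11, SAMPLE_WINDOW) ends with the line
disabled, while run_model(good_isr, 11, SAMPLE_WINDOW) leaves it
enabled. On a shared line it is enough for one handler to claim the
interrupt for it to count as handled.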


Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.

2004-10-29 15:38:05

by Maciej W. Rozycki

Subject: Re: ERROR: Disabling IRQ #11

On Fri, 29 Oct 2004, Len Brown wrote:

> APIC error on CPU0: 00(01)
>
> Hmmm, how did we take this interrupt with no bits set?
> why do we have bit 0 (send checksum error) set after
> we try to clear "errors"?

Please have a look at the relevant local APIC specification. For the
P6-class local APIC a write of zero (or likely any value -- I don't
remember) to the ESR makes the internal error status be copied to the
externally visible ESR. A read of the ESR returns its contents and clears
it. Thus the report is perfectly valid -- it means no uncleared error was
left over before.

For the record: for the P5-class local APIC the ESR is the only error
status and it is also cleared on a read. Thus for that implementation the
error codes reported would be reversed. Writes to the ESR have no effect
by definition. Unfortunately, a range of chips suffer from an
erratum that causes data written to the ESR to actually be recorded in
the register. As a result, we cannot just do a sequence consisting of a
write and a read -- we need to do that leading read to handle buggy chips
correctly. And we do need to write zero specifically, as otherwise a bogus
error would be reported for them.
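
The read / write-zero / read sequence described above can be sketched
as a small userspace model. This is an illustration of the semantics as
stated in this message, not kernel code and not taken from any
specification: struct apic_model, raise_error(), esr_write(),
esr_read() and report_error() are all hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

enum apic_kind { APIC_P5, APIC_P5_BUGGY, APIC_P6 };

/* Hypothetical model of a local APIC's error state. */
struct apic_model {
    enum apic_kind kind;
    unsigned int pending; /* P6-class only: internal error status */
    unsigned int esr;     /* externally visible ESR */
};

/* Hardware side: an error condition occurs. */
static void raise_error(struct apic_model *a, unsigned int bits)
{
    if (a->kind == APIC_P6)
        a->pending |= bits; /* stays internal until the ESR is written */
    else
        a->esr |= bits;     /* P5-class: the ESR is the only status */
}

/* Software side: write to the ESR. */
static void esr_write(struct apic_model *a, unsigned int val)
{
    switch (a->kind) {
    case APIC_P6:           /* any write latches the internal status */
        a->esr = a->pending;
        a->pending = 0;
        break;
    case APIC_P5_BUGGY:     /* erratum: the written data is recorded */
        a->esr = val;
        break;
    case APIC_P5:           /* writes have no effect */
        break;
    }
}

/* Software side: a read returns the visible ESR and clears it. */
static unsigned int esr_read(struct apic_model *a)
{
    unsigned int v = a->esr;

    a->esr = 0;
    return v;
}

/* Leading read, write of zero, second read; report as "v(v1)". */
static void report_error(struct apic_model *a,
                         unsigned int *v, unsigned int *v1)
{
    assert(v != NULL && v1 != NULL);
    *v = esr_read(a);  /* picks up the error on P5-class chips */
    esr_write(a, 0);   /* zero, so buggy P5 chips record nothing bogus */
    *v1 = esr_read(a); /* picks up the error on P6-class chips */
    printf("APIC error on CPU0: %02x(%02x)\n", *v, *v1);
}
```

With a send checksum error (bit 0) raised beforehand, the P6 model
reports 00(01), matching the message in this thread, while both P5
models report 01(00), i.e. the two codes swap places.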

I don't remember what the specification for the P4-class local APIC is in
this area, or what other vendors' implementations do. The i82489DX APIC
does not implement error reporting.

Frankly, I think this P5-to-P6 APIC specification change is an
unnecessary annoyance for an OS developer. And there are more caveats
like this across local APIC implementations, this perhaps being the least
harmful one.

Maciej