2004-03-16 18:37:56

by Emmanuel Fleury

Subject: spurious 8259A interrupt

Hi,

I noticed today that I had several "spurious 8259A interrupt" messages:

Dec 20 15:02:45 hermes vmunix: spurious 8259<3>[drm:radeon_cp_init]
*ERROR* radeon_cp_init called without lock held
...
Dec 20 16:54:17 hermes vmunix: spurious 8259A interrupt: IRQ7.
...
Jan 3 11:29:06 hermes vmunix: spurious 8259A interrupt: I<6>cs: memory
probe 0xa0000000-0xa0ffffff: clean.
...
Feb 29 12:59:39 hermes vmunix: spurious 8259A in<4>atkbd.c: Keyboard on
isa0060/serio0 reports too many keys pressed.
...
Mar 1 00:03:12 hermes vmunix: spurious 8259A interrupt:
I<3>[drm:radeon_cp_init] *ERROR* radeon_cp_init called without lock held
...
Mar 8 03:11:24 hermes vmunix: spurious 8259A interrupt:
I<7>orinoco_lock() called with hw_unavailable (dev=d5a80000)


After some Googling, I found this:
http://test.linuxfromscratch.org/faq/#spurious-8259A-interrupt

So, I know it is harmless. But it is possibly due to a device driver
that is not doing its job properly. Can somebody tell me how to correct
this bug (without resorting to the workaround of tinkering with klogd)?

Regards
--
Emmanuel Fleury

Computer Science Department, | Office: B1-201
Aalborg University, | Phone: +45 96 35 72 23
Fredriks Bajersvej 7E, | Fax: +45 98 15 98 89
9220 Aalborg East, Denmark | Email: [email protected]


2004-03-16 20:32:54

by Robert_Hentosh

Subject: RE: spurious 8259A interrupt

> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Emmanuel Fleury
> Sent: Tuesday, March 16, 2004 12:36 PM
>
>
> Hi,
>
> I noticed today that I had several "spurious 8259A interrupt" messages:
>
> Dec 20 15:02:45 hermes vmunix: spurious 8259<3>[drm:radeon_cp_init]

:: SNIP ::

A co-worker of mine (Stuart Hayes) did some digging into this issue.
After putting a scope on the system, he found that in our situation
it was harmless:

The problem was actually caused by another IRQ (in our instance it was
IRQ10 associated with a gigabit NIC). The following steps took place:

> IRQ10 asserted
> INTACK cycle lets PIC deliver vector to processor
> processor masks IRQ10 in PIC
> processor sends EOI command to PIC
> processor reads a status register in the NIC, which causes IRQ10 to
> be deasserted
> processor unmasks IRQ10 in PIC
>
> Sometimes the processor would unmask IRQ10 almost immediately after
> reading the status register in the NIC, which results in IRQ10 being
> unmasked before the IRQ10 signal has finished going high. This causes
> the PIC to think that there is another IRQ10, but, by the time the
> processor asks for the vector, IRQ10 is no longer asserted.

The PIC defaults to IRQ7 because of its design, when IRQ10 was already
cleared. Sticking delays in is not viable in a generic ISR routine. A
possible fix to this issue would be to issue the EOI after the read to
the status register on the NIC, and I see some documentation on the PIC
that actually suggests that this is the way to service an interrupt.
This seemed like a risky change, since sending the EOI and using the
mask has been in use for some time and the change would affect all
devices using interrupts.

The spurious IRQ performance impact is negligible since it is logged
only once per IRQ at most.
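
The race in the steps above can be replayed with a toy timeline. This is only a sketch: the function name `replay`, the tick-based delays, and the `inta_latency` parameter are assumptions made for illustration, not values taken from the scope trace or from kernel code.

```python
REAL_IRQ = 10      # the NIC interrupt from the thread
SPURIOUS_IRQ = 7   # what the PIC reports when the request has vanished


def replay(deassert_delay, unmask_delay, inta_latency=1):
    """Return the IRQ numbers delivered for one NIC interrupt.

    All times are arbitrary ticks counted from the status-register read:
    deassert_delay -- when the NIC finally releases the IRQ10 line
    unmask_delay   -- when the CPU unmasks IRQ10 in the PIC
    inta_latency   -- ticks between the PIC raising INT and the CPU's INTA
    """
    delivered = [REAL_IRQ]  # the genuine interrupt that started the sequence

    if unmask_delay >= deassert_delay:
        # Line already released when unmasked: the PIC sees nothing pending.
        return delivered

    # Unmasked while the line is still asserted: the PIC raises INT again.
    inta_time = unmask_delay + inta_latency
    if inta_time >= deassert_delay:
        # The request disappeared before the vector was fetched.
        delivered.append(SPURIOUS_IRQ)
    else:
        # Line still asserted at INTA: IRQ10 is simply delivered again.
        delivered.append(REAL_IRQ)
    return delivered


print(replay(deassert_delay=3, unmask_delay=5))  # unmask after the line is released
print(replay(deassert_delay=3, unmask_delay=2))  # unmask wins the race
```

In this model only the narrow window where the unmask lands just before the line is released produces the IRQ7 report, which matches the "sometimes" in the description.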

2004-03-19 13:06:19

by Jamie Lokier

Subject: Re: spurious 8259A interrupt

[email protected] wrote:
> > IRQ10 asserted
> > INTACK cycle lets PIC deliver vector to processor
> > processor masks IRQ10 in PIC
> > processor sends EOI command to PIC
> > processor reads a status register in the NIC, which causes IRQ10 to be
> > deasserted
> > processor unmasks IRQ10 in PIC

> The PIC defaults to IRQ7 because of its design, when IRQ10 was already
> cleared. Sticking delays in is not viable in a generic ISR routine. A
> possible fix to this issue would be to issue the EOI after the read to
> the status register on the NIC, and I see some documentation on the PIC
> that actually suggests that this is the way to service an interrupt.
> This seemed like a risky change, since sending the EOI and using the
> mask has been in use for some time and the change would affect all
> devices using interrupts.

That reminds me: why does Linux mask the IRQ anyway?

Why doesn't it simply call the handler functions, and then send EOI to
the PIC with no unmasking?

For those rare occasions when an interrupt handler wants to re-enable
interrupts (sti), _then_ it could mask the interrupt that called the handler.

Why wouldn't that work?

-- Jamie

2004-03-19 13:16:13

by Russell King

Subject: Re: spurious 8259A interrupt

On Fri, Mar 19, 2004 at 01:06:10PM +0000, Jamie Lokier wrote:
> For those rare occasions when an interrupt handler wants to re-enable
> interrupts (sti), _then_ it could mask the interrupt that called the handler.

Interrupt handlers generally run with the CPU interrupt disable flag
cleared, so other interrupts can be serviced.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-03-19 13:28:28

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Tue, 16 Mar 2004 [email protected] wrote:

> > Sometimes the processor would unmask IRQ10 almost immediately after
> > reading the status register in the NIC, which results in IRQ10 being
> > unmasked before the IRQ10 signal has finished going high. This causes
> > the PIC to think that there is another IRQ10, but, by the time the
> > processor asks for the vector, IRQ10 is no longer asserted.
>
> The PIC defaults to IRQ7 because of its design, when IRQ10 was already
> cleared. Sticking delays in is not viable in a generic ISR routine. A
> possible fix to this issue would be to issue the EOI after the read to
> the status register on the NIC, and I see some documentation on the PIC
> that actually suggests that this is the way to service an interrupt.
> This seemed like a risky change, since sending the EOI and using the
> mask has been in use for some time and the change would affect all
> devices using interrupts.

The best way to deal with spurious interrupts is to ack the interrupt at
the device ASAP in the handler, especially if you know that the response
is slow.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-19 13:39:47

by Jamie Lokier

Subject: Re: spurious 8259A interrupt

Russell King wrote:
> Interrupt handlers generally run with the CPU interrupt disable flag
> cleared, so other interrupts can be serviced.

Indeed. But why? What's the advantage?

The obvious thought is it might improve latency of interrupt handlers
which need reasonably low latency, when other handlers take a long time.

E.g. if the irq 1 handler takes a long time, multiple irq 2
interrupts can be serviced during it.

But that doesn't work, when there are no meaningful hardware
priorities: an irq 2 handler can be interrupted by the long irq 1
handler, maybe before it gets to do anything useful, and then the irq
2 interrupt doesn't have low latency.

(I gather that irq priorities aren't especially meaningful on the x86
platform, as brought up on another thread recently).

Perhaps it works out statistically better.

Can you confirm that it does work out statistically better, or that
there's something I didn't think of?

Thanks,
-- Jamie

2004-03-19 13:46:50

by Richard B. Johnson

Subject: Re: spurious 8259A interrupt

On Fri, 19 Mar 2004, Jamie Lokier wrote:

> [email protected] wrote:
> > > IRQ10 asserted
> > > INTACK cycle lets PIC deliver vector to processor
> > > processor masks IRQ10 in PIC
> > > processor sends EOI command to PIC
> > > processor reads a status register in the NIC, which causes IRQ10 to be
> > > deasserted
> > > processor unmasks IRQ10 in PIC
>
> > The PIC defaults to IRQ7 because of its design, when IRQ10 was already
> > cleared. Sticking delays in is not viable in a generic ISR routine. A
> > possible fix to this issue would be to issue the EOI after the read to
> > the status register on the NIC, and I see some documentation on the PIC
> > that actually suggests that this is the way to service an interrupt.
> > This seemed like a risky change, since sending the EOI and using the
> > mask has been in use for some time and the change would affect all
> > devices using interrupts.
>
> That reminds me: why does Linux mask the IRQ anyway?
>
> Why doesn't it simply call the handler functions, and then send EOI to
> the PIC with no unmasking?
>
> For those rare occasions when an interrupt handler wants to re-enable
> interrupts (sti), _then_ it could mask the interrupt that called the handler.
>
> Why wouldn't that work?

It would work. However, the driver would then have to "know"
if the interrupt came from IO-APIC or from the 8259. It also
would have to "know" what IRQ it was actually using, etc.,
not just at configuration time, but forever. So, all the
dirty details were put in the kernel code so that the
ISR only needs to know it was called as a result of an
interrupt.

There is no problem with masking ON/OFF the interrupt
input to the 8259. In fact, this can be used to generate
another (unreliable) edge if the IRQ line is still asserted.

The IRQ7 spurious is usually an artifact of a crappy motherboard
design where the CPU "thinks" it was interrupted, but the
controller didn't wiggle the CPU's INT line. Once the INT
cycle starts, it must complete or the CPU would hang forever
waiting for the vector. Therefore, if the controller gets
the vector request from the CPU and it didn't actually interrupt,
the controller puts the IRQ7 vector on the bus, that ISR gets
called, and you get a "spurious interrupt". If you are
looking at the programming of the 8259, you are looking in
the wrong place. You need to look at the hardware timing on
the motherboard. That's where the problem originates. The
8259 is just doing its job, keeping the CPU running after
this spurious event.
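
The "controller puts the IRQ7 vector on the bus" behaviour, and the usual way a handler tells a real IRQ7 from a spurious one, can be sketched with a toy model. The class `Toy8259A` and the helper `is_spurious` are hypothetical names, and the model ignores masking, cascading and priority nesting. It assumes the documented 8259A behaviour that a spurious level-7 delivery leaves the in-service bit for IR7 clear.

```python
class Toy8259A:
    """Minimal single-PIC model: request register, in-service register."""

    def __init__(self):
        self.irr = 0  # pending requests, one bit per input
        self.isr = 0  # requests currently in service

    def raise_irq(self, level):
        self.irr |= 1 << level

    def drop_irq(self, level):
        self.irr &= ~(1 << level)

    def inta(self):
        """CPU asks for a vector; return the level the PIC answers with."""
        for level in range(8):  # IR0 has the highest priority
            if self.irr & (1 << level):
                self.irr &= ~(1 << level)
                self.isr |= 1 << level
                return level
        # Nothing is pending any more: the cycle must still complete,
        # so level 7 is reported without setting its in-service bit.
        return 7


def is_spurious(pic, level):
    """A level-7 delivery with ISR bit 7 clear was not a real IRQ7."""
    return level == 7 and not (pic.isr & 0x80)


pic = Toy8259A()
pic.raise_irq(3)
pic.drop_irq(3)           # request vanishes before the CPU's INTA
level = pic.inta()
print(level, is_spurious(pic, level))
```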

FYI, the motherboards in the cheapie Dell machines we have
been getting (Optiplex GX260) are atrocious in this respect.
To prevent the error logs from getting filled up with
the "Spurious interrupt" messages, they need to be commented
out in kernels that run on these machines. Otherwise, we
get such messages about 50 or 60 times per hour. I note
that the original question came from somebody at Dell. They
really need to check their own back-yard before investigating
software.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-19 14:06:32

by Anton Blanchard

Subject: Re: spurious 8259A interrupt


> Indeed. But why? What's the advantage?

We enable IRQs during IRQ processing on ppc64 for one reason. We set the
IPI priority higher than normal IRQs so we can service it as soon as
possible and the calling cpu can move on.

> E.g. if the irq 1 handler takes a long time, multiple irq 2
> interrupts can be serviced during it.
>
> But that doesn't work, when there are no meaningful hardware
> priorities: an irq 2 handler can be interrupted by the long irq 1
> handler, maybe before it gets to do anything useful, and then the irq
> 2 interrupt doesn't have low latency.

Yeah. We have a huge number of possible priorities on some ppc64
interrupt controllers but it turns out we don't have a concept of
priority or tolerance of latency for devices in Linux. I could see us
wanting serial IRQs to have a higher priority than disk IRQs if the
information was there for us to exploit.

In the end we map all IRQs to the same priority on ppc64 which means we
will never take a recursive IRQ due to devices.

Anton

2004-03-19 14:39:55

by Jamie Lokier

Subject: Re: spurious 8259A interrupt

Richard B. Johnson wrote:
> > For those rare occasions when an interrupt handler wants to
> > re-enable interrupts (sti), _then_ it could mask the interrupt
> > that called the handler.
>
> It would work. However, the driver would then have to "know"
> if the interrupt came from IO-APIC or from the 8259. It also
> would have to "know" what IRQ it was actually using, etc.,

The generic interrupt handlers in Linux _do_ know all that. They
need to, to mask the interrupt :)

The driver would not need to know: a global per-cpu variable can keep
track of where the most recent interrupt came from.

> So, all the dirty details were put in the kernel code so that the
> ISR only needs to know it was called as a result of an interrupt.

Yes, but that doesn't explain the masking. It explains why there are
generic interrupt handlers (which know about the 8259, various APIC
etc.) and the driver interrupt handlers.

> There is no problem with masking ON/OFF the interrupt
> input to the 8259. In fact, this can be used to generate
> another (unreliable) edge if the IRQ line is still asserted.

No, there is no problem. But there is a performance impact. Masking
and unmasking the 8259 is an I/O operation which may be quite slow.

If we don't need it, why do we take the performance hit?

> The IRQ7 spurious is usually an artifact of a crappy motherboard
> design [+ good explanation].

Thanks. That was very informative.

-- Jamie

2004-03-19 14:57:00

by Jamie Lokier

Subject: Re: spurious 8259A interrupt

Anton Blanchard wrote:
> > Indeed. But why? What's the advantage?
>
> We enable IRQs during IRQ processing on ppc64 for one reason. We set the
> IPI priority higher than normal IRQs so we can service it as soon as
> possible and the calling cpu can move on.

Yes: when there are interrupt priorities, then enabling them at the
CPU and masking them at the controller is required.

Is that the reason for masking 8259 interrupts on x86 Linux?
I.e. are there any special "high priority" interrupts used on x86 Linux?

Otherwise, I don't see why we have the overhead of the extra I/O
operations to mask and unmask them. I'm sure there's a very good
reason: Linus wouldn't have written or accepted that code unless there
was a very good reason. But I would love to know what it is!

-- Jamie

2004-03-19 22:06:44

by Guennadi Liakhovetski

Subject: RE: spurious 8259A interrupt

On Fri, 19 Mar 2004, Maciej W. Rozycki wrote:

> The best way to deal with spurious interrupts is to ack the interrupt at
> the device ASAP in the handler, especially if you know that the response
> is slow.

I am getting those from the lAPIC timer interrupt (on a VIA KM133 Duron
system). And the APIC timer interrupt IS acked (almost) immediately. So, I
have a choice: no NMI watchdog or that uncomfortably increasing ERR:
counter. Kernel 2.6.3.

Guennadi
---
Guennadi Liakhovetski


2004-03-21 17:58:17

by Hans-Peter Jansen

Subject: Re: spurious 8259A interrupt

On Friday 19 March 2004 14:48, Richard B. Johnson wrote:
>
> The IRQ7 spurious is usually an artifact of a crappy motherboard
> design where the CPU "thinks" it was interrupted, but the
> controller didn't wiggle the CPU's INT line.

Thanks for the nice explanation, Richard.

I even see them on my x86_64 box in 64 bit mode. (K8VT800 based)
Fortunately only occasionally.

I thought, AMD took the chance to fix that kind of crap in the new
architecture, but obviously they failed in this respect :-(

Pete

2004-03-22 09:03:05

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Fri, 19 Mar 2004, Guennadi Liakhovetski wrote:

> > The best way to deal with spurious interrupts is to ack the interrupt at
> > the device ASAP in the handler, especially if you know that the response
> > is slow.
>
> I am getting those from the lAPIC timer interrupt (on a VIA KM133 Duron
> system). And the APIC timer interrupt IS acked (almost) immediately. So, I
> have a choice: no NMI watchdog or that uncomfortably increasing ERR:
> counter. Kernel 2.6.3.

Do you really get "spurious 8259A interrupt" messages for the local APIC
timer??? They don't ever leave the unit bound to the processor -- it has
to be something else. What are the contents of your /proc/interrupts?

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-22 09:12:15

by Maciej W. Rozycki

Subject: Re: spurious 8259A interrupt

On Sun, 21 Mar 2004, Hans-Peter Jansen wrote:

> > The IRQ7 spurious is usually an artifact of a crappy motherboard
> > design where the CPU "thinks" it was interrupted, but the
> > controller didn't wiggle the CPU's INT line.
>
> Thanks for the nice explanation, Richard.

Unfortunately this need not be the reason. Another possibility is a
crappy driver -- if a device generates a level-triggered interrupt which
does not deassert immediately after getting acked (perhaps because the IRQ
line is firmware-driven) and the handler in the driver doesn't ack it soon
enough, it's possible for the interrupt line to be still asserted after
exiting the handler. The processor may have enough time to accept the
interrupt again and, with the right timing, the line may go inactive right
in the middle of the processor's interrupt acknowledge sequence. The 8259A
PIC will signal a spurious interrupt in this case.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-22 12:28:30

by Richard B. Johnson

Subject: Re: spurious 8259A interrupt

On Sun, 21 Mar 2004, Hans-Peter Jansen wrote:

> On Friday 19 March 2004 14:48, Richard B. Johnson wrote:
> >
> > The IRQ7 spurious is usually an artifact of a crappy motherboard
> > design where the CPU "thinks" it was interrupted, but the
> > controller didn't wiggle the CPU's INT line.
>
> Thanks for the nice explanation, Richard.
>
> I even see them on my x86_64 box in 64 bit mode. (K8VT800 based)
> Fortunately only occasionally.
>
> I thought, AMD took the chance to fix that kind of crap in the new
> architecture, but obviously they failed in this respect :-(
>
> Pete

It isn't CPU-specific. It's motherboard glitch specific. If there
is ground-bounce on the motherboard or excessive induced
coupling, the CPU may occasionally get hit with a logic-level
that it "thinks" is an interrupt, even though no controller
actually generated it. Sometimes you can find a power supply
that helps. Power supplies can cause such problems if
a dynamic load (from the CPU executing some variable-load
pattern) coincides with some not-too-well damped pole in
the power-supply regulator feedback. This can cause a
periodic bounce (like 100 Hz) that causes logic levels
to go into and out of spec during certain execution
sequences. This can cause actual triggers to be sent
to the CPU's maskable and non-maskable interrupt pins.
Since CPUs nowadays have multiple levels of regulators,
their voltages are relatively constant. This means their
response to input logic levels won't track with something
tied only to the primary regulator in the cheapie power
supply.

In any event, spurious interrupts are hardware events,
not software. If you don't get too many of them they
are not bothersome and might even be called "normal".


Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-22 21:16:49

by Guennadi Liakhovetski

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Maciej W. Rozycki wrote:

> Do you really get "spurious 8259A interrupt" messages for the local APIC
> timer??? They don't ever leave the unit bound to the processor -- it has
> to be something else. What are the contents of your /proc/interrupts?

OK, here's exactly what I see:
1) during start-up 1 message
spurious 8259A interrupt: IRQ7.
2) at run-time ERR: count increases - sometimes several per
second, sometimes it remains constant for some time.
3) No more "spurious" messages
4) I definitely saw situations where, between 2 /proc/interrupts snapshots,
the sum of all interrupts (except the timer) was smaller than the number
of errors, e.g.

         CPU0   (2nd shot)
  0:    36557     37638   +1081    XT-PIC  timer
  1:       59        65      +6    XT-PIC  i8042
  2:        0         0            XT-PIC  cascade
  5:        0         0            XT-PIC  VIA686A
  8:        3         3            XT-PIC  rtc
  9:        0         0            XT-PIC  acpi, uhci_hcd, uhci_hcd
 10:        0         0            XT-PIC  eth0
 12:       84        84            XT-PIC  i8042
 14:     1910      1918      +8    XT-PIC  ide0
 15:        1         1            XT-PIC  ide1
NMI:       18        18
LOC:    36460     37541   +1081
ERR:       36        57     +21

ide0 + i8042 (keyboard) = 14, whereas errors increased by 21. So, if you
are right, then Alan's wrong (or my understanding of his statement), and
those spurious interrupts occur not only after real ones, or, one real
interrupt can produce several spurious ones.
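
The comparison above can be reproduced mechanically. The following is a minimal sketch that assumes a single-CPU /proc/interrupts layout (one count column after each label); the helper names `parse_interrupts` and `deltas` are made up for this example, and the two snapshots contain only a subset of the rows and counts quoted in this message.

```python
FIRST = """\
           CPU0
  0:      36557          XT-PIC  timer
  1:         59          XT-PIC  i8042
 12:         84          XT-PIC  i8042
 14:       1910          XT-PIC  ide0
NMI:         18
LOC:      36460
ERR:         36
"""

SECOND = """\
           CPU0
  0:      37638          XT-PIC  timer
  1:         65          XT-PIC  i8042
 12:         84          XT-PIC  i8042
 14:       1918          XT-PIC  ide0
NMI:         18
LOC:      37541
ERR:         57
"""


def parse_interrupts(text):
    """Map each row label ('0', '14', 'ERR', ...) to its first count."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # header line or anything without a count
        counts[label.strip()] = int(fields[0])
    return counts


def deltas(before, after):
    """Return only the rows whose count changed between the snapshots."""
    return {k: after[k] - before[k]
            for k in after
            if k in before and after[k] != before[k]}


changed = deltas(parse_interrupts(FIRST), parse_interrupts(SECOND))
# Sum the numbered IRQ rows, leaving out the PIT timer on IRQ 0.
device_total = sum(v for k, v in changed.items() if k.isdigit() and k != "0")
print(changed)
print("non-timer IRQs:", device_total, "errors:", changed.get("ERR", 0))
```

On a real system the two snapshots would come from reading /proc/interrupts twice; multi-CPU machines have one count column per CPU, which this sketch does not handle.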

Thanks
Guennadi
---
Guennadi Liakhovetski


2004-03-22 22:10:40

by Richard B. Johnson

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Guennadi Liakhovetski wrote:

> On Mon, 22 Mar 2004, Maciej W. Rozycki wrote:
>
> > Do you really get "spurious 8259A interrupt" messages for the local APIC
> > timer??? They don't ever leave the unit bound to the processor -- it has
> > to be something else. What are the contents of your /proc/interrupts?
>
> OK, here's exactly what I see:
> 1) during start-up 1 message
> spurious 8259A interrupt: IRQ7.
> 2) at run-time ERR: count increases - sometimes several per
> second, sometimes it remains constant for some time.
> 3) No more "spurious" messages
> 4) I definitely saw situations where, between 2 /proc/interrupts snapshots,
> the sum of all interrupts (except the timer) was smaller than the number
> of errors, e.g.
>
>          CPU0   (2nd shot)
>   0:    36557     37638   +1081    XT-PIC  timer
>   1:       59        65      +6    XT-PIC  i8042
>   2:        0         0            XT-PIC  cascade
>   5:        0         0            XT-PIC  VIA686A
>   8:        3         3            XT-PIC  rtc
>   9:        0         0            XT-PIC  acpi, uhci_hcd, uhci_hcd
>  10:        0         0            XT-PIC  eth0
>  12:       84        84            XT-PIC  i8042
>  14:     1910      1918      +8    XT-PIC  ide0
>  15:        1         1            XT-PIC  ide1
> NMI:       18        18
> LOC:    36460     37541   +1081
> ERR:       36        57     +21
>

First, you are using the 8259A (XT-PIC). This means you have
IO-APIC turned off (or it doesn't exist).

> ide0 + i8042 (keyboard) = 14, whereas errors increased by 21. So, if you
> are right, then Alan's wrong (or my understanding of his statement), and
> those spurious interrupts occur not only after real ones, or, one real
> interrupt can produce several spurious ones.

Neither. They are not related. As previously stated, a spurious
interrupt occurs when the CPU INT line becomes active, but no
interrupt controller caused it to happen. It's just that simple.
There is no magic. Given that the CPU needs to have a vector
placed on the bus every time the INT line goes active, the
interrupt controller says; "What's a mother to do?". To placate
the CPU and terminate the INT/INTA cycle, the controller puts
its lowest-priority vector on the bus. The CPU will eventually
branch to the address specified for that vector. The code there
says; "I'm IRQ7, I'm not even hooked...This must be a spurious
interrupt..."

Again, it's a hardware problem. It can't be fixed in software.
Of course, you could get rid of the "!&@#)$^#$)!@^$" message
that's mucking everybody up. Then you could use defective hardware
just like Win$


>
> Thanks
> Guennadi
> ---
> Guennadi Liakhovetski


Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-22 23:10:38

by Guennadi Liakhovetski

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Richard B. Johnson wrote:

> On Mon, 22 Mar 2004, Guennadi Liakhovetski wrote:
>
> >          CPU0   (2nd shot)
> >   0:    36557     37638   +1081    XT-PIC  timer
> >   1:       59        65      +6    XT-PIC  i8042
> >   2:        0         0            XT-PIC  cascade
> >   5:        0         0            XT-PIC  VIA686A
> >   8:        3         3            XT-PIC  rtc
> >   9:        0         0            XT-PIC  acpi, uhci_hcd, uhci_hcd
> >  10:        0         0            XT-PIC  eth0
> >  12:       84        84            XT-PIC  i8042
> >  14:     1910      1918      +8    XT-PIC  ide0
> >  15:        1         1            XT-PIC  ide1
> > NMI:       18        18
> > LOC:    36460     37541   +1081
> > ERR:       36        57     +21
>
> First, you are using the 8259A (XT-PIC). This means you have
> IO-APIC turned off (or it doesn't exist).

I know. I never said there was one. I said, that the local APIC is used
for timer interrupts - at least, this is how I interpret

Using local APIC timer interrupts.
calibrating APIC timer ...

Am I missing anything trivial?

> > ide0 + i8042 (keyboard) = 14, whereas errors increased by 21. So, if you
> > are right, then Alan's wrong (or my understanding of his statement), and
> > those spurious interrupts occur not only after real ones, or, one real
> > interrupt can produce several spurious ones.
>
> Neither. They are not related. As previously stated, a spurious
> interrupt occurs when the CPU INT line becomes active, but no
> interrupt controller caused it to happen. It's just that simple.

Yes, I saw your explanation. Thanks again. But, I am not getting
those errors with local APIC disabled. That's why I thought "local APIC ->
timer -> spurious interrupts." Maybe I am wrong. But I also can't see how
enabling the lapic can cause, e.g., power supply glitches to become
visible. I would be happy and grateful to hear an explanation.

Thanks
Guennadi
---
Guennadi Liakhovetski


2004-03-22 23:35:21

by Richard B. Johnson

Subject: RE: spurious 8259A interrupt

On Tue, 23 Mar 2004, Guennadi Liakhovetski wrote:

> On Mon, 22 Mar 2004, Richard B. Johnson wrote:
>
> > On Mon, 22 Mar 2004, Guennadi Liakhovetski wrote:
> >
> > >          CPU0   (2nd shot)
> > >   0:    36557     37638   +1081    XT-PIC  timer
> > >   1:       59        65      +6    XT-PIC  i8042
> > >   2:        0         0            XT-PIC  cascade
> > >   5:        0         0            XT-PIC  VIA686A
> > >   8:        3         3            XT-PIC  rtc
> > >   9:        0         0            XT-PIC  acpi, uhci_hcd, uhci_hcd
> > >  10:        0         0            XT-PIC  eth0
> > >  12:       84        84            XT-PIC  i8042
> > >  14:     1910      1918      +8    XT-PIC  ide0
> > >  15:        1         1            XT-PIC  ide1
> > > NMI:       18        18
> > > LOC:    36460     37541   +1081
> > > ERR:       36        57     +21
> >
> > First, you are using the 8259A (XT-PIC). This means you have
> > IO-APIC turned off (or it doesn't exist).
>
> I know. I never said there was one. I said, that the local APIC is used
> for timer interrupts - at least, this is how I interpret
>
> Using local APIC timer interrupts.
> calibrating APIC timer ...
>
> Am I missing anything trivial?
>

Yes. The interrupt status, above, clearly shows that the XT-PIC is
being used for timer interrupts. The local APIC timer is being used
instead of the PIT (Programmable Interval Timer at port 0x40,
channel 0). The IO-APIC contains several timers as well as a
programmable interrupt controller and router, etc. You are not
using its interrupt controller, but you are using its timer,
best I can see.

> > > ide0 + i8042 (keyboard) = 14, whereas errors increased by 21. So, if you
> > > are right, then Alan's wrong (or my understanding of his statement), and
> > > those spurious interrupts occur not only after real ones, or, one real
> > > interrupt can produce several spurious ones.
> >
> > Neither. They are not related. As previously stated, a spurious
> > interrupt occurs when the CPU INT line becomes active, but no
> > interrupt controller caused it to happen. It's just that simple.
>
> Yes, I saw your explanation. Thanks again. But, I am not getting
> those errors with local APIC disabled. That's why I thought "local APIC ->
> timer -> spurious interrupts." Maybe I am wrong. But I also can't see how
> enabling the lapic can cause, e.g., power supply glitches to become
> visible. I would be happy and grateful to hear an explanation.
>

Once you enable some other path to the CPU's INT line, you can
get some other condition. These paths may pick up different
noise or their logic returns may have different ground-bounce
conditions. Sometimes you can fix glitches like this by:

(1) Putting a metal post and a screw in every mounting
hole in your motherboard.... OR.

(2) Using only 1 or 2 metal posts to mount your motherboard
and using plastic insulators for the other holes.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-23 10:26:43

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Guennadi Liakhovetski wrote:

> 1) during start-up 1 message
> spurious 8259A interrupt: IRQ7.

That's probably a result of our local APIC setup code being a bit
careless when setting up the ExtINTA mode. The mode is used for a
cooperation between an 8259A PIC and a local APIC in the so-called
virtual-wire mode.

> 2) at run-time ERR: count increases - sometimes several per
> second, sometimes it remains constant for some time.
> 3) No more "spurious" messages

It's output at most once not to clutter the log.
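
That "log once, keep counting" behaviour explains why the message appears a single time while ERR: keeps growing. A rough model of it, with the class name `SpuriousTracker` and its fields invented for illustration (this is not the kernel's code), could look like:

```python
class SpuriousTracker:
    """Log the first spurious event per IRQ, count every one of them."""

    def __init__(self):
        self.logged_mask = 0  # one bit per IRQ that has already been logged
        self.err_count = 0    # plays the role of the ERR: line
        self.log = []         # stands in for the kernel log

    def report(self, irq):
        bit = 1 << irq
        if not (self.logged_mask & bit):
            self.log.append(f"spurious 8259A interrupt: IRQ{irq}.")
            self.logged_mask |= bit
        self.err_count += 1


tracker = SpuriousTracker()
for _ in range(21):
    tracker.report(7)
print(tracker.log)
print("ERR:", tracker.err_count)
```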

> 4) I definitely saw situations where, between 2 /proc/interrupts snapshots,
> the sum of all interrupts (except the timer) was smaller than the number
> of errors, e.g.
>
>          CPU0   (2nd shot)
>   0:    36557     37638   +1081    XT-PIC  timer
>   1:       59        65      +6    XT-PIC  i8042
>   2:        0         0            XT-PIC  cascade
>   5:        0         0            XT-PIC  VIA686A
>   8:        3         3            XT-PIC  rtc
>   9:        0         0            XT-PIC  acpi, uhci_hcd, uhci_hcd
>  10:        0         0            XT-PIC  eth0
>  12:       84        84            XT-PIC  i8042
>  14:     1910      1918      +8    XT-PIC  ide0
>  15:        1         1            XT-PIC  ide1
> NMI:       18        18
> LOC:    36460     37541   +1081
> ERR:       36        57     +21
>
> ide0 + i8042 (keyboard) = 14, whereas errors increased by 21. So, if you
> are right, then Alan's wrong (or my understanding of his statement), and
> those spurious interrupts occur not only after real ones, or, one real
> interrupt can produce several spurious ones.

The PIT timer (IRQ 0 above) is edge-triggered, so it cannot cause
spurious interrupts as a trail of real ones. Ditto for the keyboard
controller (IRQ 1) and onboard IDE (IRQ 14). The local APIC timer
interrupt (LOC) doesn't go through the 8259A.

So without additional debugging, I'd suspect noise on the interrupt lines,
either due to a board design error or an erratum in one of devices (IRQ
9?).

Maciej

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-23 10:29:43

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Richard B. Johnson wrote:

> First, you are using the 8259A (XT-PIC). This means you have
> IO-APIC turned off (or it doesn't exist).

An I/O APIC can be used for the virtual-wire mode as well. Using the
8259A doesn't preclude using an I/O APIC semi-transparently, with ExtINTA
messages travelling across the inter-APIC bus (depending on an APIC
implementation).

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-23 10:32:20

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Mon, 22 Mar 2004, Richard B. Johnson wrote:

> Yes. The interrupt status, above, clearly shows that the XT-PIC is
> being used for timer interrupts. The local APIC timer is being used
> instead of the PIT (Programmable Interval Timer at port 0x40,

The local APIC timer is never used "instead" of the PIT. If both timers
are available, they are used independently for different purposes.

> channel 0). The IO-APIC contains several timers as well as a
> programmable interrupt controller and router, etc. You are not
> using its interrupt controller, but you are using its timer,
> best I can see.

There is no timer source in any I/O APIC I know of.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-23 10:42:30

by Maciej W. Rozycki

Subject: RE: spurious 8259A interrupt

On Tue, 23 Mar 2004, Guennadi Liakhovetski wrote:

> Yes, I saw your explanation. Thanks again. But, I am not getting
> those errors with local APIC disabled. That's why I thought "local APIC ->

Is the local APIC normally disabled, i.e. do you see a message like:
"Local APIC disabled by BIOS -- reenabling." when you boot with your local
APIC enabled? That might explain the difference.

> timer -> spurious interrupts." Maybe I am wrong. But I also can't see how
> enabling the lapic can cause, e.g., power supply glitches to become
> visible. I would be happy and grateful to hear an explanation.

I don't know what setup you are using, but depending on the
implementation, the local APIC may treat ExtINTA interrupts as
level-triggered or as edge-triggered. The latter setup is a design error
in my opinion (the 8259A has a level-triggered output) and may lead to
what you observe. As the local APIC latches edge-triggered interrupts it
receives (unlike the 8259A) a glitch on an interrupt line does not have to
last long enough for a CPU to accept it for a spurious interrupt to be
recorded.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-03-23 23:06:26

by Guennadi Liakhovetski

Subject: RE: spurious 8259A interrupt

On Tue, 23 Mar 2004, Maciej W. Rozycki wrote:

> On Tue, 23 Mar 2004, Guennadi Liakhovetski wrote:
>
> > Yes, I saw this your explanation. Thanks again. But, I am not getting
> > those errors with local APIC disabled. That's why I thought "local APIC ->
>
> Is the local APIC normally disabled, i.e. do you see a message like:
> "Local APIC disabled by BIOS -- reenabling." when you boot with your local
> APIC enabled? That might explain the difference.

Yes.

> what you observe. As the local APIC latches edge-triggered interrupts it
> receives (unlike the 8259A) a glitch on an interrupt line does not have to
> last long enough for a CPU to accept it for a spurious interrupt to be
> recorded.

Ok, I'll buy it. And thanks for the explanations!

Guennadi
---
Guennadi Liakhovetski


2004-03-24 15:28:24

by Jamie Lokier

Subject: Re: spurious 8259A interrupt

Richard B. Johnson wrote:
> It isn't CPU-specific. It's motherboard glitch specific. If there
> is ground-bounce on the motherboard or excessive induced
> coupling, the CPU may occasionally get hit with a logic-level
> that it "thinks" is an interrupt, even though no controller
> actually generated it.

That doesn't seem plausible on an otherwise reliable computer.

Why would interrupt lines suffer ground-bounce logic glitches yet all
the data, address and control lines be fine?

-- Jamie

2004-03-24 15:55:43

by Richard B. Johnson

Subject: Re: spurious 8259A interrupt

On Wed, 24 Mar 2004, Jamie Lokier wrote:

> Richard B. Johnson wrote:
> > It isn't CPU-specific. It's motherboard glitch specific. If there
> > is ground-bounce on the motherboard or excessive induced
> > coupling, the CPU may occasionally get hit with a logic-level
> > that it "thinks" is an interrupt, even though no controller
> > actually generated it.
>
> That doesn't seem plausible on an otherwise reliable computer.
>
> Why would interrupt lines suffer ground-bounce logic glitches yet all
> the data, address and control lines be fine?
>
> -- Jamie

It is absolutely plausible and, in fact, is what's happening.
An interrupt request is generated by hardware. There is no way
that mucking around with anything in the controller using
software can generate the spurious interrupt. Even if you
deliberately enabled interrupts that were not connected
anywhere, they are always pulled to a logic-inactive state
by hardware. I keep hearing these buzzwords like INTA and
DTACK, etc., like you could somehow touch these PWB traces
in software. You can't. The confusion may come about because
with ix86 CPUs it is possible to generate a software interrupt.
I forget what the IRQ offset in Linux is, and I'm not going
to waste time looking it up, I think it's 0x70. So, if a
privileged task executed "int $0x77", the IRQ offset plus
the IRQ number of IRQ7, the code that says "spurious interrupt"
would get executed.

The traces going to the CPU for interrupts have absolutely nothing
in common with any buses. So, the fact that your computer
is otherwise reliable is not relevant. In fact, the spurious
interrupt is not relevant. The CPU time consumed by an
occasional glitch won't affect anything. The kernel logging
message should be commented out.

FYI all your computer hardware is unreliable. There are even bits
getting flipped in memory by cosmic rays as I write this. However,
the likelihood of a bit being flipped in memory that is currently
in use by any program or the kernel at the instant it's flipped
is low enough so you are unlikely to be affected.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-24 15:59:40

by Gabriel Paubert

Subject: Re: spurious 8259A interrupt

On Wed, Mar 24, 2004 at 03:28:00PM +0000, Jamie Lokier wrote:
> Richard B. Johnson wrote:
> > It isn't CPU-specific. It's motherboard glitch specific. If there
> > is ground-bounce on the motherboard or excessive induced
> > coupling, the CPU may occasionally get hit with a logic-level
> > that it "thinks" is an interrupt, even though no controller
> > actually generated it.
>
> That doesn't seem plausible on an otherwise reliable computer.
>
> Why would interrupt lines suffer ground-bounce logic glitches yet all
> the data, address and control lines be fine?

Two reasons at least:
- the data/address lines are always driven by a buffer when a
transfer is taking place, while the interrupt lines are permanently
monitored but most of the time only held by passive pull-ups of a
much higher impedance.

- board designers know that the timing of data and addresses are
critical and take care during the layout. Interrupt lines come last
and are routed where there is room left, after all these are low
frequency signals...

Regards,
Gabriel