2002-01-03 19:56:14

by Steve Snyder

[permalink] [raw]
Subject: "APIC error on CPUx" - what does this mean?

I just noticed the following events in my system log:

Jan 3 14:03:39 mercury kernel: APIC error on CPU1: 00(02)
Jan 3 14:03:39 mercury kernel: APIC error on CPU0: 00(02)

Below I've listed the CPU/APIC-related parts of my system start-up.

This kernel is not patched with anything; it's just a generic v2.4.17.

I didn't notice any problem in operation when these events were logged.

Looking at the older logs I see that I got the same set of events on Dec.
8th, so I guess that rules out any 2.4.17-specific changes as a culprit.
Also, I wasn't updating the CPU microcode (see below) on Dec. 8th, so this
can be held blameless as well.

Please let me know if I can provide any additional info needed to diagnose
this error.

Thanks.

-------------------------------

kernel: Linux version 2.4.17 ([email protected]) (gcc version
2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Fri Dec 21 17:19:29 EST
2001
kernel: BIOS-provided physical RAM map:
kernel: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
kernel: BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
kernel: BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
kernel: BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
kernel: BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
kernel: BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
kernel: BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
kernel: found SMP MP-table at 000f5a60
kernel: hm, page 000f5000 reserved twice.
kernel: hm, page 000f6000 reserved twice.
kernel: hm, page 000f1000 reserved twice.
kernel: hm, page 000f2000 reserved twice.
kernel: On node 0 totalpages: 131056
kernel: zone(0): 4096 pages.
kernel: zone(1): 126960 pages.
kernel: zone(2): 0 pages.
kernel: Intel MultiProcessor Specification v1.1
kernel: Virtual Wire compatibility mode.
kernel: OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
kernel: Processor #0 Pentium(tm) Pro APIC version 17
kernel: Processor #1 Pentium(tm) Pro APIC version 17
kernel: I/O APIC #2 Version 17 at 0xFEC00000.
kernel: Processors: 2
kernel: Kernel command line: ro root=/dev/sda3 nousb
kernel: Initializing CPU#0
kernel: Detected 847.753 MHz processor.
kernel: Console: colour VGA+ 80x25
kernel: Calibrating delay loop... 1690.82 BogoMIPS
kernel: Memory: 513464k/524224k available (1227k kernel code, 10372k
reserved, 334k data, 212k init, 0k highmem)
kernel: Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
kernel: Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
kernel: Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
kernel: Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
kernel: Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
kernel: CPU: L2 cache: 256K
kernel: Intel machine check architecture supported.
kernel: Intel machine check reporting enabled on CPU#0.
kernel: Enabling fast FPU save and restore... done.
kernel: Enabling unmasked SIMD FPU exception support... done.
kernel: Checking 'hlt' instruction... OK.
kernel: POSIX conformance testing by UNIFIX
kernel: mtrr: v1.40 (20010327) Richard Gooch ([email protected])
kernel: mtrr: detected mtrr type: Intel
kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
kernel: CPU: L2 cache: 256K
random: Initializing random number generator: succeededJan 2 23:06:08
kernel: Intel machine check reporting enabled on CPU#0.
kernel: CPU0: Intel Pentium III (Coppermine) stepping 03
kernel: per-CPU timeslice cutoff: 731.29 usecs.
kernel: enabled ExtINT on CPU#0
kernel: ESR value before enabling vector: 00000000
kernel: ESR value after enabling vector: 00000000
kernel: Booting processor 1/1 eip 2000
kernel: Initializing CPU#1
kernel: masked ExtINT on CPU#1
kernel: ESR value before enabling vector: 00000000
kernel: ESR value after enabling vector: 00000000
kernel: Calibrating delay loop... 1690.82 BogoMIPS
kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
kernel: CPU: L2 cache: 256K
kernel: Intel machine check reporting enabled on CPU#1.
kernel: CPU1: Intel Pentium III (Coppermine) stepping 03
kernel: Total of 2 processors activated (3381.65 BogoMIPS).
kernel: ENABLING IO-APIC IRQs
kernel: Setting 2 in the phys_id_present_map
kernel: ...changing IO-APIC physical APIC ID to 2 ... ok.
kernel: ..TIMER: vector=0x31 pin1=2 pin2=0
kernel: testing the IO APIC.......................
kernel:
kernel: .................................... done.
kernel: Using local APIC timer interrupts.
kernel: calibrating APIC timer ...
kernel: ..... CPU clock speed is 847.7664 MHz.
kernel: ..... host bus clock speed is 99.7368 MHz.
kernel: cpu: 0, clocks: 997368, slice: 332456
kernel: CPU0<T0:997360,T1:664896,D:8,S:332456,C:997368>
kernel: cpu: 1, clocks: 997368, slice: 332456
kernel: CPU1<T0:997360,T1:332448,D:0,S:332456,C:997368>
kernel: checking TSC synchronization across CPUs: passed.
kernel: Waiting on wait_init_idle (map = 0x2)
kernel: All processors have done init_idle
kernel: mtrr: your CPUs had inconsistent fixed MTRR settings
kernel: mtrr: probably your BIOS does not setup all CPUsJan 2 23:06:09
kernel: PCI: PCI BIOS revision 2.10 entry at 0xfb150, last bus=1
kernel: PCI: Using configuration type 1
kernel: PCI: Probing PCI hardware
kernel: PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
kernel: Limiting direct PCI/PCI transfers.
.
.
.
kernel: IA-32 Microcode Update Driver: v1.09 <[email protected]>
kernel: microcode: CPU1 updated from revision 12 to 19, date=02062001
kernel: microcode: CPU0 updated from revision 12 to 19, date=02062001
kernel: microcode: freed 4096 bytes
kernel: IA-32 Microcode Update Driver v1.09 unregistered


2002-01-03 20:10:54

by listmail

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

Not sure if it will help or if its relavant toy uor situation, but I was
seeing this often several months ago on one of my machines. I took it
down for some work one day, and just happened to take the heat sync off...
This machine has a celeron CPU in it(2 in fact) well you know how you can
see the tops of the PINs in the packaging material. turns out that my
cooler was working too well, and I had been getting condensation on the
underside of the heatsync. well that caused many pins to become cross
connected, due to the oxidation of the metal. Your issue might be
similar.

On Thu, 3 Jan 2002, Steve Snyder wrote:

> I just noticed the following events in my system log:
>
> Jan 3 14:03:39 mercury kernel: APIC error on CPU1: 00(02)
> Jan 3 14:03:39 mercury kernel: APIC error on CPU0: 00(02)
>
> Below I've listed the CPU/APIC-related parts of my system start-up.
>
> This kernel is not patched with anything; it's just a generic v2.4.17.
>
> I didn't notice any problem in operation when these events were logged.
>
> Looking at the older logs I see that I got the same set of events on Dec.
> 8th, so I guess that rules out any 2.4.17-specific changes as a culprit.
> Also, I wasn't updating the CPU microcode (see below) on Dec. 8th, so this
> can be held blameless as well.
>
> Please let me know if I can provide any additional info needed to diagnose
> this error.
>
> Thanks.
>
> -------------------------------
>
> kernel: Linux version 2.4.17 ([email protected]) (gcc version
> 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Fri Dec 21 17:19:29 EST
> 2001
> kernel: BIOS-provided physical RAM map:
> kernel: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> kernel: BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> kernel: BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
> kernel: BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
> kernel: BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
> kernel: BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> kernel: BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
> kernel: found SMP MP-table at 000f5a60
> kernel: hm, page 000f5000 reserved twice.
> kernel: hm, page 000f6000 reserved twice.
> kernel: hm, page 000f1000 reserved twice.
> kernel: hm, page 000f2000 reserved twice.
> kernel: On node 0 totalpages: 131056
> kernel: zone(0): 4096 pages.
> kernel: zone(1): 126960 pages.
> kernel: zone(2): 0 pages.
> kernel: Intel MultiProcessor Specification v1.1
> kernel: Virtual Wire compatibility mode.
> kernel: OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
> kernel: Processor #0 Pentium(tm) Pro APIC version 17
> kernel: Processor #1 Pentium(tm) Pro APIC version 17
> kernel: I/O APIC #2 Version 17 at 0xFEC00000.
> kernel: Processors: 2
> kernel: Kernel command line: ro root=/dev/sda3 nousb
> kernel: Initializing CPU#0
> kernel: Detected 847.753 MHz processor.
> kernel: Console: colour VGA+ 80x25
> kernel: Calibrating delay loop... 1690.82 BogoMIPS
> kernel: Memory: 513464k/524224k available (1227k kernel code, 10372k
> reserved, 334k data, 212k init, 0k highmem)
> kernel: Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
> kernel: Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
> kernel: Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> kernel: Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
> kernel: Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
> kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
> kernel: CPU: L2 cache: 256K
> kernel: Intel machine check architecture supported.
> kernel: Intel machine check reporting enabled on CPU#0.
> kernel: Enabling fast FPU save and restore... done.
> kernel: Enabling unmasked SIMD FPU exception support... done.
> kernel: Checking 'hlt' instruction... OK.
> kernel: POSIX conformance testing by UNIFIX
> kernel: mtrr: v1.40 (20010327) Richard Gooch ([email protected])
> kernel: mtrr: detected mtrr type: Intel
> kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
> kernel: CPU: L2 cache: 256K
> random: Initializing random number generator: succeededJan 2 23:06:08
> kernel: Intel machine check reporting enabled on CPU#0.
> kernel: CPU0: Intel Pentium III (Coppermine) stepping 03
> kernel: per-CPU timeslice cutoff: 731.29 usecs.
> kernel: enabled ExtINT on CPU#0
> kernel: ESR value before enabling vector: 00000000
> kernel: ESR value after enabling vector: 00000000
> kernel: Booting processor 1/1 eip 2000
> kernel: Initializing CPU#1
> kernel: masked ExtINT on CPU#1
> kernel: ESR value before enabling vector: 00000000
> kernel: ESR value after enabling vector: 00000000
> kernel: Calibrating delay loop... 1690.82 BogoMIPS
> kernel: CPU: L1 I cache: 16K, L1 D cache: 16K
> kernel: CPU: L2 cache: 256K
> kernel: Intel machine check reporting enabled on CPU#1.
> kernel: CPU1: Intel Pentium III (Coppermine) stepping 03
> kernel: Total of 2 processors activated (3381.65 BogoMIPS).
> kernel: ENABLING IO-APIC IRQs
> kernel: Setting 2 in the phys_id_present_map
> kernel: ...changing IO-APIC physical APIC ID to 2 ... ok.
> kernel: ..TIMER: vector=0x31 pin1=2 pin2=0
> kernel: testing the IO APIC.......................
> kernel:
> kernel: .................................... done.
> kernel: Using local APIC timer interrupts.
> kernel: calibrating APIC timer ...
> kernel: ..... CPU clock speed is 847.7664 MHz.
> kernel: ..... host bus clock speed is 99.7368 MHz.
> kernel: cpu: 0, clocks: 997368, slice: 332456
> kernel: CPU0<T0:997360,T1:664896,D:8,S:332456,C:997368>
> kernel: cpu: 1, clocks: 997368, slice: 332456
> kernel: CPU1<T0:997360,T1:332448,D:0,S:332456,C:997368>
> kernel: checking TSC synchronization across CPUs: passed.
> kernel: Waiting on wait_init_idle (map = 0x2)
> kernel: All processors have done init_idle
> kernel: mtrr: your CPUs had inconsistent fixed MTRR settings
> kernel: mtrr: probably your BIOS does not setup all CPUsJan 2 23:06:09
> kernel: PCI: PCI BIOS revision 2.10 entry at 0xfb150, last bus=1
> kernel: PCI: Using configuration type 1
> kernel: PCI: Probing PCI hardware
> kernel: PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
> kernel: Limiting direct PCI/PCI transfers.
> .
> .
> .
> kernel: IA-32 Microcode Update Driver: v1.09 <[email protected]>
> kernel: microcode: CPU1 updated from revision 12 to 19, date=02062001
> kernel: microcode: CPU0 updated from revision 12 to 19, date=02062001
> kernel: microcode: freed 4096 bytes
> kernel: IA-32 Microcode Update Driver v1.09 unregistered
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2002-01-03 21:47:50

by Ryan Cumming

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On January 3, 2002 11:55, Steve Snyder wrote:
> I just noticed the following events in my system log:
>
> Jan 3 14:03:39 mercury kernel: APIC error on CPU1: 00(02)
> Jan 3 14:03:39 mercury kernel: APIC error on CPU0: 00(02)
>
> Below I've listed the CPU/APIC-related parts of my system start-up.

I occasionaly get the same errors on my UP XP1800+ on a KT133A MB, except
they look like this:
APIC error on CPU0: 08(02)
APIC error on CPU0: 02(08)

uname -a:
Linux phalynx 2.4.17 #8 Wed Dec 26 20:41:16 PST 2001 i686 unknown

/proc/interrupts:
CPU0
0: 66440674 IO-APIC-edge timer
1: 378128 IO-APIC-edge keyboard
2: 0 XT-PIC cascade
8: 2 IO-APIC-edge rtc
9: 13768184 IO-APIC-level eth0
10: 131 IO-APIC-level usb-uhci, usb-uhci
11: 22007511 IO-APIC-level EMU10K1
12: 10497666 IO-APIC-edge PS/2 Mouse
14: 3717339 IO-APIC-edge ide0
15: 18254 IO-APIC-edge ide1
NMI: 0
LOC: 66439861
ERR: 45
MIS: 0

2002-01-03 22:29:12

by Alan

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

> I just noticed the following events in my system log:
>
> Jan 3 14:03:39 mercury kernel: APIC error on CPU1: 00(02)
> Jan 3 14:03:39 mercury kernel: APIC error on CPU0: 00(02)
>
> Please let me know if I can provide any additional info needed to diagnose
> this error.

The occasional APIC error is fine (its logging a hardware event - probably
something that caused enough noise to lose a message and retry it). The
APIC bus is designed to stand these occasional errors

2002-01-03 23:22:45

by Pozsar Balazs

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?


But what do these messages mean?

Are they some kind of retryable errors?
Do they have any consequences?

Thanks.

On Thu, 3 Jan 2002, Alan Cox wrote:

> > I just noticed the following events in my system log:
> >
> > Jan 3 14:03:39 mercury kernel: APIC error on CPU1: 00(02)
> > Jan 3 14:03:39 mercury kernel: APIC error on CPU0: 00(02)
> >
> > Please let me know if I can provide any additional info needed to diagnose
> > this error.
>
> The occasional APIC error is fine (its logging a hardware event - probably
> something that caused enough noise to lose a message and retry it). The
> APIC bus is designed to stand these occasional errors
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Balazs Pozsar

2002-01-07 10:15:08

by Chris Wedgwood

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Thu, Jan 03, 2002 at 10:38:27PM +0000, Alan Cox wrote:

The occasional APIC error is fine (its logging a hardware event -
probably something that caused enough noise to lose a message and
retry it). The APIC bus is designed to stand these occasional
errors

I'm curious... is there any way to determine what is causing these?
On a UP athlon I have:

cw:tty5@charon(cw)$ uname -r ; uptime && grep ERR /proc/interrupts
2.4.17-rc2
02:09:50 up 4 days, 5:18, 10 users, load average: 0.00, 0.00, 0.00
ERR: 5216

which equates several per minute at times... no funny hardware, not
running X11, and I don't remembering seeing these a while ago on this
same mainboard (but I never really looked either, so that might not be
true).

On a similar Athlon box, which has been up 32 days, I have nearly 45000
events, whilst on a UP P3 which has been up for 124 days I see none,
another UP PII which has been up for 196 days see's none, an SMP P3
which has been up 150 days sees none too... is this an Athlon or VIA
chipset quirk perhaps?



--cw

2002-01-07 12:30:49

by Petr Vandrovec

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On 7 Jan 02 at 23:17, Chris Wedgwood wrote:
> On Thu, Jan 03, 2002 at 10:38:27PM +0000, Alan Cox wrote:
> The occasional APIC error is fine (its logging a hardware event -
> probably something that caused enough noise to lose a message and
> retry it). The APIC bus is designed to stand these occasional
> errors
>
> I'm curious... is there any way to determine what is causing these?
> On a UP athlon I have:
>
> cw:tty5@charon(cw)$ uname -r ; uptime && grep ERR /proc/interrupts
> 2.4.17-rc2
> 02:09:50 up 4 days, 5:18, 10 users, load average: 0.00, 0.00, 0.00
> ERR: 5216
>
> which equates several per minute at times... no funny hardware, not
> running X11, and I don't remembering seeing these a while ago on this
> same mainboard (but I never really looked either, so that might not be
> true).

They are spurious IRQ 7, just message is printed only once during kernel
lifetime... I have about three spurious IRQ 7 per each 1000 interrupts
delivered to CPU. It is on A7V (Via KT133).
Best regards,
Petr Vandrovec
[email protected]

2002-01-07 13:05:40

by Chris Wedgwood

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Mon, Jan 07, 2002 at 01:29:42PM +0100, Petr Vandrovec wrote:

They are spurious IRQ 7, just message is printed only once during
kernel lifetime... I have about three spurious IRQ 7 per each 1000
interrupts delivered to CPU. It is on A7V (Via KT133).

Any idea _why_ these occur though? It seems some mainboards produce a
plethora of these whilst others never produce these...




--cw

2002-01-07 13:17:11

by Petr Vandrovec

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On 8 Jan 02 at 2:08, Chris Wedgwood wrote:
> On Mon, Jan 07, 2002 at 01:29:42PM +0100, Petr Vandrovec wrote:
>
> They are spurious IRQ 7, just message is printed only once during
> kernel lifetime... I have about three spurious IRQ 7 per each 1000
> interrupts delivered to CPU. It is on A7V (Via KT133).
>
> Any idea _why_ these occur though? It seems some mainboards produce a
> plethora of these whilst others never produce these...

Nope. Probably when CPU is in local APIC mode, it acknowledges interrupts
to chipset with different timming, and from time to time CPU still
sees IRQ pending, so it asks for vector, but as chipset has no
interrupt pending, it answers with IRQ7. I did no analysis to find
whether IRQ7 happens directly when we send confirmation to 8259,
or whether it happens due to some noise on IRQ line.

AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
APIC.
Best regards,
Petr Vandrovec
[email protected]

2002-01-07 13:23:34

by Alan

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

> whether IRQ7 happens directly when we send confirmation to 8259,
> or whether it happens due to some noise on IRQ line.
>
> AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
> APIC.

Are you using an AMD northbridge and VIA southbridge together ?

2002-01-07 13:40:24

by Chris Wedgwood

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Mon, Jan 07, 2002 at 02:16:28PM +0100, Petr Vandrovec wrote:

AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
APIC.

I have a dual P3 (VIA based) that does emits the Spurious Int7 message
shortly after boot (I've never checked /proc/interrupts though)
whereas my ServerWorks chipset never does this.


--cw

2002-01-07 14:40:50

by Martin Eriksson

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

----- Original Message -----
From: "Petr Vandrovec" <[email protected]>
To: "Chris Wedgwood" <[email protected]>
Cc: <[email protected]>; "Linux Kernel Mailing List"
<[email protected]>; <[email protected]>
Sent: Monday, January 07, 2002 2:16 PM
Subject: Re: "APIC error on CPUx" - what does this mean?


> On 8 Jan 02 at 2:08, Chris Wedgwood wrote:
> > On Mon, Jan 07, 2002 at 01:29:42PM +0100, Petr Vandrovec wrote:
> >
> > They are spurious IRQ 7, just message is printed only once during
> > kernel lifetime... I have about three spurious IRQ 7 per each 1000
> > interrupts delivered to CPU. It is on A7V (Via KT133).
> >
> > Any idea _why_ these occur though? It seems some mainboards produce a
> > plethora of these whilst others never produce these...
>
> Nope. Probably when CPU is in local APIC mode, it acknowledges interrupts
> to chipset with different timming, and from time to time CPU still
> sees IRQ pending, so it asks for vector, but as chipset has no
> interrupt pending, it answers with IRQ7. I did no analysis to find
> whether IRQ7 happens directly when we send confirmation to 8259,
> or whether it happens due to some noise on IRQ line.
>
> AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
> APIC.

This (APIC errors) happened very often with my BP6 board too, but with
recent kernels I don't get many of these messages.

As for spurious interrupts, I had these coming every five minutes on my
Cyrix 6x86/SiS 5597 box. But this "settled" when I moved the box to another
corner in the room, and also did some interior adjustments of cables and
such (it is a server running in a very crammed slimline desktop box, with a
HUB taped to the floppy mount =)

_____________________________________________________
| Martin Eriksson <[email protected]>
| MSc CSE student, department of Computing Science
| Ume? University, Sweden



2002-01-07 16:59:15

by Petr Vandrovec

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On 7 Jan 02 at 13:33, Alan Cox wrote:
> > whether IRQ7 happens directly when we send confirmation to 8259,
> > or whether it happens due to some noise on IRQ line.
> >
> > AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
> > APIC.
>
> Are you using an AMD northbridge and VIA southbridge together ?

No. It is fully-VIA motherboard (Asus A7V), VIA KT133 as a northbridge
and VIA686A as a southbridge, with 1GHz Athlon. And spurious IRQ happen
when either of (massive) IRQ sources (Promise UDMA, tulip-based network
card, an es137x soundcard) emits interrupts.

Problem is best visible when Promise is used in PIO mode with block size=512,
as in such case you can get thousands of IRQs from Promise in second, and
tenths of spurious IRQ7. But even if Promise emits in average one IRQ each
second (== idle system with running cron and atime updates on), I get
~10 of spurious IRQ7 during one hour.

I can get complete lspci -vvv at home, if you want.
Best regards,
Petr Vandrovec
[email protected]

2002-01-07 17:53:38

by Steven Walter

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Mon, Jan 07, 2002 at 02:16:28PM +0100, Petr Vandrovec wrote:
> Nope. Probably when CPU is in local APIC mode, it acknowledges interrupts
> to chipset with different timming, and from time to time CPU still
> sees IRQ pending, so it asks for vector, but as chipset has no
> interrupt pending, it answers with IRQ7. I did no analysis to find
> whether IRQ7 happens directly when we send confirmation to 8259,
> or whether it happens due to some noise on IRQ line.
>
> AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
> APIC.

My system is based on AMD's own 750 Irongate chipset, and it produces
both the initial spurious IRQ7 message and plentiful "ERR" interrupts:

srwalter@hapablap:~$ uptime
11:31am up 42 days, 18:19, 1 user, load average: 1.04, 1.02, 1.00
srwalter@hapablap:~$ cat /proc/interrupts | grep ERR
ERR: 67169
--
-Steven
In a time of universal deceit, telling the truth is a revolutionary act.
-- George Orwell
He's alive. He's alive! Oh, that fellow at RadioShack said I was mad!
Well, who's mad now?
-- Montgomery C. Burns

2002-01-07 18:22:28

by Rene Rebe

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

From: Steven Walter <[email protected]>
Subject: Re: "APIC error on CPUx" - what does this mean?
Date: Mon, 7 Jan 2002 11:32:27 -0600

> On Mon, Jan 07, 2002 at 02:16:28PM +0100, Petr Vandrovec wrote:
> > Nope. Probably when CPU is in local APIC mode, it acknowledges interrupts
> > to chipset with different timming, and from time to time CPU still
> > sees IRQ pending, so it asks for vector, but as chipset has no
> > interrupt pending, it answers with IRQ7. I did no analysis to find
> > whether IRQ7 happens directly when we send confirmation to 8259,
> > or whether it happens due to some noise on IRQ line.
> >
> > AFAIK it happens only on VIA based boards, and only if (AMD) CPU is using
> > APIC.
>
> My system is based on AMD's own 750 Irongate chipset, and it produces
> both the initial spurious IRQ7 message and plentiful "ERR" interrupts:

All my machines (from SiS 735, over Intel-BX to older Aladin-5
(super7) boxes) producing such "spurious IRQ7" with the actual 2.4.16
or 2.4.17 kernels ...

> srwalter@hapablap:~$ uptime
> 11:31am up 42 days, 18:19, 1 user, load average: 1.04, 1.02, 1.00
> srwalter@hapablap:~$ cat /proc/interrupts | grep ERR
> ERR: 67169
> --
> -Steven
> In a time of universal deceit, telling the truth is a revolutionary act.
> -- George Orwell
> He's alive. He's alive! Oh, that fellow at RadioShack said I was mad!
> Well, who's mad now?
> -- Montgomery C. Burns
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2002-01-08 12:14:41

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Mon, 7 Jan 2002, Petr Vandrovec wrote:

> No. It is fully-VIA motherboard (Asus A7V), VIA KT133 as a northbridge
> and VIA686A as a southbridge, with 1GHz Athlon. And spurious IRQ happen
> when either of (massive) IRQ sources (Promise UDMA, tulip-based network
> card, an es137x soundcard) emits interrupts.

A possible reason is the 8259A in the chipset deasserts its INT output
late enough for the Athlon CPU's local APIC to register another ExtINTA
interrupt sometimes, possibly under specific circumstances. If that's
true, either the chipset or the APIC (or both) is at fault for not meeting
the specs. It causes no system stability problem but there is a
performance hit due to these spurious interrupts.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-01-08 12:27:22

by Chris Wedgwood

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Tue, Jan 08, 2002 at 01:12:04PM +0100, Maciej W. Rozycki wrote:

A possible reason is the 8259A in the chipset deasserts its INT
output late enough for the Athlon CPU's local APIC to register
another ExtINTA interrupt sometimes, possibly under specific
circumstances.

Actully... we could potentially measure this... after an interrupt it
serviced (or before, or both) we could store the interrupt source
globally and the cycle counter... when a suprrious interrupt is
received check the last interrupt and how long ago it was and then
start looking for a pattern...



--cw

2002-01-08 17:37:01

by Petr Vandrovec

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On 9 Jan 02 at 1:30, Chris Wedgwood wrote:
> On Tue, Jan 08, 2002 at 01:12:04PM +0100, Maciej W. Rozycki wrote:
>
> A possible reason is the 8259A in the chipset deasserts its INT
> output late enough for the Athlon CPU's local APIC to register
> another ExtINTA interrupt sometimes, possibly under specific
> circumstances.
>
> Actully... we could potentially measure this... after an interrupt it
> serviced (or before, or both) we could store the interrupt source
> globally and the cycle counter... when a suprrious interrupt is
> received check the last interrupt and how long ago it was and then
> start looking for a pattern...

I instrumented kernel at home, and when spurious interrupt happens,
stack trace almost always says that spurious interrupt happened
during HLT in default_idle (if I disable ACPI...), ISR is always zero,
and IRR contains 0x00 (before parport driver is loaded) or 0x80
(after parport driver is loaded (without IRQ support)). Few times
stack trace was different, and pointed to ide__sti() in ide_do_request,
but it was < 5% of occurences.

As spurious IRQ happens during HLT, and IRR is clear at the time
we are going to ack IRQ, it looks like real spurious IRQ (caused by
noise?). Or delay between spurious one and real IRQ is really long.
I'll try some of your suggestions today night.
Best regards,
Petr Vandrovec
[email protected]

2002-01-09 12:43:14

by Petr Vandrovec

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On 8 Jan 02 at 18:35, I wrote:
>
> As spurious IRQ happens during HLT, and IRR is clear at the time
> we are going to ack IRQ, it looks like real spurious IRQ (caused by
> noise?). Or delay between spurious one and real IRQ is really long.
> I'll try some of your suggestions today night.

Hm, I'm missing something :-( It happens 4.9us after another IRQ
arrives on 8259 master, or ~100us after another IRQ arrives on slave.
Spurious IRQs do not happen when ELCR registers are cleared; unfortunately
Promise cannot live with edge trigerred interrupts.
IRQ1 is keyboard, IRQ5 es1371 and IRQ10 promise. I was able to get
spurious IRQ after either of these IRQs. I was not able to trigger
spurious IRQ from RTC irq, but maybe that I did not tried hard enough.

I have no idea why it happens 5us when IRQ arrives to master, but
100us when IRQ arrives to slave. And to make it even less clean,
when I replaced outb(xx, 0xA1) and outb(xx, 0x21) with outb_p,
spurious IRQ happens 6us when another IRQ arrive to master, but
12us when IRQ arrives to slave... I have no idea why it drops from
100us to 12us.

I have an idea that with outb() spurious IRQ happens some time
after Promise deasserts IRQ, while with outb_p() it happens when
we mask it. Unfortunately stack traces at arrival of spurious
IRQ (available on request, 120KB uncompressed) do not agree with
this idea - they always lead to default_idle. It is possible that
it always arrives in default_idle because of my CPU is so fast
that even endless stream of IRQs from 8259 arrives always when
CPU is in default_idle, but I have some doubts that 1GHz is fast
enough to see such effect.

Time stamps were obtained from TSC register of my 1GHz Athlon.
Best regards,
Petr Vandrovec
[email protected]

IRQ7 arrived, previous irq was 10, it happened 103028 ns ago
Master: ISR: 00, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4843934 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94475 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1586275 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 4959 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4830426 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 4959 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4831445 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 4929 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4794576 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 5469 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4795415 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 4929 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4759656 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 5, it happened 4929 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4761185 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94389 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1609311 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94607 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 195708 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101889 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1555839 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94419 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1605751 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 1, it happened 4930 ns ago
Master: ISR: 00, IRR: 80, IMR: DA, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 4484164 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101889 ns ago
Master: ISR: 00, IRR: 84, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1638040 ns ago
Master: ISR: 04, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101888 ns ago
Master: ISR: 00, IRR: 84, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1632361 ns ago
Master: ISR: 04, IRR: 80, IMR: F8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 95015 ns ago
Master: ISR: 00, IRR: 84, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1591085 ns ago
Master: ISR: 04, IRR: 80, IMR: F8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94688 ns ago
Master: ISR: 00, IRR: 84, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1591840 ns ago
Master: ISR: 04, IRR: 80, IMR: F8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94389 ns ago
Master: ISR: 00, IRR: 84, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1578193 ns ago
Master: ISR: 04, IRR: 80, IMR: F8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94444 ns ago
Master: ISR: 00, IRR: 84, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1575884 ns ago
Master: ISR: 04, IRR: 80, IMR: F8, ELCR: 20
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101889 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1606411 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 103119 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 3797929 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 8495 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 00, IMR: AC, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1899226 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94419 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1594701 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 95105 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1674535 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 94389 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1597919 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101889 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 0E
IRQ10 arrived after spurious IRQ. It happened 1640221 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 0E

IRQ7 arrived, previous irq was 10, it happened 101888 ns ago
Master: ISR: 00, IRR: A4, IMR: F8, ELCR: 00
Slave : ISR: 00, IRR: 04, IMR: A8, ELCR: 04
IRQ10 arrived after spurious IRQ. It happened 1636792 ns ago
Master: ISR: 04, IRR: A0, IMR: F8, ELCR: 00
Slave : ISR: 04, IRR: 04, IMR: A8, ELCR: 04

IRQ7 arrived, previous irq was 5, it happened 4929 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A
IRQ0 arrived after spurious IRQ. It happened 3868746 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A

IRQ7 arrived, previous irq was 5, it happened 4959 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A
IRQ0 arrived after spurious IRQ. It happened 3820746 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A

IRQ7 arrived, previous irq was 5, it happened 5004 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A
IRQ0 arrived after spurious IRQ. It happened 3822035 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A

IRQ7 arrived, previous irq was 5, it happened 4989 ns ago
Master: ISR: 00, IRR: A0, IMR: F8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A
IRQ0 arrived after spurious IRQ. It happened 3823625 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0A

IRQ7 arrived, previous irq was 1, it happened 4990 ns ago
Master: ISR: 00, IRR: 80, IMR: DA, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E
IRQ0 arrived after spurious IRQ. It happened 3422855 ns ago
Master: ISR: 01, IRR: 80, IMR: D8, ELCR: 20
Slave : ISR: 00, IRR: 00, IMR: A8, ELCR: 0E

2002-01-09 14:35:29

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: "APIC error on CPUx" - what does this mean?

On Wed, 9 Jan 2002, Petr Vandrovec wrote:

> I have an idea that with outb() spurious IRQ happens some time
> after Promise deasserts IRQ, while with outb_p() it happens when
> we mask it. Unfortunately stack traces at arrival of spurious
> IRQ (available on request, 120KB uncompressed) do not agree with
> this idea - they always lead to default_idle. It is possible that
> it always arrives in default_idle because of my CPU is so fast
> that even endless stream of IRQs from 8259 arrives always when
> CPU is in default_idle, but I have some doubts that 1GHz is fast
> enough to see such effect.

Note that if you are observing a spurious IRQ due to the trailing
interval of a level-triggered interrupt signal it's quite reasonable you
see a stack trace back to default_idle(). Such an IRQ happens just after
do_IRQ() (actually code following ret_from_intr) for the prevously
delivered real IRQ returns. If your system is not heavy loaded (little
time spent in the userland) most IRQs arrive in idle(), thus spurious ones
do so as well. A backtrace for a spurious IRQ and for the preceding real
one should point to the same root.

If you see spurious IRQs due to glitches the backtraces may differ. It
should be possible to observe under a load.

Still it's weird for *all* the devices you have to deassert their IRQ
lines so late. It would be worth verifying if that is the case, e.g. by
adding an udelay() just after code that is meant to cause a device to
deassert its IRQ line. If the problem persists regardless of the delays,
chances are there is an erratum in the APIC or in the chipset.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +