2006-05-22 07:38:14

by Vladimir Dvorak

Subject: APIC error on CPUx

Hello to all,

After installing a mail server, I'm getting these messages from the kernel:

APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU0: 00(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)

Approximately 2 to 5 messages per 24 hours.

Linux details:
Debian 3.1
Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686
GNU/Linux

Hardware:
Intel SR1200


Solution from Google?
"Upgrade BIOS firmware" - but I'm sure the BIOS is already the latest.


How serious is this problem? Is there a patch that works around the APIC
errors?
What is the point of putting the "noapic" and "nolapic" parameters on the
cmdline? (Can these parameters decrease performance?)

Thank you !

Vladimir Dvorak



2006-05-22 11:51:44

by Andi Kleen

Subject: Re: APIC error on CPUx

Vladimir Dvorak <[email protected]> writes:
>
> Linux requisities:
> Debian 3.1
> Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686

That's an ancient kernel.

> GNU/Linux
>
> Hardware:
> Intel SR1200

If it's an <=P3 class machine: most likely you have noise on the APIC bus.

-Andi

2006-05-22 11:58:50

by Vladimir Dvorak

Subject: Re: APIC error on CPUx

Andi Kleen wrote:

>Vladimir Dvorak <[email protected]> writes:
>
>
>>Linux requisities:
>>Debian 3.1
>>Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686
>>
>>
>
>That's an ancient kernel.
>
>
Yes, I agree.

... but the latest in Debian/Sarge. :-)

Andi, do you think that upgrading to the latest vanilla kernel (from
kernel.org) would solve this problem?

>
>
>>GNU/Linux
>>
>>Hardware:
>>Intel SR1200
>>
>>
>
>If it's an <=P3 class machine: most likely you have noise on the APIC bus.
>
>-Andi
>
>
>
Yes, you are right:

cat /proc/cpuinfo
...
model name : Intel(R) Pentium(R) III CPU family 1133MHz
...


Does "noise on the APIC bus" mean "a lot of interrupts from devices"?

Thank you Andi!

Vladimir

2006-05-22 12:03:57

by Andi Kleen

Subject: Re: APIC error on CPUx

On Monday 22 May 2006 13:58, Vladimir Dvorak wrote:
> Andi Kleen wrote:
>
> >Vladimir Dvorak <[email protected]> writes:
> >
> >
> >>Linux requisities:
> >>Debian 3.1
> >>Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686
> >>
> >>
> >
> >That's an ancient kernel.
> >
> >
> Yes, I agree.
>
> ... but the latest in Debian/Sarge. :-)
>
> Do you, Andi, thing that upgrade to latest vanilla one ( from
> kernel.org ) should solve this problem ?

Probably not.

>
> >
> >
> >>GNU/Linux
> >>
> >>Hardware:
> >>Intel SR1200
> >>
> >>
> >
> >If it's an <=P3 class machine: most likely you have noise on the APIC bus.
> >
> >-Andi
> >
> >
> >
> Yes, you are right :
>
> cat /proc/cpuinfo
> ...
> model name : Intel(R) Pentium(R) III CPU family 1133MHz
> ...
>
>
> "Noise on APIC bus" means - " a lot of interrupts from devices" ?

Usually a crappy/broken/misdesigned motherboard.

-Andi

2006-05-22 12:19:50

by Vladimir Dvorak

Subject: Re: APIC error on CPUx

Andi Kleen wrote:

>On Monday 22 May 2006 13:58, Vladimir Dvorak wrote:
>
>
>>Andi Kleen wrote:
>>
>>
>>
>>>Vladimir Dvorak <[email protected]> writes:
>>>
>>>
>>>
>>>
>>>>Linux requisities:
>>>>Debian 3.1
>>>>Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686
>>>>
>>>>
>>>>
>>>>
>>>That's an ancient kernel.
>>>
>>>
>>>
>>>
>>Yes, I agree.
>>
>> ... but the latest in Debian/Sarge. :-)
>>
>>Do you, Andi, thing that upgrade to latest vanilla one ( from
>>kernel.org ) should solve this problem ?
>>
>>
>
>Probably not.
>
>
>
>>>
>>>
>>>
>>>
>>>>GNU/Linux
>>>>
>>>>Hardware:
>>>>Intel SR1200
>>>>
>>>>
>>>>
>>>>
>>>If it's an <=P3 class machine: most likely you have noise on the APIC bus.
>>>
>>>-Andi
>>>
>>>
>>>
>>>
>>>
>>Yes, you are right :
>>
>>cat /proc/cpuinfo
>>...
>>model name : Intel(R) Pentium(R) III CPU family 1133MHz
>>...
>>
>>
>>"Noise on APIC bus" means - " a lot of interrupts from devices" ?
>>
>>
>
>Usually a crappy/broken/misdesigned motherboard.
>
>-Andi
>
>
>
>
And, probably, the last question related to this topic:

Can "noapic" or "nolapic" solve this? Do these parameters mean that
devices will start to use the 8259 interrupt controller instead of the
APIC?

Is it harmful to put "noapic" or "nolapic" on the cmdline?

Thank you.

Vladimir

2006-05-22 12:31:10

by Jan Engelhardt

Subject: Re: APIC error on CPUx

Hi,


I also get a similar message; it appears repeatedly, every 2.5 to 5
seconds, whenever the ISDN card is dialled in:
May 22 14:28:18 shanghai kernel: APIC error on CPU0: 02(02)

Linux shanghai 2.6.17-rc4 #1 Sat May 20 00:06:16 CEST 2006 i686 athlon
i386 GNU/Linux



>>>>>Debian 3.1
>>>>>Linux mailserver 2.6.8-3-686-smp #1 SMP Thu Feb 9 07:05:39 UTC 2006 i686
>>>>That's an ancient kernel.
>>>Yes, I agree.
>>> ... but the latest in Debian/Sarge. :-)
>>>Do you, Andi, thing that upgrade to latest vanilla one ( from
>>>kernel.org ) should solve this problem ?
>>Probably not.
>>>>
>>>>>GNU/Linux
>>>>>Hardware:
>>>>>Intel SR1200
>>>>>
>>>>If it's an <=P3 class machine: most likely you have noise on the APIC bus.
>>>>
>>>Yes, you are right :
>>>cat /proc/cpuinfo
>>>...
>>>model name : Intel(R) Pentium(R) III CPU family 1133MHz
>>>...
>>>
>>>"Noise on APIC bus" means - " a lot of interrupts from devices" ?

Yep.

14:29 shanghai:~ > (while :; do cat /proc/interrupts | grep -i hisax; sleep 1; done)
193: 1111340 IO-APIC-level SiS SI7012, HiSax
193: 1113106 IO-APIC-level SiS SI7012, HiSax
193: 1114857 IO-APIC-level SiS SI7012, HiSax
193: 1116599 IO-APIC-level SiS SI7012, HiSax
193: 1118328 IO-APIC-level SiS SI7012, HiSax
193: 1120093 IO-APIC-level SiS SI7012, HiSax
193: 1121858 IO-APIC-level SiS SI7012, HiSax
193: 1123608 IO-APIC-level SiS SI7012, HiSax
(Hisax: CONFIG_HISAX_NETJET=y)
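
[Editorial sketch, not part of the original message.] The counters above can be turned into an approximate interrupt rate by differencing consecutive samples. This is a minimal Python sketch that assumes the samples are one second apart (the loop's "sleep 1", ignoring the small overhead of each iteration); the helper name "rates" is made up for illustration.

```python
# Counter values for IRQ 193, copied from the /proc/interrupts samples above.
samples = [1111340, 1113106, 1114857, 1116599,
           1118328, 1120093, 1121858, 1123608]

def rates(counts):
    """Return the increase between each pair of consecutive samples."""
    return [b - a for a, b in zip(counts, counts[1:])]

deltas = rates(samples)
average = sum(deltas) / len(deltas)
print(deltas)
print(f"about {average:.0f} interrupts per sample interval")
```

For these samples the line shared by the SiS SI7012 and HiSax shows roughly 1750 interrupts per interval.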

The problem goes away with noapic or acpi=off, but of course that also
means you don't have IRQs > 15.

>>Usually a crappy/broken/misdesigned motherboard.

Elitegroup L7S7A2 here.

>And, probably, the latest question related to this topic:
>
>Can "noapic" or "nolapic" solve this ? Does it mean ( with these
>parameters ) that devices will start to use 8259 interrupt controller
>instead APIC ?
>
>Is harmfull put "noapic" on "nolapic" to cmdline ?


Jan Engelhardt
--

2006-05-22 12:54:12

by Andi Kleen

Subject: Re: APIC error on CPUx


> Can "noapic" or "nolapic" solve this ? Does it mean ( with these
> parameters ) that devices will start to use 8259 interrupt controller
> instead APIC ?

nolapic won't work on SMP (or rather, it forces UP mode).

You'll not see the messages anymore, but might get silent corruption.
Or it might work around it because it will make everything a bit slower.

>
> Is harmfull put "noapic" on "nolapic" to cmdline ?

If all devices can still be accessed, noapic is just slower.

-Andi
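
[Editorial sketch, not part of the original message.] To see which of the parameters discussed in this thread a machine was actually booted with, the kernel command line can be inspected. This is a minimal Python sketch; the helper "apic_params" and the one-line summaries are illustrative assumptions, and the example string merely stands in for the contents of /proc/cmdline.

```python
# Boot parameters discussed in this thread, with a one-line summary of each.
APIC_PARAMS = {
    "noapic": "do not use the IO-APIC; interrupts go through the 8259 PIC",
    "nolapic": "do not use the local APIC",
    "acpi=off": "disable ACPI",
}

def apic_params(cmdline):
    """Return the APIC-related parameters present in a kernel command line."""
    words = cmdline.split()
    return [p for p in APIC_PARAMS if p in words]

# On a running Linux system the string could be read from /proc/cmdline.
example = "root=/dev/hde1 ro acpi=off"
for param in apic_params(example):
    print(f"{param}: {APIC_PARAMS[param]}")
```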

2006-05-22 13:13:17

by Xavier Bestel

Subject: Re: APIC error on CPUx

On Mon, 2006-05-22 at 14:54, Andi Kleen wrote:
> > Can "noapic" or "nolapic" solve this ? Does it mean ( with these
> > parameters ) that devices will start to use 8259 interrupt controller
> > instead APIC ?
>
> nolapic won't work on SMP (or rather forced UP mode)
>
> You'll not see the messages anymore, but might get silent corruption.
> Or it might work around it because it will make everything a bit slower.
>
> >
> > Is harmfull put "noapic" on "nolapic" to cmdline ?
>
> If all devices can still be accessed noapic is just slower.

So, is there a software solution to this problem?
I have the same one on an old Dell server.

Thanks,
Xav


2006-05-22 14:47:35

by Brown, Len

Subject: RE: APIC error on CPUx


>The problem goes away with noapic or acpi=off, but of course that also
>means you don't have IRQs > 15.

"acpi=off" is a superset of "noapic" here, presumably because the
board doesn't have MPS tables that describe the IOAPIC when ACPI is
off.

"noapic" is a perfectly reasonable thing to use if you don't
have a lot of interrupt sources and there is no more sharing
in PIC mode than IOAPIC mode.

The advantage of using IOAPIC mode is that the system has more interrupt
pins available, and this allows sharing to be avoided. It also allows
the system to target the interrupts to any processor when you
have more than one.

cheers,
-Len

2006-05-22 14:51:06

by Andi Kleen

Subject: Re: APIC error on CPUx


> "noapic" is a perfectly reasonable thing to use if you don't
> have a lot of interrupt sources and there is no more sharing
> in PIC mode than IOAPIC mode.

It makes interrupt handling much slower. The PIC accesses can be > 50%
of the interrupt handling cost.

Of course that tends to hide some bugs.

-Andi

2006-05-22 19:44:06

by Junio C Hamano

Subject: Re: APIC error on CPUx

Vladimir Dvorak <[email protected]> writes:

>>>>>GNU/Linux
>>>>>
>>>>>Hardware:
>>>>>Intel SR1200
>>>>>
>>>>If it's an <=P3 class machine: most likely you have noise on the APIC bus.
>>>>
>>>>-Andi
>>>>
>>>Yes, you are right :
>>>
>>>cat /proc/cpuinfo
>>>...
>>>model name : Intel(R) Pentium(R) III CPU family 1133MHz
>>>...
>>>
>>>"Noise on APIC bus" means - " a lot of interrupts from devices" ?
>>
>>Usually a crappy/broken/misdesigned motherboard.
>>
>>-Andi

I see a similar error message upon boot:

APIC error on CPU0: 00(40)
APIC error on CPU0: 40(40)

I run an i386 kernel (Debian/etch) on a Turion64 MT-30; it is an
Averatec 2155 notebook (aka MSI 1013).

2006-05-23 06:44:33

by Vladimir Dvorak

Subject: Re: APIC error on CPUx

Brown, Len wrote:

>
>
>
>>The problem goes away with noapic or acpi=off, but of course that also
>>means you don't have IRQs > 15.
>>
>>
>
>"acpi=off" is a superset of "noapic" here, presumably because the
>board doesn't have MPS tables that describe the IOAPIC when ACPI is
>off.
>
>"noapic" is a perfectly reasonable thing to use if you don't
>have a lot of interrupt sources and there is no more sharing
>in PIC mode than IOAPIC mode.
>
>The advantage of using IOAPIC mode is that the system has more interrupt
>pins
>availalble and this allows sharing to be avoided. It also allows
>the system to target the interrupts to any processor when you
>have more than one.
>
>cheers,
>-Len
>
>
>
My experience yesterday:

With 'noapic' the server cannot boot; the kernel reports something like
"Lost interrupt: hde" (this is what my colleague told me, as the server
is not under my physical control). With 'acpi=off' it boots, but the
errors appear again.

Brief datasheet about the board is here:
http://www.abclinuxu.cz/images/hosting/sr1200.pdf

Vladimir

2006-05-23 20:17:46

by Brown, Len

Subject: RE: APIC error on CPUx

> http://www.abclinuxu.cz/images/hosting/sr1200.pdf

An Intel SCB2, a Dual P3/Serverworks board....
Run it in the default IOAPIC mode and ignore the warnings.
No, upgrading the kernel will almost certainly not make
any difference.
My note about running with "noapic" was mis-guided --
I didn't realize this was an SMP server board.

Curious, however, that you can't boot in IOAPIC mode with acpi=off.
I thought that in that era they still had MPS support. You might
take a peek at the BIOS setup options. dmesg will also mention
MPS if it is there. However, even if you succeeded in booting
in acpi=off MPS IOAPIC mode, I would not expect it to have an
effect on the warnings you see.

cheers,
-Len
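
[Editorial sketch, not part of the original message.] The dmesg check suggested above can be done by filtering the boot log for MP-table lines. This is a minimal Python sketch; the helper "mps_lines" is hypothetical, and the two marker strings assume the wording that 2.6-era i386 kernels print, which may differ in other versions.

```python
def mps_lines(dmesg_text):
    """Return dmesg lines that mention the MP table or the MP specification."""
    markers = ("MP-table", "MultiProcessor Specification")
    return [line for line in dmesg_text.splitlines()
            if any(m in line for m in markers)]

# Sample boot-log excerpt; on a live system the text would come from dmesg.
sample = """\
511MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
"""
for line in mps_lines(sample):
    print(line)
```

An empty result would suggest that the BIOS does not provide MPS tables, or that they are disabled in the BIOS setup.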

2006-05-23 20:37:51

by Vladimir Dvorak

Subject: Re: APIC error on CPUx

Linux version 2.6.8-3-686-smp (root@lart) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 SMP Thu Feb 9 07:05:39 UTC 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009d000 (usable)
BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001ffc0000 (usable)
BIOS-e820: 000000001ffc0000 - 000000001ffff000 (ACPI data)
BIOS-e820: 000000001ffff000 - 0000000020000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec02000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
found SMP MP-table at 000ff780
On node 0 totalpages: 131008
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 126912 pages, LIFO batch:16
HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: INTEL Product ID: SCB20 APIC at: 0xFEE00000
Processor #3 6:11 APIC version 17
Processor #0 6:11 APIC version 17
I/O APIC #4 Version 17 at 0xFEC00000.
I/O APIC #5 Version 17 at 0xFEC01000.
Enabling APIC mode: Flat. Using 2 I/O APICs
Processors: 2
Built 1 zonelists
Kernel command line: root=/dev/hde1 ro acpi=off
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 1130.651 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 510584k/524032k available (1665k kernel code, 12692k reserved, 771k data, 168k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 2236.41 BogoMIPS
Security Scaffold v1.0.0 initialized
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) III CPU family 1133MHz stepping 01
per-CPU timeslice cutoff: 1463.49 usecs.
task migration cache decay timeout: 2 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000004
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 3000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 2252.80 BogoMIPS
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel(R) Pentium(R) III CPU family 1133MHz stepping 01
Total of 2 processors activated (4489.21 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 4 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 4 ... ok.
Setting 5 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 5 ... ok.
init IO_APIC IRQs
IO-APIC (apicid-pin) 4-0, 4-5, 4-9, 4-11, 5-0, 5-1, 5-6, 5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
number of MP IRQ sources: 17.
number of IO-APIC #4 registers: 16.
number of IO-APIC #5 registers: 16.
testing the IO APIC.......................
IO APIC #4......
.... register #00: 04000000
....... : physical APIC id: 04
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 000 00 1 0 0 0 0 0 0 00
06 001 01 0 0 0 0 0 1 1 51
07 001 01 0 0 0 0 0 1 1 59
08 001 01 0 0 0 0 0 1 1 61
09 000 00 1 0 0 0 0 0 0 00
0a 001 01 1 1 0 1 0 1 1 69
0b 000 00 1 0 0 0 0 0 0 00
0c 001 01 0 0 0 0 0 1 1 71
0d 001 01 0 0 0 0 0 1 1 79
0e 001 01 0 0 0 0 0 1 1 81
0f 001 01 0 0 0 0 0 1 1 89
IO APIC #5......
.... register #00: 05000000
....... : physical APIC id: 05
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 01000000
....... : arbitration: 01
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 001 01 1 1 0 1 0 1 1 91
03 001 01 1 1 0 1 0 1 1 99
04 001 01 1 1 0 1 0 1 1 A1
05 001 01 1 1 0 1 0 1 1 A9
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 1 0 0 0 0 0 0 00
0f 000 00 1 0 0 0 0 0 0 00
Using vector-based indexing
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ145 -> 1:2
IRQ153 -> 1:3
IRQ161 -> 1:4
IRQ169 -> 1:5
IRQ105 -> 0:10
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1130.0272 MHz.
..... host bus clock speed is 132.0972 MHz.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
CPU0: online
domain 0: span 01
groups: 01
domain 1: span 03
groups: 01 02
CPU1: online
domain 0: span 02
groups: 02
domain 1: span 03
groups: 02 01
checking if image is initramfs...it isn't (ungzip failed); looks like an initrd
Freeing initrd memory: 4624k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfda75, last bus=2
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040326
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00f36e0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0x2f9f, dseg 0xf0000
pnp: 00:09: ioport range 0x4d0-0x4d1 has been reserved
pnp: 00:09: ioport range 0xcf8-0xcff could not be reserved
pnp: 00:09: ioport range 0x40b-0x40b has been reserved
pnp: 00:09: ioport range 0x4d6-0x4d6 has been reserved
pnp: 00:09: ioport range 0xc00-0xc01 has been reserved
pnp: 00:09: ioport range 0xc14-0xc14 has been reserved
pnp: 00:09: ioport range 0xc49-0xc4a has been reserved
pnp: 00:0a: ioport range 0x580-0x58f has been reserved
pnp: 00:0a: ioport range 0x500-0x51f has been reserved
pnp: 00:0a: ioport range 0x5a0-0x5af has been reserved
pnp: 00:0a: ioport range 0x520-0x53f has been reserved
pnp: 00:0a: ioport range 0x560-0x574 has been reserved
pnp: 00:0a: ioport range 0x540-0x55f has been reserved
PnPBIOS: 17 nodes reported by PnP BIOS; 17 recorded by driver
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI: Using IRQ router ServerWorks [1166/0201] at 0000:00:0f.0
PCI->APIC IRQ transform: (B0,I2,P0) -> 153
PCI->APIC IRQ transform: (B0,I3,P0) -> 169
PCI->APIC IRQ transform: (B0,I4,P0) -> 161
PCI->APIC IRQ transform: (B0,I12,P0) -> 145
PCI->APIC IRQ transform: (B0,I15,P0) -> 105
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch ([email protected])
devfs: boot_options: 0x0
Initializing Cryptographic API
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ 48 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET: Registered protocol family 8
NET: Registered protocol family 20
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4624 blocks [1 disk] into ram disk... |/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 168k freed
vesafb: probe of vesafb0 failed with error -6
NET: Registered protocol family 1
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20267: IDE controller at PCI slot 0000:00:02.0
PDC20267: chipset revision 2
PDC20267: ROM enabled at 0xfe7e0000
PDC20267: 100% native mode on irq 153
PDC20267: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode.
ide2: BM-DMA at 0x1440-0x1447, BIOS settings: hde:pio, hdf:pio
ide3: BM-DMA at 0x1448-0x144f, BIOS settings: hdg:pio, hdh:pio
hde: ST380011A, ATA DISK drive
Using anticipatory io scheduler
ide2 at 0x1400-0x1407,0x140a on irq 153
hde: max request size: 1024KiB
hde: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
/dev/ide/host2/bus0/target0/lun0: p1 p2 p3
hdg: ST380011A, ATA DISK drive
ide3 at 0x1410-0x1417,0x140e on irq 153
hdg: max request size: 1024KiB
hdg: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
/dev/ide/host2/bus1/target0/lun0: p1
SvrWks CSB5: IDE controller at PCI slot 0000:00:0f.1
SvrWks CSB5: chipset revision 146
SvrWks CSB5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x03a0-0x03a7, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x03a8-0x03af, BIOS settings: hdc:pio, hdd:pio
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 3903784k swap on /dev/hde2. Priority:-1 extents:1
EXT3 FS on hde1, internal journal
Real Time Clock Driver v1.12
Generic RTC Driver v1.07
Capability LSM initialized
kjournald starting. Commit interval 5 seconds
EXT3 FS on hdg1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on hde3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e100: Intel(R) PRO/100 Network Driver, 3.0.18
e100: Copyright(c) 1999-2004 Intel Corporation
e100: eth0: e100_probe: addr 0xfe790000, irq 169, MAC addr 00:03:47:A5:33:5D
e100: eth1: e100_probe: addr 0xfe750000, irq 161, MAC addr 00:03:47:A5:33:5E
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
NET: Registered protocol family 10
Disabled Privacy Extensions on device c031f0c0(lo)
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
APIC error on CPU1: 00(40)
APIC error on CPU0: 00(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU1: 40(40)


Attachments:
boot.txt (12.94 kB)

2006-05-24 05:16:03

by Brown, Len

Subject: RE: APIC error on CPUx

Vladimir's SCB2/Serverworks boots with and without "acpi=off",
and in both cases the IOAPICs are set up properly,
the devices work, and there are the following messages:

APIC error on CPU1: 00(40)
APIC error on CPU0: 00(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)
APIC error on CPU0: 40(40)

These are the now infamous "Receive illegal vector" messages.
I expect this chipset has a physical APIC bus (rather than
the FSB delivery used today) which is mis-behaving.

I've never heard of this being associated with an actual
failure (such as a lost interrupt). This message is already
KERN_DEBUG -- can't get any lower priority than that.
Maybe we should put this message under apic_printk()?

Since acpi=off doesn't make any difference, I recommend
running in the default configuration without this parameter.
----
Jan's system has

APIC error on CPU0: 02(02)

Also seems to be receiving junk on the APIC bus.

> The problem goes away with noapic or acpi=off, but of course that also
> means you don't have IRQs > 15.

My comment about PIC-mode probably being okay applies to this
motherboard but not Vladimir's above.

>>>Usually a crappy/broken/misdesigned motherboard.

>Elitegroup L7S7A2 here.

This is a SiS746 motherboard.
This kind of error seems to be relatively common on SiS.

-Len
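
[Editorial sketch, not part of the original message.] The hex codes in these messages can be read as bit masks of the local APIC Error Status Register (ESR). This is a minimal Python sketch; the bit names follow the ESR layout as documented by Intel (bit 4 is left out because it is reserved on these CPUs), and the two numbers in each message are assumed to be two successive reads of the ESR. The helper names are made up for illustration.

```python
# Bit meanings of the local APIC Error Status Register (ESR).
ESR_BITS = {
    0: "Send checksum error",
    1: "Receive checksum error",
    2: "Send accept error",
    3: "Receive accept error",
    5: "Send illegal vector",
    6: "Receive illegal vector",
    7: "Illegal register address",
}

def decode_esr(value):
    """Return the names of all error bits set in an ESR value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]

def decode_message(line):
    """Decode a line such as 'APIC error on CPU1: 40(40)'.

    Returns the CPU name and the decoded form of both hex values.
    """
    cpu_part, codes = line.split(": ")
    first, second = codes.rstrip(")").split("(")
    cpu = cpu_part.split()[-1]
    return cpu, decode_esr(int(first, 16)), decode_esr(int(second, 16))

for msg in ("APIC error on CPU0: 00(40)",
            "APIC error on CPU1: 40(40)",
            "APIC error on CPU0: 02(02)"):
    print(msg, "->", decode_message(msg))
```

Under these assumptions 0x40 has only bit 6 set ("Receive illegal vector"), while the 0x02 seen on the SiS board has only bit 1 set ("Receive checksum error").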

2006-05-24 08:11:28

by Mikael Pettersson

Subject: RE: APIC error on CPUx

Brown, Len writes:
> Vladimir's SCB2/Serverworks boots with and without "acpi=off",
> and in both cases the IOAPICS are set up properly,
> the device work, and there are the following messages:
>
> APIC error on CPU1: 00(40)
> APIC error on CPU0: 00(40)
> APIC error on CPU0: 40(40)
> APIC error on CPU0: 40(40)
> APIC error on CPU0: 40(40)
>
> These are the now infamous "Receive illegal vector" messages.
> I expect this chipset has a physical APIC bus (rather than
> the FSB delivery used today) which is mis-behaving.
>
> I've never heard of this being associated with an actual
> failure (such as a lost interrupt). This message is already
> KERN_DEBUG -- can't get any lower priority than that.
> Maybe we should put this message under apic_printk()?

The default must be that APIC errors are logged. They are valid
indicators of hardware brokenness, and have in the past been
traced to bad mobos, bad PSUs, bad cooling, and even kernel bugs.
The fact that many systems manage to limp along in their presence
doesn't matter.

The default for apic_printk is to print nothing (apic_verbosity
== APIC_QUIET). You could add a fourth level (ERROR or WARNING)
between QUIET and VERBOSE, make apic_verbosity initialise to that
level, and use that level for the APIC error messages. That would
make the messages visible by default but possible to suppress for
knowledgeable users.

/Mikael
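
[Editorial sketch, not part of the original message.] The proposal above amounts to a level check with a new default. This is a Python model of the idea, not kernel code: the level names mirror the ones mentioned in the message plus the proposed ERROR level, the numeric values are illustrative, and "apic_printk" here is a stand-in that appends to a list instead of printing to the kernel log.

```python
from enum import IntEnum

class ApicLevel(IntEnum):
    # QUIET, VERBOSE and DEBUG model the existing levels; ERROR is the
    # proposed addition between QUIET and VERBOSE.
    QUIET = 0
    ERROR = 1
    VERBOSE = 2
    DEBUG = 3

def apic_printk(verbosity, level, message, out):
    """Record the message only if the configured verbosity reaches its level."""
    if verbosity >= level:
        out.append(message)

log = []
default = ApicLevel.ERROR  # proposed default: errors visible, the rest hidden

apic_printk(default, ApicLevel.ERROR, "APIC error on CPU0: 40(40)", log)
apic_printk(default, ApicLevel.VERBOSE, "IO-APIC register dump", log)
# A knowledgeable user lowers the verbosity to QUIET to suppress the errors.
apic_printk(ApicLevel.QUIET, ApicLevel.ERROR, "APIC error on CPU1: 40(40)", log)
print(log)
```

Only the first message ends up in the list: the error is shown by default, the verbose dump is not, and QUIET suppresses both.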

2006-05-25 11:51:46

by Jan Engelhardt

Subject: Re: APIC error on CPUx



>RAMDISK: Loading 4624 blocks [1 disk] into ram disk... |/-\|/-\

Did 2.6.8 really use a rotating dash for that or is it a custom patch?


Jan Engelhardt
--