2001-10-10 14:42:34

by Paul Larson

Subject: 2.4.11 APIC problems

Hardware is a Netvista PIII-850, single processor.
I have previously (2.4.10 and below) been able to use this machine with
IO-APIC turned on. I turned on Local APIC and IO-APIC in 2.4.11 and all
I got on boot was an endless stream of error messages that look like
this:
APIC error on CPU0: 08(08)
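
For readers decoding that message: the two hex values appear to be reads of the local APIC Error Status Register (ESR). A minimal sketch of a decoder follows, assuming the bit layout documented for P6-family local APICs. The names esr_bit_names, esr_first_error and esr_print are hypothetical helpers for illustration, not kernel functions.

```c
#include <stdio.h>

/* ESR bit meanings, bit 0 first (assumed P6-family layout). */
static const char *const esr_bit_names[8] = {
    "send checksum error",
    "receive checksum error",
    "send accept error",
    "receive accept error",
    "reserved",
    "send illegal vector",
    "received illegal vector",
    "illegal register address",
};

/* Return the name of the lowest set ESR bit, or "no error" if none is set. */
const char *esr_first_error(unsigned int esr)
{
    for (int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit))
            return esr_bit_names[bit];
    }
    return "no error";
}

/* Print every error flagged in the low 8 bits of an ESR value. */
void esr_print(unsigned int esr)
{
    for (int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit))
            printf("bit %d: %s\n", bit, esr_bit_names[bit]);
    }
}
```

Under that assumption, a value of 0x08 has only bit 3 set, which esr_first_error(0x08) reports as a receive accept error.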

Here is the output from the serial console. I didn't see the previously
mentioned error message in the serial console though. Where I would have
expected to see it, all I got was random garbage.

Linux version 2.4.11 (root@avalon) (gcc version 2.95.3 20010315 (SuSE)) #1 Wed Oct 10 09:12:46 CDT 2001
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fee06c0 (usable)
BIOS-e820: 000000000fee06c0 - 000000000fee66c0 (ACPI data)
BIOS-e820: 000000000fee66c0 - 000000000feee700 (ACPI NVS)
BIOS-e820: 000000000feee700 - 0000000010000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
found SMP MP-table at 0009fe00
hm, page 0009f000 reserved twice.
hm, page 000a0000 reserved twice.
hm, page 000ec000 reserved twice.
hm, page 000ed000 reserved twice.
On node 0 totalpages: 65248
zone(0): 4096 pages.
zone(1): 61152 pages.
zone(2): 0 pages.
Intel MultiProcessor Specification v1.1
Virtual Wire compatibility mode.
OEM ID: IBM-PCCO Product ID: CDT-BIOS MP APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 32 at 0xFEC00000.
Processors: 1
Kernel command line: auto BOOT_IMAGE=new ro root=301 BOOT_FILE=/boot/bzImage console=ttyS0,9600
Initializing CPU#0
Detected 863.880 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1723.59 BogoMIPS
Memory: 254340k/260992k available (1288k kernel code, 6264k reserved, 435k data, 224k init, 0k highmem)
Dentry-cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel Pentium III (Coppermine) stepping 0a
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
..T08 oe
Po>: r8n>CC
e(oIr: rI <: (o 8CCCP3<o)U) )C<8one(P(<nrI :0U nr >r Ue
<ArC>n8n)oPI0 ProA0 <0r:C


---------------------
Thanks,
Paul Larson


2001-10-10 19:18:54

by Paul Larson

Subject: Re: 2.4.11 APIC problems

On Wed, 2001-10-10 at 16:40, Martin J. Bligh wrote:
> If you look at the VGA screen, what are the final messages before
> "APIC error on CPU0: 08(08)" ? You might want to add a "die()"
> call at the end of arch/i386/kernel/apic.c:smp_error_interrupt()
Looking at the vga screen without console-serial going on, I just saw an
endless stream of those errors. With console getting logged over the
serial line though, I would see the same random garbage in the output I
sent you earlier, and not see the error message.

So, I tried inserting the die() like you said without console serial. I
got pagefulls of dumps and pretty soon it rebooted itself. So, I logged
to the serial console again on the next reboot to capture the output.
Looks like we got the "APIC error" message in the log too. It's a
really long log so I attached it rather than putting it inline.

Thanks,
Paul Larson

2001-10-10 20:49:45

by Martin J. Bligh

Subject: Re: 2.4.11 APIC problems

> So, I tried inserting the die() like you said without console serial. I
> got pagefulls of dumps and pretty soon it rebooted itself. So, I logged
> to the serial console again on the next reboot to capture the output.
> Looks like we got the "APIC error" message in the log too. It's a
> really long log so I attached it rather than putting it inline.

Ick. You need to disable the repeated interrupt. Try this instead of
the die:

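cli();                        /* disable interrupts so the APIC error cannot fire again */
__asm__ __volatile__ ("hlt"); /* halt the CPU; the last messages stay on screen */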

And if it makes a huge logfile again, just mail the first bit .... I don't
care about huge emails, but others on lkml probably do ;-)

M.

PS. Nor do I care what the die says, I just want to stop the processor.
I want to know what it was doing just before the smp_error_interrupt.
There are more elegant solutions around to stop repeated APIC errors,
but this should be OK for debug.

2001-10-11 07:37:50

by Roman Kagan

Subject: Re: 2.4.11 APIC problems

I had exactly the same problem on my HP e800. I've tried to identify
what had changed between 2.4.10 and 2.4.11 in IO-APIC code, and I've
found out that probably the only change relevant to my uniprocessor
case is the change of the initial value of dest_mode field in struct
IO_APIC_route_entry in setup_IO_APIC_irqs() and
setup_ExtINT_IRQ0_pin() in arch/i386/kernel/io_apic.c from 1 to macro
INT_DELIVERY_MODE, which for uniprocessors was defined to be 0.
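
To make the field in question concrete, here is a minimal sketch of how the low 32-bit word of an I/O APIC redirection entry is packed, assuming the 82093AA-style layout (vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, mask in bit 16). The function rte_low_word and the DEST_MODE_* names are hypothetical and only illustrate which bit dest_mode controls; this is not the kernel's code.

```c
#include <stdint.h>

#define DEST_MODE_PHYSICAL 0u /* destination field holds a physical APIC ID */
#define DEST_MODE_LOGICAL  1u /* destination field holds a logical CPU mask */

/*
 * Pack the low word of a redirection entry from its fields.
 * Polarity and trigger bits are left at 0 for brevity.
 */
uint32_t rte_low_word(uint8_t vector, unsigned int delivery_mode,
                      unsigned int dest_mode, unsigned int masked)
{
    return (uint32_t)vector
         | ((uint32_t)(delivery_mode & 0x7u) << 8)
         | ((uint32_t)(dest_mode & 0x1u) << 11)
         | ((uint32_t)(masked & 0x1u) << 16);
}
```

With the same vector and delivery mode, switching dest_mode between 0 and 1 changes only bit 11 (0x800) of the entry. What does change is how the destination field is interpreted: a value such as TARGET_CPUS (0x01) is read as a physical APIC ID in one mode and as a logical CPU mask in the other.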

I don't claim I understand whether it is right or wrong, but the
following patch can fix _my_ problem:

--- linux/include/asm-i386/smp.h.int_delivery Wed Oct 10 13:36:11 2001
+++ linux/include/asm-i386/smp.h Wed Oct 10 18:17:06 2001
@@ -31,7 +31,7 @@
# define INT_DELIVERY_MODE 1 /* logical delivery broadcast to all procs */
# endif
#else
-# define INT_DELIVERY_MODE 0 /* physical delivery on LOCAL quad */
+# define INT_DELIVERY_MODE 1 /* logical delivery */
# define TARGET_CPUS 0x01
#endif


Cheers,
Roman.

2001-10-11 15:22:53

by Martin J. Bligh

Subject: Re: 2.4.11 APIC problems - please apply patch

> I don't claim I understand whether it is right or wrong, but the
> following patch can fix _my_ problem:

This is right ... mea culpa. I tried to fix things too fast.

On the other hand, I don't see why it wouldn't work ... and indeed
it did work for people who originally tested it, but it's not what I intended
to do, and it's an unnecessary change from previous kernels.

> --- linux/include/asm-i386/smp.h.int_delivery Wed Oct 10 13:36:11 2001
> +++ linux/include/asm-i386/smp.h Wed Oct 10 18:17:06 2001
> @@ -31,7 +31,7 @@
> # define INT_DELIVERY_MODE 1 /* logical delivery broadcast to all procs */
> # endif
> #else
> -# define INT_DELIVERY_MODE 0 /* physical delivery on LOCAL quad */
> +# define INT_DELIVERY_MODE 1 /* logical delivery */
> # define TARGET_CPUS 0x01
> #endif

Linus - can you add this fix? I intended to keep the UP+IO_APIC
setup as before.

Sorry,

Martin.