2006-12-16 23:36:59

by Linus Torvalds

Subject: Re: IO-APIC + timer doesn't work (was: Linux 2.6.20-rc1)



On Sun, 17 Dec 2006, Tobias Diedrich wrote:
>
> With commit b0268726 backed out, 2.6.20-rc1 boots fine.

Ok. It's sad, because that thing really did clean stuff up, and seemed
like a nice and robust approach.

Your dmesg is kind of interesting:

..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled(7)APIC error on CPU0: 04(40)
.. failed

where that APIC error on CPU0 seems to be a "Send accept error" and "Send
illegal vector" thing. I think we actually got the interrupt there, but
because we had some APIC setup bug, we didn't accept it properly, and it
resulted in that "APIC error" thing. Maybe.
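For reference, the two hex values in that message can be checked against the local APIC Error Status Register bit layout. The sketch below assumes the bit assignment documented in the Intel SDM; the table and the helper name `esr_first_error` are illustrative, not kernel code. Under that layout 0x04 is "send accept error", while 0x40 is bit 6, "received illegal vector" ("send illegal vector" would be 0x20).

```c
#include <stddef.h>

/* Bit names, assuming the local APIC ESR layout from the Intel SDM. */
static const char *const esr_bit_names[8] = {
    "send checksum error",        /* bit 0, 0x01 */
    "receive checksum error",     /* bit 1, 0x02 */
    "send accept error",          /* bit 2, 0x04 */
    "receive accept error",       /* bit 3, 0x08 */
    "reserved/redirectable IPI",  /* bit 4, 0x10 */
    "send illegal vector",        /* bit 5, 0x20 */
    "received illegal vector",    /* bit 6, 0x40 */
    "illegal register address",   /* bit 7, 0x80 */
};

/* Hypothetical helper: name of the lowest set ESR bit, or NULL if none is set. */
static const char *esr_first_error(unsigned int esr)
{
    for (int bit = 0; bit < 8; bit++)
        if (esr & (1u << bit))
            return esr_bit_names[bit];
    return NULL;
}
```

Calling `esr_first_error(0x04)` and `esr_first_error(0x40)` gives the names for the two values in the dmesg line.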

This is a long shot, but I wonder if we should _wait_ for the APIC to
stabilize after we've unmasked the IRQ. I.e., could you undo the back-out
(going back to the broken situation), try the patch below, and see if
it makes a difference?

Unlikely, I know. I don't see anything wrong with the code, though, but
maybe I'm just blind.

Eric, Andi, Yinghai, do you see anything here to explain why that commit
breaks?

Linus

---
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 2a1dcd5..a8a09e0 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -294,7 +294,7 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)

DO_ACTION( __mask, 0, |= 0x00010000, io_apic_sync(entry->apic) )
/* mask = 1 */
-DO_ACTION( __unmask, 0, &= 0xfffeffff, )
+DO_ACTION( __unmask, 0, &= 0xfffeffff, io_apic_sync(entry->apic) )
/* mask = 0 */

static void mask_IO_APIC_irq (unsigned int irq)
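
The two constants in the DO_ACTION lines of that patch set and clear bit 16 of the low word of an IO-APIC redirection entry, which is the mask bit. A minimal sketch of just that bit arithmetic follows; the helper names are made up for illustration, and the register access and the io_apic_sync() call that the patch adds are deliberately not modelled.

```c
#include <stdint.h>

#define IOAPIC_RTE_MASK_BIT 0x00010000u  /* bit 16 of the low 32 bits */

/* Illustrative helpers only: they operate on a plain value, not on hardware. */
static uint32_t rte_mask(uint32_t low)   { return low | IOAPIC_RTE_MASK_BIT; }
static uint32_t rte_unmask(uint32_t low) { return low & 0xfffeffffu; }
static int rte_is_masked(uint32_t low)   { return (low & IOAPIC_RTE_MASK_BIT) != 0; }
```

Masking and then unmasking an entry leaves the other bits (vector, delivery mode and so on) untouched.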


2006-12-17 14:57:18

by Tobias Diedrich

Subject: Re: IO-APIC + timer doesn't work (was: Linux 2.6.20-rc1)

Linus Torvalds wrote:

> Your dmesg is kind of interesting:
>
> ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled(7)APIC error on CPU0: 04(40)
> .. failed
>
> where that APIC error on CPU0 seems to be a "Send accept error" and "Send
> illegal vector" thing. I think we actually got the interrupt there, but
> because we had some APIC setup bug, we didn't accept it properly, and it
> resulted in that "APIC error" thing. Maybe.

I just tried changing the code so the "8259 IRQ0 enabled" case is
tested first and with that it boots fine.


Index: linux-2.6.20-rc1/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/x86_64/kernel/io_apic.c 2006-12-17 00:45:57.000000000 +0100
+++ linux-2.6.20-rc1/arch/x86_64/kernel/io_apic.c 2006-12-17 15:39:40.000000000 +0100
@@ -1615,6 +1615,7 @@
*/
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_EXTINT);
init_8259A(1);
+ enable_8259A_irq(0);

pin1 = find_isa_irq_pin(0, mp_INT);
apic1 = find_isa_irq_apic(0, mp_INT);



[ 0.000000] Linux version 2.6.20-rc1-amd64 (ranma@melchior) (gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)) #28 Sun Dec 17 15:40:22 CET 2006
[ 0.000000] Command line: root=/dev/sda5 resume=/dev/sda6 vga=6 apic=verbose apic=verbose ro [email protected]/,[email protected]/
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
[ 0.000000] BIOS-e820: 000000003fee0000 - 000000003fee3000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003fee3000 - 000000003fef0000 (ACPI data)
[ 0.000000] BIOS-e820: 000000003fef0000 - 000000003ff00000 (reserved)
[ 0.000000] BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[ 0.000000] Entering add_active_range(0, 256, 261856) 1 entries of 256 used
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] DMI 2.3 present.
[ 0.000000] ACPI: RSDP (v000 Nvidia ) @ 0x00000000000f7ce0
[ 0.000000] ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fee3040
[ 0.000000] ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fee30c0
[ 0.000000] ACPI: SSDT (v001 PTLTD POWERNOW 0x00000001 LTP 0x00000001) @ 0x000000003feec2c0
[ 0.000000] ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec400
[ 0.000000] ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec200
[ 0.000000] ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000
[ 0.000000] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[ 0.000000] Entering add_active_range(0, 256, 261856) 1 entries of 256 used
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1048576
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0 -> 159
[ 0.000000] 0: 256 -> 261856
[ 0.000000] On node 0 totalpages: 261759
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 1356 pages reserved
[ 0.000000] DMA zone: 2587 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 3524 pages used for memmap
[ 0.000000] DMA32 zone: 254236 pages, LIFO batch:31
[ 0.000000] Normal zone: 0 pages used for memmap
[ 0.000000] Nvidia board detected. Ignoring ACPI timer override.
[ 0.000000] If you got timer trouble try acpi_use_timer_override
[ 0.000000] ACPI: PM-Timer IO Port: 0x1008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ14 used by override.
[ 0.000000] ACPI: IRQ15 used by override.
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] mapped APIC to ffffffffff5fd000 ( fee00000)
[ 0.000000] mapped IOAPIC to ffffffffff5fc000 (00000000fec00000)
[ 0.000000] Nosave address range: 000000000009f000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000f0000
[ 0.000000] Nosave address range: 00000000000f0000 - 0000000000100000
[ 0.000000] Allocating PCI resources starting at 40000000 (gap: 3ff00000:b0100000)
[ 0.000000] Built 1 zonelists. Total pages: 256823
[ 0.000000] Kernel command line: root=/dev/sda5 resume=/dev/sda6 vga=6 apic=verbose apic=verbose ro [email protected]/,[email protected]/
[ 0.000000] netconsole: local port 6665
[ 0.000000] netconsole: local IP 192.168.8.241
[ 0.000000] netconsole: interface eth0
[ 0.000000] netconsole: remote port 514
[ 0.000000] netconsole: remote IP 255.255.255.255
[ 0.000000] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 27.616056] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 27.616058] time.c: Detected 2009.285 MHz processor.
[ 27.621535] Console: colour VGA+ 80x60
[ 27.624157] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 27.624996] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 27.625240] Checking aperture...
[ 27.625347] CPU 0: aperture @ b2c2000000 size 32 MB
[ 27.625454] Aperture too small (32 MB)
[ 27.630247] No AGP bridge found
[ 27.638099] Memory: 1025356k/1047424k available (3261k kernel code, 21436k reserved, 1445k data, 200k init)
[ 27.719244] Calibrating delay using timer specific routine.. 4022.22 BogoMIPS (lpj=6701161)
[ 27.719489] Mount-cache hash table entries: 256
[ 27.719665] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 27.719774] CPU: L2 Cache: 512K (64 bytes/line)
[ 27.719899] CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 02
[ 27.720052] ACPI: Core revision 20060707
[ 27.727794] enabled ExtINT on CPU#0
[ 27.727901] ESR value after enabling vector: 00000000, after 00000004
[ 27.728238] ENABLING IO-APIC IRQs
[ 27.728345] init IO_APIC IRQs
[ 27.728417] IO-APIC (apicid-pin) 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
[ 27.728548] ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 disabled<6>Using local APIC timer interrupts.
(Note: "disabled" in the ..TIMER line above is really "enabled".)
[ 27.811447] result 12558047
[ 27.811553] Detected 12.558 MHz APIC timer.
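
The "result" value and the MHz figure in the last two lines are consistent with a truncating integer conversion. A small sketch of that arithmetic, assuming "result" is a frequency in Hz; the helper names are hypothetical and the kernel's actual print statement is not quoted in this thread.

```c
/* Whole-MHz part of a frequency given in Hz (truncating). */
static unsigned long mhz_whole(unsigned long hz)
{
    return hz / 1000 / 1000;
}

/* Remaining kHz, i.e. the three digits printed after the decimal point. */
static unsigned long mhz_frac(unsigned long hz)
{
    return hz / 1000 % 1000;
}
```

For 12558047 this gives 12 and 558, matching "12.558 MHz".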

HTH,

--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits

2006-12-18 13:15:18

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work (was: Linux 2.6.20-rc1)

Tobias Diedrich <[email protected]> writes:

> Linus Torvalds wrote:
>
>> Your dmesg is kind of interesting:
>>
>> ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled(7)APIC error on CPU0: 04(40)
>> .. failed
>>
>> where that APIC error on CPU0 seems to be a "Send accept error" and "Send
>> illegal vector" thing. I think we actually got the interrupt there, but
>> because we had some APIC setup bug, we didn't accept it properly, and it
>> resulted in that "APIC error" thing. Maybe.
>
> I just tried changing the code so the "8259 IRQ0 enabled" case is
> tested first and with that it boots fine.

Could you try removing the clear_IO_APIC_pin call from try_apic_pin?

This isn't a complete fix, but I believe it will fix the problem on your
hardware, and it points at the real bug: we are not properly programming
the io_apic for the case we want to test.

Eric

2006-12-18 15:23:39

by Tobias Diedrich

Subject: Re: IO-APIC + timer doesn't work (was: Linux 2.6.20-rc1)

Eric W. Biederman wrote:
> Tobias Diedrich <[email protected]> writes:
>
> > Linus Torvalds wrote:
> >
> >> Your dmesg is kind of interesting:
> >>
> >> ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled(7)APIC error on CPU0: 04(40)
> >> .. failed
> >>
> >> where that APIC error on CPU0 seems to be a "Send accept error" and "Send
> >> illegal vector" thing. I think we actually got the interrupt there, but
> >> because we had some APIC setup bug, we didn't accept it properly, and it
> >> resulted in that "APIC error" thing. Maybe.
> >
> > I just tried changing the code so the "8259 IRQ0 enabled" case is
> > tested first and with that it boots fine.
>
> Could you try removing the clear_IO_APIC_pin call from try_apic_pin?
>
> This isn't a complete fix, but I believe it will fix the problem on your
> hardware, and it points at the real bug: we are not properly programming
> the io_apic for the case we want to test.

Yes, this works:

|[ 27.535937] init IO_APIC IRQs
|[ 27.536009] IO-APIC (apicid-pin) 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
|[ 27.536140] ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 disabled<3> (clear_IO_APIC_pin not called)<3> .. failed
|[ 27.569357] ..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled<3> .. works
|[ 27.602547] Using local APIC timer interrupts.

I can also report that updating the BIOS to version 0609 (released
last week or so; it also adds the long-missing HPET support) makes
the problem go away, since the first test case then already works.
I'm currently running with the BIOS downgraded to version 0402.

|[ 23.646371] ENABLING IO-APIC IRQs
|[ 23.646477] init IO_APIC IRQs
|[ 23.646479] IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
|[ 23.646674] ..TIMER: trying IO-APIC=0 PIN=2 with 8259 IRQ0 disabled<3> .. works
|[ 23.679872] Using local APIC timer interrupts.

Index: linux-2.6.20-rc1/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/x86_64/kernel/io_apic.c 2006-12-18 15:56:38.000000000 +0100
+++ linux-2.6.20-rc1/arch/x86_64/kernel/io_apic.c 2006-12-18 16:04:15.000000000 +0100
@@ -1586,9 +1586,11 @@
setup_nmi();
enable_8259A_irq(0);
}
+ apic_printk(APIC_QUIET, KERN_ERR " .. works\n");
return 1;
}
- clear_IO_APIC_pin(apic, pin);
+ printk(KERN_ERR " (clear_IO_APIC_pin not called)");
+ /* clear_IO_APIC_pin(apic, pin); */
apic_printk(APIC_QUIET, KERN_ERR " .. failed\n");
return 0;
}

HTH,

--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.

2006-12-18 15:34:46

by Tobias Diedrich

Subject: Re: IO-APIC + timer doesn't work (was: Linux 2.6.20-rc1)

Tobias Diedrich wrote:

> I can also report that updating the BIOS to version 0609 (released
> last week or so; it also adds the long-missing HPET support) makes
> the problem go away, since the first test case then already works.
> I'm currently running with the BIOS downgraded to version 0402.

In case someone is interested, here is the diff between the dmesg
from a boot with version 0402 and version 0609:

(Changelogs for BIOS releases would be nice, all they say on the
ASUS homepage is "support for new processors"...)

--- dmesg-notimes-20061218-2.6.20-rc1-bios-0402 2006-12-18 16:27:36.000000000 +0100
+++ dmesg-notimes-20061218-2.6.20-rc1-bios-0609-works 2006-12-18 16:27:45.000000000 +0100
@@ -13,14 +13,15 @@
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 261856) 1 entries of 256 used
end_pfn_map = 1048576
-DMI 2.3 present.
-ACPI: RSDP (v000 Nvidia ) @ 0x00000000000f7ce0
-ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fee3040
-ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fee30c0
-ACPI: SSDT (v001 PTLTD POWERNOW 0x00000001 LTP 0x00000001) @ 0x000000003feec2c0
-ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec400
-ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec200
-ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000
+DMI 2.4 present.
+ACPI: RSDP (v002 Nvidia ) @ 0x00000000000f7b70
+ACPI: XSDT (v001 Nvidia ASUSACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fee30c0
+ACPI: FADT (v003 Nvidia ASUSACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec5c0
+ACPI: SSDT (v001 PTLTD POWERNOW 0x00000001 LTP 0x00000001) @ 0x000000003feec7c0
+ACPI: HPET (v001 Nvidia ASUSACPI 0x42302e31 AWRD 0x00000098) @ 0x000000003feec900
+ACPI: MCFG (v001 Nvidia ASUSACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec980
+ACPI: MADT (v001 Nvidia ASUSACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003feec700
+ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x03000000) @ 0x0000000000000000
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 261856) 1 entries of 256 used
Zone PFN ranges:
@@ -37,8 +38,6 @@
DMA32 zone: 3524 pages used for memmap
DMA32 zone: 254236 pages, LIFO batch:31
Normal zone: 0 pages used for memmap
-Nvidia board detected. Ignoring ACPI timer override.
-If you got timer trouble try acpi_use_timer_override
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
@@ -48,13 +47,17 @@
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
+ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
+ACPI: IRQ0 used by override.
+ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
+ACPI: HPET id: 0x10de8201 base: 0xfefff000
Using ACPI (MADT) for SMP configuration information
mapped APIC to ffffffffff5fd000 ( fee00000)
mapped IOAPIC to ffffffffff5fc000 (00000000fec00000)
@@ -72,8 +75,8 @@
netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
-time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
-time.c: Detected 2009.284 MHz processor.
+time.c: Using 25.000000 MHz WALL HPET GTOD HPET/TSC timer.
+time.c: Detected 2009.513 MHz processor.
Console: colour VGA+ 80x60
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
@@ -82,7 +85,7 @@
Aperture too small (32 MB)
No AGP bridge found
Memory: 1025340k/1047424k available (3252k kernel code, 21452k reserved, 1468k data, 200k init)
-Calibrating delay using timer specific routine.. 4022.20 BogoMIPS (lpj=6701126)
+Calibrating delay using timer specific routine.. 4023.46 BogoMIPS (lpj=6703148)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
@@ -93,12 +96,11 @@
ESR value after enabling vector: 00000000, after 00000004
ENABLING IO-APIC IRQs
init IO_APIC IRQs
- IO-APIC (apicid-pin) 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
-..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 disabled<3> (clear_IO_APIC_pin not called)<3> .. failed
-..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled<3> .. works
+ IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
+..TIMER: trying IO-APIC=0 PIN=2 with 8259 IRQ0 disabled<3> .. works
Using local APIC timer interrupts.
-result 12558025
-Detected 12.558 MHz APIC timer.
+result 12559455
+Detected 12.559 MHz APIC timer.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
@@ -166,7 +168,7 @@
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
-number of MP IRQ sources: 16.
+number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

@@ -181,9 +183,9 @@
....... : arbitration: 02
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
- 00 001 01 0 0 0 0 0 1 1 20
+ 00 000 00 1 0 0 0 0 0 0 00
01 001 01 0 0 0 0 0 1 1 21
- 02 001 01 1 0 0 0 0 1 1 22
+ 02 001 01 0 0 0 0 0 1 1 20
03 001 01 0 0 0 0 0 1 1 23
04 001 01 1 0 0 0 0 1 1 24
05 001 01 1 0 0 0 0 1 1 25
@@ -206,9 +208,8 @@
16 000 00 1 0 0 0 0 0 0 00
17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
-IRQ0 -> 0:0
+IRQ0 -> 0:2
IRQ1 -> 0:1
-IRQ2 -> 0:2
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
@@ -223,6 +224,8 @@
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
+hpet0: at MMIO 0xfefff000, IRQs 2, 8, 31
+hpet0: 3 32-bit timers, 25000000 Hz
pnp: 00:01: ioport range 0x1000-0x107f could not be reserved
pnp: 00:01: ioport range 0x1080-0x10ff has been reserved
pnp: 00:01: ioport range 0x1400-0x147f has been reserved
@@ -313,9 +316,9 @@
ACPI: Fan [FAN] (on)
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: Getting cpuindex for acpiid 0x1
-ACPI: Thermal Zone [THRM] (25 C)
+ACPI: Thermal Zone [THRM] (21 C)
Real Time Clock Driver v1.12ac
-hpet_acpi_add: no address or irqs in _CRS
+hpet_resources: 0xfefff000 is busy
Linux agpgart interface v0.101 (c) Dave Jones
loop: loaded (max 8 devices)
tun: Universal TUN/TAP device driver, 1.6

--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.

2006-12-18 15:44:15

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work

Tobias Diedrich <[email protected]> writes:
> Eric W. Biederman wrote:
>> Could you try removing the clear_IO_APIC_pin call from try_apic_pin?
>>
>> This isn't a complete fix, but I believe it will fix the problem on your
>> hardware, and it points at the real bug: we are not properly programming
>> the io_apic for the case we want to test.
>
> Yes, this works:

Thanks. The bug is simply that the new code doesn't set up the
ioapic for the cases it intends to test, yet it does clear out
the original programming. So if the normal good case doesn't work, the
code is going to have problems.
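
The failure mode can be illustrated with a toy model. This is not kernel code: the struct, the function names, and the assumption that the reporter's board only delivers ticks on the pin while the 8259 IRQ0 is enabled (taken from the dmesg lines earlier in the thread) are all simplifications for illustration.

```c
#include <stdbool.h>

/* Toy model: one IO-APIC pin plus the 8259 IRQ0 line. */
struct timer_route {
    bool pin_programmed;  /* redirection entry still holds valid programming */
    bool i8259_irq0_on;   /* IRQ0 unmasked at the 8259 */
};

/* Assumed board behaviour: ticks arrive only with the 8259 IRQ0 enabled. */
static bool timer_ticks(const struct timer_route *r)
{
    return r->pin_programmed && r->i8259_irq0_on;
}

static bool try_pin(struct timer_route *r, bool clear_on_failure)
{
    if (timer_ticks(r))
        return true;
    if (clear_on_failure)
        r->pin_programmed = false;  /* stands in for clear_IO_APIC_pin() */
    return false;
}

/* Returns the 1-based attempt that worked, or 0 if none did. */
static int probe(bool clear_on_failure)
{
    struct timer_route r = { .pin_programmed = true, .i8259_irq0_on = false };

    if (try_pin(&r, clear_on_failure))
        return 1;
    r.i8259_irq0_on = true;         /* second attempt: 8259 IRQ0 enabled */
    if (try_pin(&r, clear_on_failure))
        return 2;
    return 0;
}
```

In this model `probe(true)` finds nothing, because the first failed attempt wipes the programming and nothing restores it, while `probe(false)` succeeds on the second attempt.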


> I can also report that updating the BIOS to version 0609 (released
> last week or so; it also adds the long-missing HPET support) makes
> the problem go away, since the first test case then already works.
> I'm currently running with the BIOS downgraded to version 0402.

Nice to hear. So this is clearly a software setup problem in the BIOS.

Andi, do you think you could address this problem?

Eric

2006-12-19 08:00:30

by Lu, Yinghai

Subject: Re: IO-APIC + timer doesn't work

On 12/18/06, Eric W. Biederman <[email protected]> wrote:
> Thanks. The bug is simply that the new code doesn't set up the
> ioapic for the cases it intends to test, yet it does clear out
> the original programming. So if the normal good case doesn't work, the
> code is going to have problems.

Please check the patch.


Attachments:
timers_12182006.patch (4.49 kB)

2006-12-19 11:28:30

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work

"Yinghai Lu" <[email protected]> writes:

> On 12/18/06, Eric W. Biederman <[email protected]> wrote:
>> Thanks. The bug is simply that the new code doesn't set up the
>> ioapic for the cases it intends to test, yet it does clear out
>> the original programming. So if the normal good case doesn't work, the
>> code is going to have problems.
>
> Please check the patch.

Getting there but I don't think we are quite there yet.

One of the issues that this does not address is that our current probe
order in check_timer is wrong. We should first check what the BIOS
has told us about, and only if that fails should we start guessing at
common configurations.

So the pin2 case should be tested right after the pin1 case as we do
currently. On most new boards that will be a complete noop.

But it is better than our current blind guess at using ExtINT mode.

I figure after we try what the BIOS has told us about and that
has failed we should first try the common irq 0 apic mappings,
and then try the common ExtINT mappings.
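
The ordering described here can be sketched as a table walked front to back. Everything in this block is hypothetical (the enum, the struct, the pin numbers in the example table, and the stand-in test callback); it only illustrates "BIOS-reported pins first, then common IO-APIC pins, then ExtINT guesses", not the actual check_timer code.

```c
#include <stddef.h>

enum probe_kind { BIOS_REPORTED, COMMON_APIC_PIN, COMMON_EXTINT };

struct candidate {
    enum probe_kind kind;
    int apic;
    int pin;              /* -1 means "not reported", so skip it */
};

typedef int (*attempt_fn)(const struct candidate *c);

/* Walk the candidates in order; return the index of the first that works, or -1. */
static int probe_in_order(const struct candidate *c, size_t n, attempt_fn attempt)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].pin != -1 && attempt(&c[i]))
            return (int)i;
    return -1;
}

/* Example order; the pin numbers are illustrative only. */
static const struct candidate example_order[] = {
    { BIOS_REPORTED,   0,  0 },  /* pin1 as reported by the BIOS tables */
    { BIOS_REPORTED,   0, -1 },  /* pin2: nothing reported, skipped */
    { COMMON_APIC_PIN, 0,  2 },  /* guess: a common IRQ0 mapping */
    { COMMON_EXTINT,   0,  0 },  /* guess: ExtINT routing, tried last */
};

/* Stand-in for a real hardware test: pretend only pin 2 delivers ticks. */
static int only_pin2_works(const struct candidate *c)
{
    return c->pin == 2;
}
```

With this table and callback, the walk falls through the BIOS-reported entries and stops at the first guess, index 2.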

The current code causes me to want to scream, it is so silly.

Eric

2006-12-20 06:50:39

by Lu, Yinghai

Subject: Re: IO-APIC + timer doesn't work

On 12/19/06, Eric W. Biederman <[email protected]> wrote:
> So the pin2 case should be tested right after the pin1 case as we do
> currently. On most new boards that will be a complete noop.
>
> But it is better than our current blind guess at using ExtINT mode.
>
> I figure after we try what the BIOS has told us about and that
> has failed we should first try the common irq 0 apic mappings,
> and then try the common ExtINT mappings.

Please check if this one is ok.


Attachments:
timers_12192006.patch (6.05 kB)

2006-12-21 19:15:20

by Tobias Diedrich

Subject: Re: IO-APIC + timer doesn't work

Yinghai Lu wrote:
> On 12/19/06, Eric W. Biederman <[email protected]> wrote:
> >So the pin2 case should be tested right after the pin1 case as we do
> >currently. On most new boards that will be a complete noop.
> >
> >But it is better than our current blind guess at using ExtINT mode.
> >
> >I figure after we try what the BIOS has told us about and that
> >has failed we should first try the common irq 0 apic mappings,
> >and then try the common ExtINT mappings.
>
> Please check if this one is ok.

Works fine for me.

FYI I'm off to my parents from Saturday onward, so after that I
can't test any patches for the next one or two weeks.

--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.

2006-12-21 20:47:54

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work

"Yinghai Lu" <[email protected]> writes:

> On 12/19/06, Eric W. Biederman <[email protected]> wrote:
>> So the pin2 case should be tested right after the pin1 case as we do
>> currently. On most new boards that will be a complete noop.
>>
>> But it is better than our current blind guess at using ExtINT mode.
>>
>> I figure after we try what the BIOS has told us about and that
>> has failed we should first try the common irq 0 apic mappings,
>> and then try the common ExtINT mappings.
>
> Please check if this one is ok.
>
> [PATCH] x86_64: check_timer with io apic setup before try_apic_pin
>
> add io apic setup before try_apic_pin
>
> cc: Andi Kleen <[email protected]>
> cc: Eric W. Biederman <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>
>
> diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
> index 2a1dcd5..6d09fc0 100644
> --- a/arch/x86_64/kernel/io_apic.c
> +++ b/arch/x86_64/kernel/io_apic.c
> @@ -273,10 +273,17 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
> struct irq_pin_list *entry = irq_2_pin + irq;
>
> BUG_ON(irq >= NR_IRQS);
> - while (entry->next)
> + while (entry->next) {
> + if (entry->apic == apic && entry->pin == pin)
> + return;
> + if (entry->pin == -1)
> + break;
> entry = irq_2_pin + entry->next;
> + }
>
> if (entry->pin != -1) {
> + if (entry->apic == apic && entry->pin == pin)
> + return;
> entry->next = first_free_entry;
> entry = irq_2_pin + entry->next;
> if (++first_free_entry >= PIN_MAP_SIZE)

This change to add_pin_to_irq looks dubious.

We especially shouldn't hit a pin == -1 while next is still valid.
The problem is that the code that reads this at irq time does not
skip entries with entry->pin == -1.

Fixing the infrastructure should probably be a separate patch
so we don't get too many concepts confused in here.
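
To make the concern concrete, here is a small model of a chained pin list. It assumes a walker that stops at the first entry with pin == -1 rather than skipping it; the array size, the chain layout, and the helper names are invented for the example.

```c
#define PIN_MAP_SIZE 8

struct irq_pin_list { short apic, pin, next; };

/* Count the pins a walker reaches if it stops at the first pin == -1. */
static int count_reachable(const struct irq_pin_list *map, int irq)
{
    const struct irq_pin_list *entry = map + irq;
    int n = 0;

    for (;;) {
        if (entry->pin == -1)
            break;              /* stops here; it does not skip ahead */
        n++;
        if (!entry->next)
            break;
        entry = map + entry->next;
    }
    return n;
}

/* Build a two-entry chain for irq 0 (slot 0 -> slot 4), optionally with a hole. */
static int reachable_with_hole(int punch_hole)
{
    struct irq_pin_list map[PIN_MAP_SIZE] = {
        [0] = { .apic = 0, .pin = 0, .next = 4 },
        [4] = { .apic = 0, .pin = 2, .next = 0 },
    };

    if (punch_hole) {
        map[0].apic = -1;
        map[0].pin = -1;        /* a cleared entry whose next link is still valid */
    }
    return count_reachable(map, 0);
}
```

Without the hole the walker reaches both pins; with the hole at the head it reaches none, so the still-linked second entry is silently lost.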

> @@ -286,6 +293,24 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
> entry->pin = pin;
> }
>
> +static void remove_pin_to_irq(unsigned int irq, int apic, int pin)
> +{
> + struct irq_pin_list *entry = irq_2_pin + irq;
> +
> + BUG_ON(irq >= NR_IRQS);
> +
> + while (entry) {
> + if (entry->apic == apic && entry->pin == pin) {
> + entry->apic = -1;
> + entry->pin = -1;
> + break;
> + }
> + if (entry->next)
> + entry = irq_2_pin + entry->next;
> + }
> +
> +}
> +
This change to remove_pin_to_irq is simply wrong.
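
The message does not spell out what is wrong. One problem that can be read off the quoted loop is that `while (entry)` tests a pointer that never becomes NULL, so if no entry matches and `entry->next` is 0 the loop never exits. Below is a bounded variant for comparison, as a sketch only: the names `remove_pin_sketch` and `demo_remove` are hypothetical, and it still leaves a pin == -1 hole in the chain, which is the separate concern raised about add_pin_to_irq.

```c
struct irq_pin_list { short apic, pin, next; };

/* Bounded version of the quoted loop; returns 1 if an entry was cleared, else 0. */
static int remove_pin_sketch(struct irq_pin_list *map, int irq, int apic, int pin)
{
    struct irq_pin_list *entry = map + irq;

    for (;;) {
        if (entry->apic == apic && entry->pin == pin) {
            entry->apic = -1;
            entry->pin = -1;
            return 1;
        }
        if (!entry->next)
            return 0;           /* end of chain: the quoted loop would spin here */
        entry = map + entry->next;
    }
}

/* Small driver over a two-entry chain for irq 0 (slot 0 -> slot 4). */
static int demo_remove(int apic, int pin)
{
    struct irq_pin_list map[8] = {
        [0] = { .apic = 0, .pin = 0, .next = 4 },
        [4] = { .apic = 0, .pin = 2, .next = 0 },
    };

    return remove_pin_sketch(map, 0, apic, pin);
}
```

`demo_remove(0, 2)` finds and clears the second entry; `demo_remove(0, 5)` walks to the end of the chain and returns 0 instead of looping.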

> +static int add_irq_entry(int type, int irqflag, int bus, int irq, int apic, int pin)
> +{
> + struct mpc_config_intsrc intsrc;
> + int idx;
> +
> + intsrc.mpc_type = MP_INTSRC;
> + intsrc.mpc_irqflag = irqflag; /* conforming */
> + intsrc.mpc_srcbus = bus;
> + intsrc.mpc_dstapic = (apic != -1) ? mp_ioapics[apic].mpc_apicid: MP_APIC_ALL;
> +
> + intsrc.mpc_irqtype = type;
> +
> + intsrc.mpc_srcbusirq = irq;
> + intsrc.mpc_dstirq = pin;
> +
> + mp_irqs [mp_irq_entries] = intsrc;
> + Dprintk("Int: type %d, pol %d, trig %d, bus %d,"
> + " IRQ %02x, APIC ID %x, APIC INT %02x\n",
> + intsrc.mpc_irqtype, intsrc.mpc_irqflag & 3,
> + (intsrc.mpc_irqflag >> 2) & 3, intsrc.mpc_srcbus,
> + intsrc.mpc_srcbusirq, intsrc.mpc_dstapic, intsrc.mpc_dstirq);
> + idx = mp_irq_entries;
> + if (++mp_irq_entries >= MAX_IRQ_SOURCES)
> + panic("Max # of irq sources exceeded!!\n");
> + return idx;

This is fairly sane but probably belongs in mptable.c as a helper.

> /*
> * Find the pin to which IRQ[irq] (ISA) is connected
> */
> @@ -1570,6 +1658,22 @@ static inline void unlock_ExtINT_logic(void)
> * fanatically on his truly buggy board.
> */
>
> +static void set_try_apic_pin(int apic, int pin, int type)
> +{
> + int idx;
> + int irq = 0;
> + int bus = 0; /* MP_ISA_BUS */
> + int irqflag = 5; /* MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH */
> +
> + idx = find_irq_entry(apic,pin,type);
> +
> + if (idx == -1)
> + idx = add_irq_entry(type, irqflag, bus, irq, apic, pin);
> +
> + add_pin_to_irq(irq, apic, pin);
> + setup_IO_APIC_irq(apic, pin, idx, irq);
> +}
> +
> static int try_apic_pin(int apic, int pin, char *msg)
> {
> apic_printk(APIC_VERBOSE, KERN_INFO
> @@ -1588,7 +1692,7 @@ static int try_apic_pin(int apic, int pin, char *msg)
> }
> return 1;
> }
> - clear_IO_APIC_pin(apic, pin);
> +
> apic_printk(APIC_QUIET, KERN_ERR " .. failed\n");
> return 0;
> }
> @@ -1599,12 +1703,13 @@ static void check_timer(void)
> int apic1, pin1, apic2, pin2;
> int vector;
> cpumask_t mask;
> + int i;
>
> /*
> * get/set the timer IRQ vector:
> */
> - disable_8259A_irq(0);
> vector = assign_irq_vector(0, TARGET_CPUS, &mask);
> + disable_8259A_irq(0);

Moving disable_8259A_irq(0) appears to be useless code motion.

> /*
> * Subtle, code in do_timer_interrupt() expects an AEOI
> @@ -1621,33 +1726,51 @@ static void check_timer(void)
> pin2 = ioapic_i8259.pin;
> apic2 = ioapic_i8259.apic;
>
> - /* Do this first, otherwise we get double interrupts on ATI boards */
> - if ((pin1 != -1) && try_apic_pin(apic1, pin1,"with 8259 IRQ0 disabled"))
> - return;
> + apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
> + vector, apic1, pin1, apic2, pin2);
>
> - /* Now try again with IRQ0 8259A enabled.
> - Assumes timer is on IO-APIC 0 ?!? */
> - enable_8259A_irq(0);
> - unmask_IO_APIC_irq(0);
> - if (try_apic_pin(apic1, pin1, "with 8259 IRQ0 enabled"))
> - return;
> - disable_8259A_irq(0);
> + if (pin1 != -1) {
> + /* Do this first, otherwise we get double interrupts on ATI boards */
> + /* set_try_apic_pin will call disable_8259A_irq */
> + set_try_apic_pin(apic1, pin1, mp_INT);
> + unmask_IO_APIC_irq(0);
> + if (try_apic_pin(apic1, pin1,"with 8259 IRQ0 disabled"))
> + return;
>
> - /* Always try pin0 and pin2 on APIC 0 to handle buggy timer overrides
> - on Nvidia boards */
> - if (!(apic1 == 0 && pin1 == 0) &&
> - try_apic_pin(0, 0, "fallback with 8259 IRQ0 disabled"))
> - return;
> - if (!(apic1 == 0 && pin1 == 2) &&
> - try_apic_pin(0, 2, "fallback with 8259 IRQ0 disabled"))
> - return;
> + /* Now try again with IRQ0 8259A enabled.
> + Assumes timer is on IO-APIC 0 ?!? */
> + enable_8259A_irq(0);
> + if (try_apic_pin(apic1, pin1, "with 8259 IRQ0 enabled"))
> + return;
> + disable_8259A_irq(0);

I am still trying to understand this enable_8259A_irq(0) case.
As far as I can tell this is a very backwards way of enabling
an ExtINT, as such it shouldn't be used until later.

YH, do you have any insight into why apic 0 pin 2 doesn't work for the
timer interrupt on some Nvidia chipsets? I thought that was what we were
using in LinuxBIOS for the mptable.

Eric