2006-01-03 04:46:47

by Vivek Goyal

Subject: Inclusion of x86_64 memorize ioapic at bootup patch

Hi Andi,

Can you please include the following patch. This patch has already been pushed
by Andrew.

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch

This patch remembers at boot time where the i8259 is connected and restores
the APIC settings during a kexec or kdump boot. This lets the new kernel
receive timer interrupts in legacy mode.

This patch is needed to make kexec and kdump work on some systems,
especially Opteron boxes. Otherwise the second kernel does not receive
timer interrupts during early boot and hence hangs.

I understand that you are inclined towards saving all the APIC state
and simply restoring it, instead of adding hooks. This will work
well for kexec but not for kdump, because with kdump the system can crash on
a non-boot cpu.

Restoring the BIOS APIC state ensures that the BIOS-designated boot cpu will
always be able to see timer interrupts in legacy mode, but the same does not
hold if the new kernel boots on some other cpu, as is the case with kdump.

In the case of a kexec boot we relocate to the boot cpu, but in the case of
kdump we don't, because it was suggested that in some extreme crashes the boot
cpu might not respond even to an NMI, so relocation to the boot cpu would not
be possible.

Can you please reconsider this patch for inclusion?

Thanks
Vivek


2006-01-06 00:30:08

by Lu, Yinghai

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

the patch is good.

I tried LinuxBIOS with kexec.

without this patch: I need to disable acpi in kernel. otherwise the
kernel with acpi support can boot the second kernel, but the second
kernel will hang after

time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 2197.663 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 1009152k/1048576k available (2967k kernel code, 39036k reserved, 1186k )


YH

On 1/2/06, Vivek Goyal <[email protected]> wrote:
> Hi Andi,
>
> Can you please include the following patch. This patch has already been pushed
> by Andrew.
>
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch
>
> This patch is regarding remembering at boot up time where i8259 is connected
> and restore the APIC settings back during kexec boot or kdump boot. This
> enables getting timer interrupts in new kernel in legacy mode.
>
> This patch is needed to make kexec and kdump work on some systems,
> especially opteron boxes. Otherwise the second kernel does not receive
> timer interrupts during early boot hence hangs.
>
> I understand, that you are inclined towards remembering all the APIC states
> and simply restore it back instead of putting hooks. This will work
> well for kexec but not for kdump because in kdump system can crash on
> non-boot cpu.
>
> Restoring BIOS APIC state can make sure that BIOS designated boot cpu will
> always be able to see timer interrupts in legacy mode but same does not
> hold good if new kernel boots on some other cpu as is the case with kdump.
>
> In case of kexec boot, we relocate to boot cpu but in case of kdump we
> don't because it was suggested that in some extreme cases of crash, boot cpu
> might not respond even to NMI and relocation to boot cpu will not be
> possible.
>
> Can you please re-consider this patch for inclusion.
>
> Thanks
> Vivek

2006-01-06 00:39:43

by Andrew Morton

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Yinghai Lu <[email protected]> wrote:
>
> the patch is good.
>
> I tried LinuxBIOS with kexec.
>
> without this patch: I need to disable acpi in kernel. otherwise the
> kernel with acpi support can boot the second kernel, but the second
> kernel will hang after
>
> time.c: Using 14.318180 MHz HPET timer.
> time.c: Detected 2197.663 MHz processor.
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> Memory: 1009152k/1048576k available (2967k kernel code, 39036k reserved, 1186k )
>
>

Please don't top-post.

>
> On 1/2/06, Vivek Goyal <[email protected]> wrote:
> > Hi Andi,
> >
> > Can you please include the following patch. This patch has already been pushed
> > by Andrew.
> >
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch

IIRC, I dropped this patch because of discouraging noises from Andi and
because underlying x86_64 changes broke it in ugly ways. It needs to be
redone and Andi's objections (whatever they were) need to be addressed or
argued about.

Right now the patch is rather dead.

2006-01-06 04:50:46

by Vivek Goyal

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

On Thu, Jan 05, 2006 at 04:38:48PM -0800, Andrew Morton wrote:
> Yinghai Lu <[email protected]> wrote:
> >
> > the patch is good.
> >
> > I tried LinuxBIOS with kexec.
> >
> > without this patch: I need to disable acpi in kernel. otherwise the
> > kernel with acpi support can boot the second kernel, but the second
> > kernel will hang after
> >
> > time.c: Using 14.318180 MHz HPET timer.
> > time.c: Detected 2197.663 MHz processor.
> > Console: colour VGA+ 80x25
> > Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> > Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> > Memory: 1009152k/1048576k available (2967k kernel code, 39036k reserved, 1186k )
> >
> >
>
> Please don't top-post.
>
> >
> > On 1/2/06, Vivek Goyal <[email protected]> wrote:
> > > Hi Andi,
> > >
> > > Can you please include the following patch. This patch has already been pushed
> > > by Andrew.
> > >
> > > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch
>
> IIRC, I dropped this patch because of discouraging noises from Andi and
> because underlying x86_64 changes broke it in ugly ways. It needs to be
> redone and Andi's objections (whatever they were) need to be addressed or
> argued about.
>

Andrew, as far as I know this patch has not broken anything. It was another
patch, which tried to initialize the ioapics early, that broke some
systems, and that patch has already been dropped.

Andi's main concern with this patch is that it has special-case
knowledge of the 8259 and legacy stuff. He would prefer saving all the
APIC state early during boot and restoring it during reboot.

This would work well for kexec but will not work for kdump, as we might
crash on a non-boot cpu and the second kernel will come up on that non-boot
cpu. Just restoring the APIC state ensures that the kernel can boot well on
the BIOS-designated boot cpu, but that does not hold for other cpus. One
example is that other cpus will not receive timer interrupts during early
boot.

Hence there does not seem to be any escape route other than relocating
to the boot cpu after a crash, so that the second kernel comes up on the
BIOS-designated boot cpu. But relocating to the boot cpu after a crash might
not be a very reliable thing to do.

Thanks
Vivek

2006-01-06 08:03:58

by Eric W. Biederman

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Andrew Morton <[email protected]> writes:
>
> Please don't top-post.
>
>>
>> On 1/2/06, Vivek Goyal <[email protected]> wrote:
>> > Hi Andi,
>> >
>> > Can you please include the following patch. This patch has already been
> pushed
>> > by Andrew.
>> >
>> >
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch
>
> IIRC, I dropped this patch because of discouraging noises from Andi and
> because underlying x86_64 changes broke it in ugly ways.

Ok. I just looked as extensively as I could and I can't find the underlying
x86_64 changes that Andi mentioned he was working on. I have looked
in current -mm and in Andi's merge and experimental quilt trees. It
could be that I'm blind, but I looked and I did not see them.

Even in the discussion where this was mentioned there never was a
semantic conflict, but rather two patches passing so close that they
touched the same or neighboring lines of code.

> It needs to be
> redone and Andi's objections (whatever they were) need to be addressed or
> argued about.

The difference was one of approach. Andi wanted us to treat the apics
as black boxes and save and restore register values with no regard as
to what the registers did. This is theoretically more future proof,
but it loses flexibility.

My approach is to treat the apics as something we understand, and
simply save off the one small piece of information from the boot
time state that we can't discover any other way.

The x86_64-ioapic-virtual-wire-mode-fix.patch in 2.6.15-mm1 actually
takes advantage of the fact we understand what the apics are doing
to change the destination cpu, in the kexec on panic case. This
is something that cannot be done if we simply saved off the registers.

> Right now the patch is rather dead.

Currently the referred-to patch applies just fine to 2.6.15,
and except for a conflict with the above-mentioned patch it
applies fine to 2.6.15-mm1 as well.

Putting the apics in a state where we can use them is fundamental
to booting a kernel, so this is something we need to resolve
if we want kexec to be usable.

A revived version of the patch that applies without patch
follows.


Eric
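
To make the difference between the two approaches concrete, here is a minimal sketch in plain C. It is not kernel code: the struct layout and the helper names (redir_entry, restore_saved, rebuild_for_cpu, cpu_gets_timer) are hypothetical and only model the one property under discussion, namely which local APIC the ExtINT entry is delivered to.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of one IO-APIC redirection entry.
 * This is not the kernel's struct IO_APIC_route_entry. */
struct redir_entry {
    uint8_t vector;
    uint8_t delivery_mode;  /* 7 == ExtINT */
    uint8_t masked;         /* 1 == interrupt line disabled */
    uint8_t dest_apic_id;   /* physical destination: a local APIC ID */
};

enum { DELIVERY_EXTINT = 7 };

/* Black-box approach: write back exactly what was captured at boot. */
static struct redir_entry restore_saved(struct redir_entry boot_snapshot)
{
    return boot_snapshot;
}

/* "Understood" approach: remember only that this pin carries the i8259,
 * then rebuild the entry so it targets the CPU we are running on now. */
static struct redir_entry rebuild_for_cpu(uint8_t current_apic_id)
{
    struct redir_entry e = {0};

    e.vector = 0;
    e.delivery_mode = DELIVERY_EXTINT;
    e.masked = 0;
    e.dest_apic_id = current_apic_id;
    return e;
}

/* Would the CPU with this APIC ID see legacy (ExtINT) interrupts? */
static int cpu_gets_timer(struct redir_entry e, uint8_t apic_id)
{
    assert(e.delivery_mode <= 7);
    return !e.masked && e.delivery_mode == DELIVERY_EXTINT &&
           e.dest_apic_id == apic_id;
}
```

With a boot-time snapshot that targets APIC ID 0, restore_saved() keeps delivering to CPU 0 only, so a crash kernel starting on APIC ID 2 sees nothing. rebuild_for_cpu(2) retargets the entry, which is the flexibility described above for the kexec on panic case.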

2006-01-06 08:16:46

by Eric W. Biederman

Subject: [PATCH] i386 io_apic: Use correct index variable when computing the apic that is in ExtInt mode.


Somehow in all of the chaos this one-line bug fix got merged with
another patch and was then discarded when issues were found
with that other patch.

From: Vivek Goyal <[email protected]>

A minor fix to the patch which remembers where the i8259 is
connected. The counter i has been replaced by apic: i held a junk
value, which led to the i8259 connected to the IOAPIC not being
detected.

---

arch/i386/kernel/io_apic.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

b5a215b462de26a1e6c21f607677796f0bb446aa
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 7554f8f..f2dd218 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1649,7 +1649,7 @@ static void __init enable_IO_APIC(void)
for(apic = 0; apic < nr_ioapics; apic++) {
int pin;
/* See if any of the pins is in ExtINT mode */
- for(pin = 0; pin < nr_ioapic_registers[i]; pin++) {
+ for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
struct IO_APIC_route_entry entry;
spin_lock_irqsave(&ioapic_lock, flags);
*(((int *)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
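
The effect of the wrong index can be sketched outside the kernel. The following is an illustration only: pins_per_ioapic, MAX_IOAPICS and pins_visited are hypothetical names, and the "stale" index is modelled as a fixed value pointing at an unused slot, which is just one way a junk value could behave.

```c
#include <assert.h>

#define MAX_IOAPICS 8   /* hypothetical capacity for this sketch */

/* Pins per IO-APIC; slots at or beyond nr_ioapics stay zero. */
static int pins_per_ioapic[MAX_IOAPICS];

/* Count how many (apic, pin) pairs the scan visits.  When use_stale_index
 * is set, the inner loop bound is read through a leftover index instead of
 * the outer loop variable, which is the shape of the bug fixed above. */
static int pins_visited(int nr_ioapics, int use_stale_index, int stale)
{
    int apic, pin, visited = 0;

    assert(nr_ioapics <= MAX_IOAPICS);
    assert(stale >= 0 && stale < MAX_IOAPICS);

    for (apic = 0; apic < nr_ioapics; apic++) {
        int idx = use_stale_index ? stale : apic;

        for (pin = 0; pin < pins_per_ioapic[idx]; pin++)
            visited++;
    }
    return visited;
}
```

With two IO-APICs of 24 and 16 pins, the correct index visits all 40 pins, while a stale index that points at an empty slot visits none, so an ExtINT pin would never be found.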

2006-01-06 08:24:50

by Eric W. Biederman

Subject: [PATCH] x86_64 io_apic: memorize at bootup where the i8259 is


Currently we attempt to restore virtual wire mode on reboot, which only
works if we can figure out where the i8259 is connected. This is very
useful when we kexec another kernel and likely helpful to a peculiar
BIOS that makes assumptions about how the system is set up.

Since the acpi MADT table does not provide the location where the i8259 is
connected we have to look at the hardware to figure it out.

Most systems have the i8259 connected to the local apic of the cpu so won't be
affected but people running Opteron and some serverworks chipsets should be
able to use kexec now.

In addition this patch removes the hard coded assumption that the io_apic
that delivers isa interrupts is always known to the kernel as io_apic 0.
There does not appear to be anything to guarantee that assumption is true.

This patch does not do a blind save and restore of ioapic registers
as that loses the flexibility that is present when you understand what
the registers actually do. Currently in the kexec on panic case we actually
use that flexibility to route all interrupts to the cpu we
are rebooting on.

---

arch/x86_64/kernel/io_apic.c | 143 ++++++++++++++++++++++++++++++++----------
1 files changed, 108 insertions(+), 35 deletions(-)

6a51f08f75e2087c50d088c8af21fb98f0ae87a6
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index ac7a273..de4ad4e 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -46,6 +46,9 @@ static int no_timer_check;

int disable_timer_pin_1 __initdata;

+/* Where if anywhere is the i8259 connect in external int mode */
+static struct { int pin, apic; } ioapic_i8259 = { -1, -1 };
+
static DEFINE_SPINLOCK(ioapic_lock);

/*
@@ -360,7 +363,7 @@ static int find_irq_entry(int apic, int
/*
* Find the pin to which IRQ[irq] (ISA) is connected
*/
-static int find_isa_irq_pin(int irq, int type)
+static int __init find_isa_irq_pin(int irq, int type)
{
int i;

@@ -378,6 +381,31 @@ static int find_isa_irq_pin(int irq, int
return -1;
}

+static int __init find_isa_irq_apic(int irq, int type)
+{
+ int i;
+
+ for (i = 0; i < mp_irq_entries; i++) {
+ int lbus = mp_irqs[i].mpc_srcbus;
+
+ if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
+ mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
+ mp_bus_id_to_type[lbus] == MP_BUS_MCA) &&
+ (mp_irqs[i].mpc_irqtype == type) &&
+ (mp_irqs[i].mpc_srcbusirq == irq))
+ break;
+ }
+ if (i < mp_irq_entries) {
+ int apic;
+ for(apic = 0; apic < nr_ioapics; apic++) {
+ if (mp_ioapics[apic].mpc_apicid == mp_irqs[i].mpc_dstapic)
+ return apic;
+ }
+ }
+
+ return -1;
+}
+
/*
* Find a specific PCI IRQ entry.
* Not an __init, possibly needed by modules
@@ -871,7 +899,7 @@ static void __init setup_IO_APIC_irqs(vo
* Set up the 8259A-master output pin as broadcast to all
* CPUs.
*/
-static void __init setup_ExtINT_IRQ0_pin(unsigned int pin, int vector)
+static void __init setup_ExtINT_IRQ0_pin(unsigned int apic, unsigned int pin, int vector)
{
struct IO_APIC_route_entry entry;
unsigned long flags;
@@ -905,8 +933,8 @@ static void __init setup_ExtINT_IRQ0_pin
* Add it to the IO-APIC irq-routing table:
*/
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(0, 0x10+2*pin, *(((int *)&entry)+0));
+ io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
+ io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
spin_unlock_irqrestore(&ioapic_lock, flags);

enable_8259A_irq(0);
@@ -1185,7 +1213,8 @@ void __apicdebuginit print_PIC(void)
static void __init enable_IO_APIC(void)
{
union IO_APIC_reg_01 reg_01;
- int i;
+ int i8259_apic, i8259_pin;
+ int i, apic;
unsigned long flags;

for (i = 0; i < PIN_MAP_SIZE; i++) {
@@ -1199,11 +1228,48 @@ static void __init enable_IO_APIC(void)
/*
* The number of IO-APIC IRQ registers (== #pins):
*/
- for (i = 0; i < nr_ioapics; i++) {
+ for (apic = 0; apic < nr_ioapics; apic++) {
spin_lock_irqsave(&ioapic_lock, flags);
- reg_01.raw = io_apic_read(i, 1);
+ reg_01.raw = io_apic_read(apic, 1);
spin_unlock_irqrestore(&ioapic_lock, flags);
- nr_ioapic_registers[i] = reg_01.bits.entries+1;
+ nr_ioapic_registers[apic] = reg_01.bits.entries+1;
+ }
+ for(apic = 0; apic < nr_ioapics; apic++) {
+ int pin;
+ /* See if any of the pins is in ExtINT mode */
+ for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+ struct IO_APIC_route_entry entry;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ *(((int *)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
+ *(((int *)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+
+
+ /* If the interrupt line is enabled and in ExtInt mode
+ * I have found the pin where the i8259 is connected.
+ */
+ if ((entry.mask == 0) && (entry.delivery_mode == dest_ExtINT)) {
+ ioapic_i8259.apic = apic;
+ ioapic_i8259.pin = pin;
+ goto found_i8259;
+ }
+ }
+ }
+ found_i8259:
+ /* Look to see what if the MP table has reported the ExtINT */
+ i8259_pin = find_isa_irq_pin(0, mp_ExtINT);
+ i8259_apic = find_isa_irq_apic(0, mp_ExtINT);
+ /* Trust the MP table if nothing is setup in the hardware */
+ if ((ioapic_i8259.pin == -1) && (i8259_pin >= 0)) {
+ printk(KERN_WARNING "ExtINT not setup in hardware but reported by MP table\n");
+ ioapic_i8259.pin = i8259_pin;
+ ioapic_i8259.apic = i8259_apic;
+ }
+ /* Complain if the MP table and the hardware disagree */
+ if (((ioapic_i8259.apic != i8259_apic) || (ioapic_i8259.pin != i8259_pin)) &&
+ (i8259_pin >= 0) && (ioapic_i8259.pin >= 0))
+ {
+ printk(KERN_WARNING "ExtINT in hardware and MP table differ\n");
}

/*
@@ -1217,7 +1283,6 @@ static void __init enable_IO_APIC(void)
*/
void disable_IO_APIC(void)
{
- int pin;
/*
* Clear the IO-APIC before rebooting:
*/
@@ -1228,8 +1293,7 @@ void disable_IO_APIC(void)
* Put that IOAPIC in virtual wire mode
* so legacy interrupts can be delivered.
*/
- pin = find_isa_irq_pin(0, mp_ExtINT);
- if (pin != -1) {
+ if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
unsigned long flags;

@@ -1240,7 +1304,7 @@ void disable_IO_APIC(void)
entry.polarity = 0; /* High */
entry.delivery_status = 0;
entry.dest_mode = 0; /* Physical */
- entry.delivery_mode = 7; /* ExtInt */
+ entry.delivery_mode = dest_ExtINT; /* ExtInt */
entry.vector = 0;
entry.dest.physical.physical_dest =
GET_APIC_ID(apic_read(APIC_ID));
@@ -1249,12 +1313,14 @@ void disable_IO_APIC(void)
* Add it to the IO-APIC irq-routing table:
*/
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(0, 0x10+2*pin, *(((int *)&entry)+0));
+ io_apic_write(ioapic_i8259.apic, 0x11+2*ioapic_i8259.pin,
+ *(((int *)&entry)+1));
+ io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
+ *(((int *)&entry)+1));
spin_unlock_irqrestore(&ioapic_lock, flags);
}

- disconnect_bsp_APIC(pin != -1);
+ disconnect_bsp_APIC(ioapci_i8259.pin != -1);
}

/*
@@ -1623,20 +1689,21 @@ static void setup_nmi (void)
*/
static inline void unlock_ExtINT_logic(void)
{
- int pin, i;
+ int apic, pin, i;
struct IO_APIC_route_entry entry0, entry1;
unsigned char save_control, save_freq_select;
unsigned long flags;

- pin = find_isa_irq_pin(8, mp_INT);
+ pin = find_isa_irq_pin(8, mp_INT);
+ apic = find_isa_irq_apic(8, mp_INT);
if (pin == -1)
return;

spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry0) + 1) = io_apic_read(0, 0x11 + 2 * pin);
- *(((int *)&entry0) + 0) = io_apic_read(0, 0x10 + 2 * pin);
+ *(((int *)&entry0) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
+ *(((int *)&entry0) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
spin_unlock_irqrestore(&ioapic_lock, flags);
- clear_IO_APIC_pin(0, pin);
+ clear_IO_APIC_pin(apic, pin);

memset(&entry1, 0, sizeof(entry1));

@@ -1649,8 +1716,8 @@ static inline void unlock_ExtINT_logic(v
entry1.vector = 0;

spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry1) + 1));
- io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry1) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry1) + 1));
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry1) + 0));
spin_unlock_irqrestore(&ioapic_lock, flags);

save_control = CMOS_READ(RTC_CONTROL);
@@ -1668,11 +1735,11 @@ static inline void unlock_ExtINT_logic(v

CMOS_WRITE(save_control, RTC_CONTROL);
CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
- clear_IO_APIC_pin(0, pin);
+ clear_IO_APIC_pin(apic, pin);

spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry0) + 1));
- io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry0) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry0) + 1));
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry0) + 0));
spin_unlock_irqrestore(&ioapic_lock, flags);
}

@@ -1684,7 +1751,7 @@ static inline void unlock_ExtINT_logic(v
*/
static inline void check_timer(void)
{
- int pin1, pin2;
+ int apic1, pin1, apic2, pin2;
int vector;

/*
@@ -1705,10 +1772,13 @@ static inline void check_timer(void)
init_8259A(1);
enable_8259A_irq(0);

- pin1 = find_isa_irq_pin(0, mp_INT);
- pin2 = find_isa_irq_pin(0, mp_ExtINT);
+ pin1 = find_isa_irq_pin(0, mp_INT);
+ apic1 = find_isa_irq_apic(0, mp_INT);
+ pin2 = ioapic_i8259.pin;
+ apic2 = ioapic_i8259.apic;

- apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n", vector, pin1, pin2);
+ apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+ vector, apic1, pin1, apic2, pin2);

if (pin1 != -1) {
/*
@@ -1726,17 +1796,20 @@ static inline void check_timer(void)
clear_IO_APIC_pin(0, pin1);
return;
}
- clear_IO_APIC_pin(0, pin1);
- apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
+ clear_IO_APIC_pin(apic1, pin1);
+ apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: 8254 timer not "
+ "connected to IO-APIC\n");
}

- apic_printk(APIC_VERBOSE,KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
+ apic_printk(APIC_VERBOSE,KERN_INFO "...trying to set up timer (IRQ0) "
+ "through the 8259A ... ");
if (pin2 != -1) {
- apic_printk(APIC_VERBOSE,"\n..... (found pin %d) ...", pin2);
+ apic_printk(APIC_VERBOSE,"\n..... (found apic %d pin %d) ...",
+ apic2, pin2);
/*
* legacy devices should be connected to IO APIC #0
*/
- setup_ExtINT_IRQ0_pin(pin2, vector);
+ setup_ExtINT_IRQ0_pin(apic2, pin2, vector);
if (timer_irq_works()) {
printk("works.\n");
nmi_watchdog_default();
@@ -1748,7 +1821,7 @@ static inline void check_timer(void)
/*
* Cleanup, just in case ...
*/
- clear_IO_APIC_pin(0, pin2);
+ clear_IO_APIC_pin(apic2, pin2);
}
printk(" failed.\n");
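
The detection logic that the patch adds to enable_IO_APIC() can be summarised in a standalone sketch. This is a simplified model under stated assumptions, not the kernel code: the table of entries stands in for the hardware registers, and the names scan_hardware, reconcile, struct location and the MAX_* limits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_APICS 4     /* hypothetical limits for this sketch */
#define MAX_PINS  24
enum { DELIVERY_EXTINT = 7 };

struct redir_entry { uint8_t masked; uint8_t delivery_mode; };
struct location { int apic, pin; };   /* -1, -1 means "not found" */

/* Scan every pin of every IO-APIC for an enabled entry in ExtINT mode. */
static struct location scan_hardware(struct redir_entry t[MAX_APICS][MAX_PINS],
                                     int nr_apics, const int nr_pins[])
{
    struct location loc = { -1, -1 };

    assert(nr_apics <= MAX_APICS);
    for (int apic = 0; apic < nr_apics; apic++) {
        for (int pin = 0; pin < nr_pins[apic]; pin++) {
            if (!t[apic][pin].masked &&
                t[apic][pin].delivery_mode == DELIVERY_EXTINT) {
                loc.apic = apic;
                loc.pin = pin;
                return loc;     /* first match wins, like the goto */
            }
        }
    }
    return loc;
}

/* Prefer what the hardware shows; fall back to the MP table if the
 * hardware shows nothing; flag the case where both exist but differ. */
static struct location reconcile(struct location hw, struct location mp,
                                 int *disagree)
{
    *disagree = 0;
    if (hw.pin == -1 && mp.pin >= 0)
        return mp;
    if (hw.pin >= 0 && mp.pin >= 0 &&
        (hw.apic != mp.apic || hw.pin != mp.pin))
        *disagree = 1;
    return hw;
}
```

The design choice mirrored here is that the hardware state found at boot takes priority, and the MP table is only a fallback, with a warning when the two sources conflict.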


2006-01-06 08:26:23

by Eric W. Biederman

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Yinghai Lu <[email protected]> writes:

> the patch is good.
>
> I tried LinuxBIOS with kexec.
>
> without this patch: I need to disable acpi in kernel. otherwise the
> kernel with acpi support can boot the second kernel, but the second
> kernel will hang after
>
> time.c: Using 14.318180 MHz HPET timer.
> time.c: Detected 2197.663 MHz processor.
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> Memory: 1009152k/1048576k available (2967k kernel code, 39036k reserved, 1186k )

Yes. This is the reason the patch was written. Every bios that
implements acpi has this problem.

Eric

2006-01-06 15:29:17

by Ronald G Minnich

Subject: Re: [LinuxBIOS] Inclusion of x86_64 memorize ioapic at bootup patch

I'm just doing a reply to this message so you all can continue the
discussion without the posting problems to the closed linuxbios list.

Yh Lu, please cc: me and ollie if you can but not the linuxbios list on
these discussions. It is going to annoy people when they get the bounce
message.

I should add that I never quite understood Andi's objections to the
patch being discussed.

thanks

ron

2006-01-06 18:59:16

by Andi Kleen

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

On Fri, Jan 06, 2006 at 01:02:16AM -0700, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
> >
> > Please don't top-post.
> >
> >>
> >> On 1/2/06, Vivek Goyal <[email protected]> wrote:
> >> > Hi Andi,
> >> >
> >> > Can you please include the following patch. This patch has already been
> > pushed
> >> > by Andrew.
> >> >
> >> >
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-io_apicc-memorize-at-bootup-where-the-i8259-is.patch
> >
> > IIRC, I dropped this patch because of discouraging noises from Andi and
> > because underlying x86_64 changes broke it in ugly ways.
>
> Ok. I just as extensively as I could and I can't find the under laying
> x86_64 changes that Andi mentioned he was working on. I have looked
> in current -mm and in Andi merge and experimental quilt trees. It
> could be that I'm blind but I looked and I did not see them.
>
> Even in the discussion where this was mentioned there never was a
> semantic conflict. But rather two patches passing so close they
> touched the same or neighboring lines of code.
>
> > It needs to be
> > redone and Andi's objections (whatever they were) need to be addressed or
> > argued about.
>
> The difference was one of approach. Andi wanted us to treat the apics
> as black boxes and save and restore register values with no regard as
> to what the registers did. This is theoretically more future proof,
> but it looses flexibility.

Well, I still think it would be better to do it in the generic way,
but I'm not feeling very strongly about it anymore.

> to change the destination cpu, in the kexec on panic case. This
> is something that cannot be done if we simply saved off the registers.
>
> > Right now the patch is rather dead.
>
> Current the referred to patch applies just fine, to 2.6.15,
> and except for a conflict with the above mentioned patch which
> applies fine to 2.6.15-mm1 as well.


It conflicts with the x86-64 timer routing rewrite I did, but that's currently
on hold because it has some other issues. I can merge them later, no problem.

-Andi

2006-01-06 23:48:10

by Andi Kleen


2006-01-07 00:00:47

by Lu, Yinghai

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Eric,

Did you try kexec with an Nvidia ck804 based MB? It seems someone modified
the mptable but did not update the checksum ...

YH

The first kernel said:

..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 0) ...works.
testing the IO APIC.......................


.................................... done.
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.



LBsuse91AMD64:/x/xx/xx/elf # ../kexec -l ram0_2.5_2.6.15_k8.1_mydisk8_x86_64.elf
LBsuse91AMD64:/x/xx/xx/elf # ../kexec -e
Starting new kernel
Firmware type: LinuxBIOS
old bootloader convention, maybe loadlin?
Bootdata ok (command line is apic=debug pci=routeirq
ramdisk_size=65536 root=/dev/ram0 rw console=tty0
console=ttyS0,115200n8 )
Linux version 2.6.15-gdb9edfd7 (root@yhlunb) (gcc version 4.0.2
20050901 (prerelease) (SUSE Linux)) #7 SMP Fri Jan 6 15:18:18 PST 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000e7c (reserved)
BIOS-e820: 0000000000000e7c - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 00000000000f0400 (reserved)
BIOS-e820: 0000000000100000 - 00000000c0000000 (usable)
BIOS-e820: 0000000100000000 - 0000000240000000 (usable)
ACPI: Unable to locate RSDP
Scanning NUMA topology in Northbridge 24
Number of nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000080000000
Node 1 MemBase 0000000080000000 Limit 0000000140000000
Node 2 MemBase 0000000140000000 Limit 00000001c0000000
Node 3 MemBase 00000001c0000000 Limit 0000000240000000
Using node hash shift of 30
Bootmem setup node 0 0000000000000000-0000000080000000
Bootmem setup node 1 0000000080000000-0000000140000000
Bootmem setup node 2 0000000140000000-00000001c0000000
Bootmem setup node 3 00000001c0000000-0000000240000000
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
SMP mptable: checksum error!
BIOS bug, MP table errors detected!...
... disabling SMP support. (tell your hw vendor)
Allocating PCI resources starting at c4000000 (gap: c0000000:40000000)
Checking aperture...
CPU 0: aperture @ f8000000 size 64 MB
CPU 1: aperture @ f8000000 size 64 MB
CPU 2: aperture @ f8000000 size 64 MB
CPU 3: aperture @ f8000000 size 64 MB
Built 4 zonelists
Kernel command line: apic=debug pci=routeirq ramdisk_size=65536
root=/dev/ram0 rw console=tty0 console=ttyS0,115200n8
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 1809.308 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 8223632k/9437184k available (2958k kernel code, 164588k
reserved, 1183k data, 228k init)
Calibrating delay using timer specific routine.. 3623.87 BogoMIPS (lpj=7247741)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
weird, boot CPU (#16) not listed by the BIOS.
SMP motherboard not detected.
Getting VERSION: 40010
Getting VERSION: 40010
Getting ID: 10000000
Getting ID: ef000000
Getting LVT0: 700
Getting LVT1: 400
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/apic.c:333
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.15-gdb9edfd7 #7
RIP: 0010:[<ffffffff8056cd64>] <ffffffff8056cd64>{setup_local_APIC+23}
RSP: 0000:ffff810141c49eb8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
RDX: 00000000ffffff01 RSI: ffff810141c49f08 RDI: ffffffff80518fc0
RBP: 0000000000000000 R08: 0000000000000720 R09: 00000000ffffffff
R10: 00000000ffffffff R11: ffffffff8023dfa5 R12: 0000000000000010
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff80557800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000005adf18 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff810141c48000, task ffff810003619480)
Stack: 00000000ffffffff ffffffff8056d089 0000000000010000 0000000000000000
0000000000000000 0000000000000000 0000000000010000 0000000000000000
0000000000000000 0000000000000000
Call Trace:<ffffffff8056d089>{APIC_init_uniprocessor+151}
<ffffffff8056beb1>{smp_prepare_cpus+637}
<ffffffff8010b07a>{init+54} <ffffffff8010b044>{init+0}
<ffffffff8010e662>{child_rip+8} <ffffffff8010b044>{init+0}
<ffffffff8010e65a>{child_rip+0}

Code: 0f 0b 68 15 63 40 80 c2 4d 01 48 8b 05 bb 6d f0 ff ff 50 28
RIP <ffffffff8056cd64>{setup_local_APIC+23} RSP <ffff810141c49eb8>
<0>Kernel panic - not syncing: Attempted to kill init!
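
For context on the "SMP mptable: checksum error!" line in the log above: the MP configuration table carries a checksum byte chosen so that all bytes of the table sum to zero modulo 256, so editing the table without recomputing that byte makes validation fail. A minimal sketch of the check follows. The helper names are hypothetical and the checksum offset is passed in as a parameter rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a valid table sums to zero. */
static uint8_t table_sum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + table[i]);
    return sum;
}

static int table_checksum_ok(const uint8_t *table, size_t len)
{
    return table_sum(table, len) == 0;
}

/* Recompute the checksum byte after the table has been edited.
 * csum_off is the offset of the checksum byte within the table. */
static void table_fix_checksum(uint8_t *table, size_t len, size_t csum_off)
{
    assert(csum_off < len);
    table[csum_off] = 0;
    table[csum_off] = (uint8_t)(0u - table_sum(table, len));
}
```

Anything that rewrites MP table entries before handing the table to the next kernel would need a step like table_fix_checksum() afterwards; otherwise the next kernel rejects the table and disables SMP support, as the log shows.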

2006-01-07 00:31:06

by Eric W. Biederman

Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Yinghai Lu <[email protected]> writes:

> Eric,
>
> Do you try kexec with Nvidia ck804 based MB? it seems some one modify
> the mptable but not update the checksum ...

We've got a cluster using 2.6.14 booting over infiniband that
way.

Eric

2006-01-07 00:36:07

by Lu, Yinghai

Subject: RE: Inclusion of x86_64 memorize ioapic at bootup patch

Thanks. You don't need Etherboot with IB later....

How about the size of your first kernel and initrd? Are they in IDE
Flash?

YH

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, January 06, 2006 4:30 PM
To: Lu, Yinghai
Cc: Andi Kleen; Vivek Goyal; Fastboot mailing list; linux kernel mailing
list; Morton Andrew Morton
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Yinghai Lu <[email protected]> writes:

> Eric,
>
> Do you try kexec with Nvidia ck804 based MB? it seems some one modify
> the mptable but not update the checksum ...

We've got a cluster using 2.6.14 booting over infiniband that
way.

Eric


2006-01-07 00:44:05

by Lu, Yinghai

Subject: Re: [PATCH] x86_64 io_apic: memorize at bootup where the i8259 is

On 1/6/06, Eric W. Biederman <[email protected]> wrote:
>
>@@ -1249,12 +1313,14 @@ void disable_IO_APIC(void)
> * Add it to the IO-APIC irq-routing table:
> */
> spin_lock_irqsave(&ioapic_lock, flags);
>- io_apic_write(0, 0x11+2*pin, *(((int *)&entry)+1));
>- io_apic_write(0, 0x10+2*pin, *(((int *)&entry)+0));
>+ io_apic_write(ioapic_i8259.apic, 0x11+2*ioapic_i8259.pin,
>+ *(((int *)&entry)+1));
>+ io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
>+ *(((int *)&entry)+1));
> spin_unlock_irqrestore(&ioapic_lock, flags);
> }
>
>- disconnect_bsp_APIC(pin != -1);
>+ disconnect_bsp_APIC(ioapci_i8259.pin != -1);
> }

There is a typo

+ io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
+ *(((int *)&entry)+1));

===>

+ io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
+ *(((int *)&entry)+0));

YH
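
For readers following the typo: each IO-APIC redirection entry is 64 bits wide and is reached through two 32-bit registers, 0x10+2*pin for the low word and 0x11+2*pin for the high word. The sketch below is only an illustration, assuming a simulated register array (sim_regs and the sim_* helpers are hypothetical names, not kernel code) and using shifts where the patch uses pointer casts on the entry struct. It shows why writing the "+1" word into both registers corrupts the entry.

```c
#include <stdint.h>

/* Hypothetical stand-in for one IO-APIC's indirect register file. */
enum { SIM_NREGS = 0x40 };
static uint32_t sim_regs[SIM_NREGS];

/* Register index of the low/high 32-bit word of redirection entry `pin`. */
static unsigned rte_low_reg(unsigned pin)  { return 0x10 + 2 * pin; }
static unsigned rte_high_reg(unsigned pin) { return 0x11 + 2 * pin; }

/* Correct write: high word to 0x11+2*pin, low word to 0x10+2*pin. */
static void sim_write_rte(unsigned pin, uint64_t entry)
{
    sim_regs[rte_high_reg(pin)] = (uint32_t)(entry >> 32);
    sim_regs[rte_low_reg(pin)]  = (uint32_t)(entry & 0xffffffffu);
}

/* The typo'd write: the high word lands in both registers. */
static void sim_write_rte_typo(unsigned pin, uint64_t entry)
{
    sim_regs[rte_high_reg(pin)] = (uint32_t)(entry >> 32);
    sim_regs[rte_low_reg(pin)]  = (uint32_t)(entry >> 32);
}

/* Reassemble the 64-bit entry from its two registers. */
static uint64_t sim_read_rte(unsigned pin)
{
    return ((uint64_t)sim_regs[rte_high_reg(pin)] << 32)
           | sim_regs[rte_low_reg(pin)];
}
```

With the typo, the low word (which carries the vector, delivery mode and mask bit) is overwritten by the destination word, so the pin would not end up in ExtINT mode.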

2006-01-07 01:15:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

"Lu, Yinghai" <[email protected]> writes:

> Thanks. You don't need Etherboot with IB later....
>
> How about the size of your first kernel and initrd? Are they in IDE
> Flash?

Yes. It is a bproc system for LANL. Two kernel monte finally broke,
so we did a quick switch to kexec to get things moving.

Small initrds are a separate issue entirely.

Eric

2006-01-07 01:30:58

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] x86_64 io_apic: memorize at bootup where the i8259 is (typo fix)


> There is a typo
Yep I fat fingered the merge. Thanks. Here is the correct patch.

Currently we attempt to restore virtual wire mode on reboot, which only
works if we can figure out where the i8259 is connected. This is very
useful when we kexec another kernel, and likely helpful to a peculiar
BIOS that makes assumptions about how the system is set up.

Since the ACPI MADT table does not provide the location where the i8259 is
connected, we have to look at the hardware to figure it out.

Most systems have the i8259 connected to the local apic of the cpu so won't be
affected, but people running Opteron and some serverworks chipsets should be
able to use kexec now.

In addition this patch removes the hard coded assumption that the io_apic
that delivers isa interrupts is always known to the kernel as io_apic 0.
There does not appear to be anything to guarantee that assumption is true.

This patch does not do a blind save and restore of ioapic registers,
as that loses the flexibility that is present when you understand what
the registers actually do. Currently in the kexec on panic case we actually
use that flexibility to route all interrupts to the cpu we
are rebooting on.

Signed-off-by: Eric W. Biederman <[email protected]>


---

arch/x86_64/kernel/io_apic.c | 143 ++++++++++++++++++++++++++++++++----------
1 files changed, 108 insertions(+), 35 deletions(-)

357de3b2f2ca68436615e2b017f00b8588eabdf2
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index ac7a273..bbf7887 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -46,6 +46,9 @@ static int no_timer_check;

int disable_timer_pin_1 __initdata;

+/* Where, if anywhere, is the i8259 connected in external int mode */
+static struct { int pin, apic; } ioapic_i8259 = { -1, -1 };
+
static DEFINE_SPINLOCK(ioapic_lock);

/*
@@ -360,7 +363,7 @@ static int find_irq_entry(int apic, int
/*
* Find the pin to which IRQ[irq] (ISA) is connected
*/
-static int find_isa_irq_pin(int irq, int type)
+static int __init find_isa_irq_pin(int irq, int type)
{
int i;

@@ -378,6 +381,31 @@ static int find_isa_irq_pin(int irq, int
return -1;
}

+static int __init find_isa_irq_apic(int irq, int type)
+{
+ int i;
+
+ for (i = 0; i < mp_irq_entries; i++) {
+ int lbus = mp_irqs[i].mpc_srcbus;
+
+ if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
+ mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
+ mp_bus_id_to_type[lbus] == MP_BUS_MCA) &&
+ (mp_irqs[i].mpc_irqtype == type) &&
+ (mp_irqs[i].mpc_srcbusirq == irq))
+ break;
+ }
+ if (i < mp_irq_entries) {
+ int apic;
+ for(apic = 0; apic < nr_ioapics; apic++) {
+ if (mp_ioapics[apic].mpc_apicid == mp_irqs[i].mpc_dstapic)
+ return apic;
+ }
+ }
+
+ return -1;
+}
+
/*
* Find a specific PCI IRQ entry.
* Not an __init, possibly needed by modules
@@ -871,7 +899,7 @@ static void __init setup_IO_APIC_irqs(vo
* Set up the 8259A-master output pin as broadcast to all
* CPUs.
*/
-static void __init setup_ExtINT_IRQ0_pin(unsigned int pin, int vector)
+static void __init setup_ExtINT_IRQ0_pin(unsigned int apic, unsigned int pin, int vector)
{
struct IO_APIC_route_entry entry;
unsigned long flags;
@@ -905,8 +933,8 @@ static void __init setup_ExtINT_IRQ0_pin
* Add it to the IO-APIC irq-routing table:
*/
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(0, 0x10+2*pin, *(((int *)&entry)+0));
+ io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
+ io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
spin_unlock_irqrestore(&ioapic_lock, flags);

enable_8259A_irq(0);
@@ -1185,7 +1213,8 @@ void __apicdebuginit print_PIC(void)
static void __init enable_IO_APIC(void)
{
union IO_APIC_reg_01 reg_01;
- int i;
+ int i8259_apic, i8259_pin;
+ int i, apic;
unsigned long flags;

for (i = 0; i < PIN_MAP_SIZE; i++) {
@@ -1199,11 +1228,48 @@ static void __init enable_IO_APIC(void)
/*
* The number of IO-APIC IRQ registers (== #pins):
*/
- for (i = 0; i < nr_ioapics; i++) {
+ for (apic = 0; apic < nr_ioapics; apic++) {
spin_lock_irqsave(&ioapic_lock, flags);
- reg_01.raw = io_apic_read(i, 1);
+ reg_01.raw = io_apic_read(apic, 1);
spin_unlock_irqrestore(&ioapic_lock, flags);
- nr_ioapic_registers[i] = reg_01.bits.entries+1;
+ nr_ioapic_registers[apic] = reg_01.bits.entries+1;
+ }
+ for(apic = 0; apic < nr_ioapics; apic++) {
+ int pin;
+ /* See if any of the pins is in ExtINT mode */
+ for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+ struct IO_APIC_route_entry entry;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ *(((int *)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
+ *(((int *)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+
+
+ /* If the interrupt line is enabled and in ExtInt mode
+ * I have found the pin where the i8259 is connected.
+ */
+ if ((entry.mask == 0) && (entry.delivery_mode == dest_ExtINT)) {
+ ioapic_i8259.apic = apic;
+ ioapic_i8259.pin = pin;
+ goto found_i8259;
+ }
+ }
+ }
+ found_i8259:
+ /* Look to see if the MP table has reported the ExtINT */
+ i8259_pin = find_isa_irq_pin(0, mp_ExtINT);
+ i8259_apic = find_isa_irq_apic(0, mp_ExtINT);
+ /* Trust the MP table if nothing is setup in the hardware */
+ if ((ioapic_i8259.pin == -1) && (i8259_pin >= 0)) {
+ printk(KERN_WARNING "ExtINT not setup in hardware but reported by MP table\n");
+ ioapic_i8259.pin = i8259_pin;
+ ioapic_i8259.apic = i8259_apic;
+ }
+ /* Complain if the MP table and the hardware disagree */
+ if (((ioapic_i8259.apic != i8259_apic) || (ioapic_i8259.pin != i8259_pin)) &&
+ (i8259_pin >= 0) && (ioapic_i8259.pin >= 0))
+ {
+ printk(KERN_WARNING "ExtINT in hardware and MP table differ\n");
}

/*
@@ -1217,7 +1283,6 @@ static void __init enable_IO_APIC(void)
*/
void disable_IO_APIC(void)
{
- int pin;
/*
* Clear the IO-APIC before rebooting:
*/
@@ -1228,8 +1293,7 @@ void disable_IO_APIC(void)
* Put that IOAPIC in virtual wire mode
* so legacy interrupts can be delivered.
*/
- pin = find_isa_irq_pin(0, mp_ExtINT);
- if (pin != -1) {
+ if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
unsigned long flags;

@@ -1240,7 +1304,7 @@ void disable_IO_APIC(void)
entry.polarity = 0; /* High */
entry.delivery_status = 0;
entry.dest_mode = 0; /* Physical */
- entry.delivery_mode = 7; /* ExtInt */
+ entry.delivery_mode = dest_ExtINT; /* ExtInt */
entry.vector = 0;
entry.dest.physical.physical_dest =
GET_APIC_ID(apic_read(APIC_ID));
@@ -1249,12 +1313,14 @@ void disable_IO_APIC(void)
* Add it to the IO-APIC irq-routing table:
*/
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(0, 0x10+2*pin, *(((int *)&entry)+0));
+ io_apic_write(ioapic_i8259.apic, 0x11+2*ioapic_i8259.pin,
+ *(((int *)&entry)+1));
+ io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
+ *(((int *)&entry)+0));
spin_unlock_irqrestore(&ioapic_lock, flags);
}

- disconnect_bsp_APIC(pin != -1);
+ disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}

/*
@@ -1623,20 +1689,21 @@ static void setup_nmi (void)
*/
static inline void unlock_ExtINT_logic(void)
{
- int pin, i;
+ int apic, pin, i;
struct IO_APIC_route_entry entry0, entry1;
unsigned char save_control, save_freq_select;
unsigned long flags;

- pin = find_isa_irq_pin(8, mp_INT);
+ pin = find_isa_irq_pin(8, mp_INT);
+ apic = find_isa_irq_apic(8, mp_INT);
if (pin == -1)
return;

spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry0) + 1) = io_apic_read(0, 0x11 + 2 * pin);
- *(((int *)&entry0) + 0) = io_apic_read(0, 0x10 + 2 * pin);
+ *(((int *)&entry0) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
+ *(((int *)&entry0) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
spin_unlock_irqrestore(&ioapic_lock, flags);
- clear_IO_APIC_pin(0, pin);
+ clear_IO_APIC_pin(apic, pin);

memset(&entry1, 0, sizeof(entry1));

@@ -1649,8 +1716,8 @@ static inline void unlock_ExtINT_logic(v
entry1.vector = 0;

spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry1) + 1));
- io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry1) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry1) + 1));
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry1) + 0));
spin_unlock_irqrestore(&ioapic_lock, flags);

save_control = CMOS_READ(RTC_CONTROL);
@@ -1668,11 +1735,11 @@ static inline void unlock_ExtINT_logic(v

CMOS_WRITE(save_control, RTC_CONTROL);
CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
- clear_IO_APIC_pin(0, pin);
+ clear_IO_APIC_pin(apic, pin);

spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry0) + 1));
- io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry0) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry0) + 1));
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry0) + 0));
spin_unlock_irqrestore(&ioapic_lock, flags);
}

@@ -1684,7 +1751,7 @@ static inline void unlock_ExtINT_logic(v
*/
static inline void check_timer(void)
{
- int pin1, pin2;
+ int apic1, pin1, apic2, pin2;
int vector;

/*
@@ -1705,10 +1772,13 @@ static inline void check_timer(void)
init_8259A(1);
enable_8259A_irq(0);

- pin1 = find_isa_irq_pin(0, mp_INT);
- pin2 = find_isa_irq_pin(0, mp_ExtINT);
+ pin1 = find_isa_irq_pin(0, mp_INT);
+ apic1 = find_isa_irq_apic(0, mp_INT);
+ pin2 = ioapic_i8259.pin;
+ apic2 = ioapic_i8259.apic;

- apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n", vector, pin1, pin2);
+ apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+ vector, apic1, pin1, apic2, pin2);

if (pin1 != -1) {
/*
@@ -1726,17 +1796,20 @@ static inline void check_timer(void)
clear_IO_APIC_pin(0, pin1);
return;
}
- clear_IO_APIC_pin(0, pin1);
- apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
+ clear_IO_APIC_pin(apic1, pin1);
+ apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: 8254 timer not "
+ "connected to IO-APIC\n");
}

- apic_printk(APIC_VERBOSE,KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
+ apic_printk(APIC_VERBOSE,KERN_INFO "...trying to set up timer (IRQ0) "
+ "through the 8259A ... ");
if (pin2 != -1) {
- apic_printk(APIC_VERBOSE,"\n..... (found pin %d) ...", pin2);
+ apic_printk(APIC_VERBOSE,"\n..... (found apic %d pin %d) ...",
+ apic2, pin2);
/*
* legacy devices should be connected to IO APIC #0
*/
- setup_ExtINT_IRQ0_pin(pin2, vector);
+ setup_ExtINT_IRQ0_pin(apic2, pin2, vector);
if (timer_irq_works()) {
printk("works.\n");
nmi_watchdog_default();
@@ -1748,7 +1821,7 @@ static inline void check_timer(void)
/*
* Cleanup, just in case ...
*/
- clear_IO_APIC_pin(0, pin2);
+ clear_IO_APIC_pin(apic2, pin2);
}
printk(" failed.\n");

--
1.0.GIT
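
The detection step in enable_IO_APIC() above can be sketched outside the kernel as follows. This is a minimal illustration, assuming the low words of all redirection entries have already been read into a flat array (the array layout, find_extint and struct i8259_loc are hypothetical names for this sketch). It relies on the standard IO-APIC layout where bits 8-10 of the low word hold the delivery mode and bit 16 is the mask bit.

```c
#include <stdint.h>

#define DEST_EXTINT 7u  /* delivery mode value, matching dest_ExtINT in the patch */

/* Hypothetical result type: -1/-1 means "no ExtINT pin found". */
struct i8259_loc { int apic, pin; };

/* Fields of the low 32-bit word of a redirection entry. */
static unsigned rte_delivery_mode(uint32_t low) { return (low >> 8) & 0x7u; }
static unsigned rte_masked(uint32_t low)        { return (low >> 16) & 0x1u; }

/*
 * low_words holds napics rows of npins low words each.
 * Return the first unmasked pin that is in ExtINT mode.
 */
static struct i8259_loc find_extint(const uint32_t *low_words,
                                    int napics, int npins)
{
    struct i8259_loc loc = { -1, -1 };

    for (int apic = 0; apic < napics; apic++) {
        for (int pin = 0; pin < npins; pin++) {
            uint32_t low = low_words[apic * npins + pin];

            if (!rte_masked(low) && rte_delivery_mode(low) == DEST_EXTINT) {
                loc.apic = apic;
                loc.pin = pin;
                return loc;
            }
        }
    }
    return loc;
}
```

As in the patch, a masked ExtINT entry is ignored, and the MP table would only be consulted as a fallback when the scan finds nothing.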

2006-01-07 01:33:04

by Lu, Yinghai

[permalink] [raw]
Subject: RE: Inclusion of x86_64 memorize ioapic at bootup patch

I tried commenting out the checksum check, and then it boots fine.
So someone must be modifying the range (adding entries...) without
updating the mptable correctly...

YH

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, January 06, 2006 4:30 PM
To: Lu, Yinghai
Cc: Andi Kleen; Vivek Goyal; Fastboot mailing list; linux kernel mailing
list; Morton Andrew Morton
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Yinghai Lu <[email protected]> writes:

> Eric,
>
> Do you try kexec with Nvidia ck804 based MB? it seems some one modify
> the mptable but not update the checksum ...

We've got a cluster using 2.6.14 booting over infiniband that
way.

Eric


2006-01-07 02:32:11

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Some code clears 3 bytes in the mptable starting at 0x468 (the first dump is before, the second is after):

00000460: 41 01 09 12 03 00 0f 00 41 02 09 13 03 00 0f 00
00000470: 41 03 09 10 03 03 05 00 43 00 ff 00 03 01 05 00
00000480: 43 00 ff 01

00000460: 41 01 09 12 03 00 0f 00 00 00 00 13 03 00 0f 00
00000470: 41 03 09 10 03 03 05 00 43 00 ff 00 03 01 05 00
00000480: 43 00 ff 01

It is the third irq entry for the PCIe slot:

//Slot PCIE x4
for(i=0;i<4;i++) {
smp_write_intsrc(mc, mp_INT,
MP_IRQ_TRIGGER_LEVEL|MP_IRQ_POLARITY_LOW, bus_ck804b_4, (0x00<<2)|i,
apicid_ck804b, 0x10 + (1+i+4-sbdnb%4)%4);
}


/*Local Ints: Type Polarity Trigger Bus ID IRQ APIC ID PIN#*/
smp_write_intsrc(mc, mp_ExtINT,
MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH, bus_isa, 0x0, MP_APIC_ALL,
0x0);
smp_write_intsrc(mc, mp_NMI,
MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH, bus_isa, 0x0, MP_APIC_ALL,
0x1);

The range is already in the e820 reserved area:

Bootdata ok (command line is apic=debug pci=routeirq
ramdisk_size=65536 root=/dev/ram0 rw console=tty0
console=ttyS0,115200n8 )
Linux version 2.6.15-gdb9edfd7 (root@yhlunb) (gcc version 4.0.2
20050901 (prerelease) (SUSE Linux)) #13 SMP Fri Jan 6 17:58:25 PST
2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000e7c (reserved)
BIOS-e820: 0000000000000e7c - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 00000000000f0400 (reserved)
BIOS-e820: 0000000000100000 - 00000000c0000000 (usable)
BIOS-e820: 0000000100000000 - 0000000240000000 (usable)

YH
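
Why three cleared bytes are enough to fail the checksum: the MP configuration table is valid only if all of its bytes, including the checksum byte, sum to zero modulo 256. A minimal sketch, assuming a toy buffer rather than a full table (mp_sum and mp_checksum_byte are hypothetical helper names):

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a valid table sums to 0. */
static uint8_t mp_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while (len--)
        sum = (uint8_t)(sum + *p++);
    return sum;
}

/* Checksum byte that makes the given bytes plus itself sum to 0. */
static uint8_t mp_checksum_byte(const uint8_t *p, size_t len)
{
    return (uint8_t)(0u - mp_sum(p, len));
}
```

Applied to the 16 bytes shown at 0x460 plus a matching checksum byte, zeroing 41 02 09 removes 0x4c from the total, so the sum is no longer zero and the table is rejected.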

2006-01-07 06:38:45

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Andi,

In smpboot.c, why do you need to use 0x467 and 0x469?

Dprintk("1.\n");
*((volatile unsigned short *) phys_to_virt(0x469)) = start_rip >> 4;
Dprintk("2.\n");
*((volatile unsigned short *) phys_to_virt(0x467)) = start_rip & 0xf;
Dprintk("3.\n");

YH

2006-01-07 07:20:41

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

It seems i386 is the same.

Also, why is the address (0x467) not word aligned?

YH

2006-01-07 09:43:24

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Andi,

In LinuxBIOS, we don't set the MPS 0x467 and the AP can still be started by the BSP.

Are these really needed for x86_64?

Dprintk("Setting warm reset code and vector.\n");

CMOS_WRITE(0xa, 0xf);
local_flush_tlb();
Dprintk("1.\n");
*((volatile unsigned short *) phys_to_virt(0x469)) = start_rip >> 4;
Dprintk("2.\n");
*((volatile unsigned short *) phys_to_virt(0x467)) = start_rip & 0xf;
Dprintk("3.\n");

the STARTUP IPI should work well with MPS v1.4

YH

2006-01-07 12:48:09

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

yhlu <[email protected]> writes:

> andi,
>
> In LinuxBIOS, we don't set the MPS 0x467 and the AP still can be started by BSP.
>
> are these really needed for x86_64?
>
> Dprintk("Setting warm reset code and vector.\n");
>
> CMOS_WRITE(0xa, 0xf);
> local_flush_tlb();
> Dprintk("1.\n");
> *((volatile unsigned short *) phys_to_virt(0x469)) = start_rip >> 4;
> Dprintk("2.\n");
> *((volatile unsigned short *) phys_to_virt(0x467)) = start_rip & 0xf;
> Dprintk("3.\n");
>
> the STARTUP IPI should work well with MPS v1.4

There are very large x86 machines where a reset is sent to the remote
cpu, and thus 0x40:0x67 and 0x40:0x69 become relevant.

YH, you didn't do something foolish and put a linuxbios table below
0x500, did you?

Eric
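
For context on the two stores discussed above: 0x40:0x67 is the warm reset vector in the BIOS data area, a real-mode far pointer with the 16-bit offset at physical 0x467 and the 16-bit segment at 0x469. Its position is fixed by the legacy layout, which is why the address is not word aligned. A minimal sketch of the arithmetic, assuming a start_rip below 1MB (the helper names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical container for the two 16-bit values stored at 0x467/0x469. */
struct warm_reset_vec { uint16_t offset, segment; };

/* Split a trampoline address the same way the quoted smpboot.c code does. */
static struct warm_reset_vec split_start_rip(uint32_t start_rip)
{
    struct warm_reset_vec v;

    v.segment = (uint16_t)(start_rip >> 4);   /* goes to 0x469 */
    v.offset  = (uint16_t)(start_rip & 0xf);  /* goes to 0x467 */
    return v;
}

/* Real-mode segment:offset to physical address. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Each store is two bytes wide, so together they touch physical bytes 0x467 through 0x46a, which matches the cleared bytes reported earlier in the thread.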

2006-01-07 19:36:33

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

The MPTABLE in LinuxBIOS is placed starting at 0x20. If the system has
too many cpus and devices (slots), the mptable grows beyond 0x464, so it
overlaps 0x40:67....

We need to put the mptable in [0xf0000:0x100000] together with the acpi tables.

And if it is bigger than 64k, then we have to put it at a special
position, starting from 1K, and pass the position of the mptable to the
kernel via the command line.

I will update the code in LinuxBIOS.

Thanks

YH

2006-01-07 19:45:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

yhlu <[email protected]> writes:

> MPTABLE in LinuxBIOS is put from 0x20, if the system has too many cpu
> and devices (slots) the mptable will get bigger than 0x464, so it
> will use 0x40:67....

Then you or someone moved it. The base in low memory was originally
at 0x500, to avoid just these kinds of problems.

> We need to put mptable to [0xf0000:0x100000] together with acpi
> tables.

Or move it up a few bytes.

> and if it is bigger than 64k, then we have to put it on special
> postion ...from 1K, and pass the posstion of mptable to the kernel via
> command line.
>
> I will update the code in LinuxBIOS.

Thanks. It is always a good idea not to assign legacy regions of the
address space new meanings.

Eric

2006-01-07 21:35:29

by Yinghai Lu

[permalink] [raw]
Subject: Re: Inclusion of x86_64 memorize ioapic at bootup patch

Good, I will let it start from 0x500, with the linuxbios table after it.
That is easiest, and we don't need to use the command line to pass the
info, because the signature is before 1K.

YH
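
The placement rule the thread converges on can be expressed as a simple range check. This is only a sketch, assuming the warm reset vector occupies physical bytes 0x467 through 0x46a as discussed above (overlaps_warm_reset is a hypothetical helper, not LinuxBIOS code):

```c
#include <stdbool.h>
#include <stdint.h>

#define WARM_RESET_FIRST 0x467u  /* first byte of the 0x40:0x67 vector */
#define WARM_RESET_LAST  0x46au  /* last byte of the 16-bit store at 0x469 */

/* True if a table occupying [base, base + len) covers any vector byte. */
static bool overlaps_warm_reset(uint32_t base, uint32_t len)
{
    uint32_t last;

    if (len == 0)
        return false;
    last = base + len - 1;
    return base <= WARM_RESET_LAST && last >= WARM_RESET_FIRST;
}
```

A table based at 0x20 collides once it is 0x448 bytes or longer, while any table based at 0x500 or above is clear of the vector regardless of size.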