2003-09-09 20:34:51

by Mikael Pettersson

Subject: Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops

On Mon, 08 Sep 2003 19:22:17 -0400, Mathieu Desnoyers wrote:
>> >On kernel 2.4.21-pre2, there is a kernel oops before this, with a
>> >"Dereferencing NULL pointer".
>>
>> You didn't run that through ksymoops and post it, so how is anyone
>> supposed to be able to debug it?
>
>As only the 2.4.21-pre2 and 2.4.21-pre3 kernels show this problem, I assumed
>it had been corrected in 2.4.21-pre4. But, as it may be useful in
>finding the problem, here is the ksymoops output for the 2.4.21-pre2 and
>2.4.21-pre3 kernels; the two are quite similar.
...
>Code; c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>
>00000000 <_EIP>:
>Code; c0115da7 <IO_APIC_get_PCI_irq_vector+17/130> <=====
> 0: 83 3c 90 ff cmpl $0xffffffff,(%eax,%edx,4) <=====

Ok, that one is line 295 in io_apic.c. It bombs in 2.4.21-pre{2,3}
because mp_bus_id_to_pci_bus was changed from a static array to
a dynamically allocated array. On your machine, smp_read_mpc() in
mpparse.c doesn't get to the point where it allocates that array,
so the array is NULL in io_apic.c and you get an oops.
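
For anyone decoding the faulting instruction: the operand (%eax,%edx,4)
addresses %eax + %edx*4. Reading %eax as the array pointer and %edx as the
bus number is my interpretation of the disassembly, not something the oops
states. Under that assumption, a small userspace sketch of the address
arithmetic (the function names are hypothetical, not kernel code):

```c
#include <stdint.h>

/* Effective address of an x86 memory operand written (base,index,scale)
 * in AT&T syntax: base + index * scale. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale)
{
    return base + index * scale;
}

/* Address read by table[bus] when the table pointer is `base`.
 * The 4-byte element size is assumed from the scale factor of 4
 * in the disassembly. */
static uintptr_t bus_entry_address(uintptr_t base, unsigned bus)
{
    return effective_address(base, bus, 4);
}
```

With a NULL base the read for bus 3 would land at address 12, a low address
near zero, which is consistent with the "Dereferencing NULL pointer" report.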

Fixing the oops is easy (see below), but the real problem is
that 2.4.21-pre2 apparently broke MP table parsing on your HW.
I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
and compare 2.4.20 (good) and later (bad) to see where things
start to diverge.
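
A rough userspace sketch of the kind of checkpoint tracing meant here. In
the kernel the fprintf() would be a printk(); the macro name, the counter
and the two stage functions are made up for illustration and do not
correspond to real functions in setup/smpboot/mpparse:

```c
#include <stdio.h>

static int trace_hits = 0;  /* number of checkpoints reached so far */

/* Userspace stand-in for a tracing printk(): reports the enclosing
 * function and the line number each time a checkpoint is reached. */
#define TRACE_HERE(note)                                          \
    do {                                                          \
        trace_hits++;                                             \
        fprintf(stderr, "trace: %s:%d %s\n",                      \
                __func__, __LINE__, (note));                      \
    } while (0)

/* Hypothetical stages of a boot-time parser. */
static void parse_stage_one(void) { TRACE_HERE("entered"); }
static void parse_stage_two(void) { TRACE_HERE("entered"); }
```

Comparing the sequence of trace lines from a good and a bad kernel shows the
first checkpoint that one reaches and the other does not.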

/Mikael

--- linux-2.4.21-pre2/arch/i386/kernel/io_apic.c.~1~ 2003-09-09 21:27:39.000000000 +0200
+++ linux-2.4.21-pre2/arch/i386/kernel/io_apic.c 2003-09-09 22:17:02.464082064 +0200
@@ -292,7 +292,7 @@

 	Dprintk("querying PCI -> IRQ mapping bus:%d, slot:%d, pin:%d.\n",
 		bus, slot, pin);
-	if (mp_bus_id_to_pci_bus[bus] == -1) {
+	if ((mp_bus_id_to_pci_bus==NULL) || mp_bus_id_to_pci_bus[bus] == -1) {
 		printk(KERN_WARNING "PCI BIOS passed nonexistent PCI bus %d!\n", bus);
 		return -1;
 	}
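
To show why the added test is enough to avoid the oops, here is a minimal
userspace model of the patched condition. bus_table and bus_is_known() are
hypothetical stand-ins, not the kernel's names; only the shape of the check
is taken from the patch:

```c
#include <stddef.h>

/* Stand-in for the dynamically allocated table; it stays NULL until
 * some MP-table code allocates it. */
static int *bus_table = NULL;

/* Returns 1 if `bus` maps to a PCI bus, 0 if the table is missing
 * or the entry is -1. */
static int bus_is_known(int bus)
{
    /* || short-circuits, so the element is never read while the
     * pointer is still NULL. */
    if (bus_table == NULL || bus_table[bus] == -1)
        return 0;
    return 1;
}
```

This only turns the crash into a clean "nonexistent bus" result; the table
is still missing, so the lookup cannot succeed until the allocation itself
is fixed.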


2003-09-10 10:26:24

by Maciej W. Rozycki

Subject: Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops

On Tue, 9 Sep 2003, Mikael Pettersson wrote:

> Ok, that one is line 295 in io_apic.c. It bombs in 2.4.21-pre{2,3}
> because mp_bus_id_to_pci_bus was changed from a static array to
> a dynamically allocated array. On your machine, smp_read_mpc() in
> mpparse.c doesn't get to the point where it allocates that array,
> so the array is NULL in io_apic.c and you get an oops.

As I have already written, the system uses a default MP configuration.
smp_read_mpc() isn't called at all. construct_default_ISA_mptable() is
used instead.

> Fixing the oops is easy (see below), but the real problem is
> that 2.4.21-pre2 apparently broke MP table parsing on your HW.
> I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
> and compare 2.4.20 (good) and later (bad) to see where things
> start to diverge.

There is no need to -- the problem is already known. Mikael, if you need
additional details on how default MP configurations work in our code, feel
free to ask. Unfortunately, I likely won't be able to do any coding
and/or testing in this area before October.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2003-09-10 16:19:21

by Mikael Pettersson

Subject: Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops

Maciej W. Rozycki writes:
> > Fixing the oops is easy (see below), but the real problem is
> > that 2.4.21-pre2 apparently broke MP table parsing on your HW.
> > I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
> > and compare 2.4.20 (good) and later (bad) to see where things
> > start to diverge.
>
> There is no need to -- the problem is already known. Mikael, if you need
> additional details on how default MP configurations work in our code, feel

I think I nailed it.

First I found one very strange thing in Mathieu's boot log:

--- mpbug-2.4.20 Wed Sep 10 17:19:05 2003
+++ mpbug-2.4.23-pre3 Wed Sep 10 17:18:44 2003
...
+DMI not present.
Intel MultiProcessor Specification v1.1
Virtual Wire compatibility mode.
Default MP configuration #6

This means construct_default_ISA_mptable() still gets called.
Ok so far.

...
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.

smp_found_config is true, we're now in setup_IO_APIC()
and have completed setup_ioapic_ids_from_mpc(). Ok so far.

-init IO_APIC IRQs
-IO-APIC (apicid-pin) 2-0 not connected.

THIS IS BAD. setup_IO_APIC() calls setup_IO_APIC_IRQs(),
which starts by printk()ing the first line above.
This line is missing from the 2.4.23-pre3 dmesg log, which
seems like an impossibility.

At this point I was thinking "memory corruption",
and the following struck me:

What used to be arrays (mp_irqs[] etc) are now pointers to
memory which is sized and allocated by smp_read_mpc().
In the case when construct_default_ISA_mptable() is called,
smp_read_mpc() is _not_ called, the pointers never get initialised,
and reads and writes of these arrays end up in la-la land.

The fix would be to add allocation and initialisation of
these pointers at the start of construct_default_ISA_mptable().

I'll prepare a patch doing this sometime tomorrow.
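
As a rough illustration of the idea (not the patch itself), here is a
userspace model in which both setup paths go through one allocation helper.
All names and the default table size are assumptions made for the sketch;
malloc() stands in for whatever boot-time allocator the kernel code uses:

```c
#include <stdlib.h>

#define DEFAULT_MAX_BUSSES 4    /* hypothetical size for the default path */

static int *bus_to_pci = NULL;  /* stand-in for mp_bus_id_to_pci_bus */
static int table_len = 0;

/* Allocate the table once and mark every entry as "no PCI bus" (-1).
 * Returns 0 on success, -1 if the allocation fails. */
static int alloc_tables(int nbusses)
{
    int i;

    if (bus_to_pci != NULL)
        return 0;               /* already set up by the other path */

    bus_to_pci = malloc((size_t)nbusses * sizeof(*bus_to_pci));
    if (bus_to_pci == NULL)
        return -1;

    for (i = 0; i < nbusses; i++)
        bus_to_pci[i] = -1;
    table_len = nbusses;
    return 0;
}

/* Model of the table-parsing path: size comes from the parsed table. */
static int parse_mp_table(int busses_in_table)
{
    return alloc_tables(busses_in_table);
}

/* Model of the default-configuration path: no table to size from,
 * so it allocates a fixed default before filling anything in. */
static int build_default_table(void)
{
    return alloc_tables(DEFAULT_MAX_BUSSES);
}
```

The point is only that the default-configuration path must not touch the
tables before something has allocated them.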

/Mikael

2003-09-10 16:59:51

by Maciej W. Rozycki

Subject: Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops

On Wed, 10 Sep 2003, Mikael Pettersson wrote:

> First I found one very strange thing in Mathieu's boot log:
>
> --- mpbug-2.4.20 Wed Sep 10 17:19:05 2003
> +++ mpbug-2.4.23-pre3 Wed Sep 10 17:18:44 2003
> ...
> +DMI not present.
> Intel MultiProcessor Specification v1.1
> Virtual Wire compatibility mode.
> Default MP configuration #6
>
> This means construct_default_ISA_mptable() still gets called.
> Ok so far.

Yep -- I've been aware of this.

> At this point I was thinking "memory corruption",
> and the following struck me:
>
> What used to be arrays (mp_irqs[] etc) are now pointers to
> memory which is sized and allocated by smp_read_mpc().
> In the case when construct_default_ISA_mptable() is called,
> smp_read_mpc() is _not_ called, the pointers never get initialised,
> and reads and writes of these arrays end up in la-la land.

Exactly.

> The fix would be to add allocation and initialisation of
> these pointers at the start of construct_default_ISA_mptable().

Possibly -- I haven't thought about how to fix it yet.

> I'll prepare a patch doing this sometime tomorrow.

Thanks a lot for taking care of this.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +