2006-10-08 07:03:34

by Martin Bligh

Subject: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

Not sure if you've seen this already ... catching up on test results.

This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
problem.

Full logs:

mm2 - http://test.kernel.org/abat/50727/debug/console.log
mm3 - http://test.kernel.org/abat/51442/debug/console.log

config - http://test.kernel.org/abat/51442/build/dotconfig

I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
is failing because bus->sysdata is NULL. The disassembly and
structure offsets seem to line up for that.

#define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node

struct pci_sysdata {
        int domain;     /* PCI domain */
        int node;       /* NUMA node */
};
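
As a rough check of the offset reasoning above: with the two-int layout quoted here, ->node sits one int past the start of the structure, so a NULL sysdata turns the load into an access at address 4 on a 32-bit x86 build. The helper name below is made up for illustration and nothing here is kernel code; it only does the address arithmetic, without dereferencing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as quoted in the report. */
struct pci_sysdata {
    int domain; /* PCI domain */
    int node;   /* NUMA node */
};

/*
 * Hypothetical helper: the address a load of ->node would touch for a
 * given sysdata pointer value. Pure arithmetic, no memory is accessed.
 */
static unsigned long node_field_address(unsigned long sysdata)
{
    unsigned long addr = sysdata + offsetof(struct pci_sysdata, node);

    assert(addr >= sysdata); /* no wrap-around for the values of interest */
    return addr;
}
```

Calling node_field_address(0) gives the offset of node, which is 4 when int is four bytes wide, matching the 00000004 in the oops.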


BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c02060d4
*pde = 0042c001
*pte = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 2
EIP: 0060:[<c02060d4>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-mm2-autokern1 #1)
EIP is at pci_call_probe+0x19/0xb5
eax: 00000000 ebx: e778b400 ecx: e7400030 edx: c03b89a0
esi: e778b400 edi: 0000ffff ebp: e69a2fa0 esp: e740dea4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, ti=e740c000 task=e7400030 task.ti=e740c000)
Stack: ffffffed e778b400 c03b89a0 c02061a3 c03b89a0 e778b400 c03b873c c03b89a0
       e778b400 c03b89d4 c02061d6 c03b89a0 e778b400 e778b400 c03b89d4 e778b448
       c0224800 e778b448 c03b89d4 e778b448 00000000 c03b89d4 c0224936 e69a2fa0
Call Trace:
[<c02061a3>] __pci_device_probe+0x33/0x47
[<c02061d6>] pci_device_probe+0x1f/0x34
[<c0224800>] really_probe+0x31/0xb9
[<c0224936>] driver_probe_device+0x93/0x9c
[<c02249b5>] __driver_attach+0x0/0x7c
[<c02249fc>] __driver_attach+0x47/0x7c
[<c0223e98>] bus_for_each_dev+0x47/0x6d
[<c01fd05a>] kobject_add+0xa9/0xf2
[<c0224a45>] driver_attach+0x14/0x18
[<c02249b5>] __driver_attach+0x0/0x7c
[<c022437b>] bus_add_driver+0x53/0xd0
[<c0224d99>] driver_register+0x74/0x77
[<c02063ea>] __pci_register_driver+0x6b/0x7a
[<c04146c3>] qla1280_init+0xc/0xf
[<c04007ff>] do_initcalls+0x55/0xe8
[<c0184095>] proc_mkdir+0x12/0x16
[<c0135136>] init_irq_proc+0x21/0x2f
[<c01003b8>] init+0x0/0x148
[<c010040d>] init+0x55/0x148
[<c01033c7>] kernel_thread_helper+0x7/0x10


2006-10-08 07:49:33

by Andrew Morton

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

On Sun, 08 Oct 2006 00:02:07 -0700
"Martin J. Bligh" wrote:

> Not sure if you've seen this already ... catching up on test results.

Nope.

> This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
> problem.
>
> Full logs:
>
> mm2 - http://test.kernel.org/abat/50727/debug/console.log
> mm3 - http://test.kernel.org/abat/51442/debug/console.log
>
> config - http://test.kernel.org/abat/51442/build/dotconfig
>
> I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
> is failing because bus->sysdata is NULL. The disassembly and
> structure offsets seem to line up for that.
>
> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>
> struct pci_sysdata {
> int domain; /* PCI domain */
> int node; /* NUMA node */
> };
>
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004

I don't see anything in the mm1->mm2 additions which might have caused
this. Don't know, sorry. Bisection time?


> printing eip:
> c02060d4
> *pde = 0042c001
> *pte = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU: 2
> EIP: 0060:[<c02060d4>] Not tainted VLI
> EFLAGS: 00010286 (2.6.18-mm2-autokern1 #1)
> EIP is at pci_call_probe+0x19/0xb5
> eax: 00000000 ebx: e778b400 ecx: e7400030 edx: c03b89a0
> esi: e778b400 edi: 0000ffff ebp: e69a2fa0 esp: e740dea4
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 1, ti=e740c000 task=e7400030 task.ti=e740c000)
> Stack: ffffffed e778b400 c03b89a0 c02061a3 c03b89a0 e778b400 c03b873c c03b89a0
>        e778b400 c03b89d4 c02061d6 c03b89a0 e778b400 e778b400 c03b89d4 e778b448
>        c0224800 e778b448 c03b89d4 e778b448 00000000 c03b89d4 c0224936 e69a2fa0
> Call Trace:
> [<c02061a3>] __pci_device_probe+0x33/0x47
> [<c02061d6>] pci_device_probe+0x1f/0x34
> [<c0224800>] really_probe+0x31/0xb9
> [<c0224936>] driver_probe_device+0x93/0x9c
> [<c02249b5>] __driver_attach+0x0/0x7c
> [<c02249fc>] __driver_attach+0x47/0x7c
> [<c0223e98>] bus_for_each_dev+0x47/0x6d
> [<c01fd05a>] kobject_add+0xa9/0xf2
> [<c0224a45>] driver_attach+0x14/0x18
> [<c02249b5>] __driver_attach+0x0/0x7c
> [<c022437b>] bus_add_driver+0x53/0xd0
> [<c0224d99>] driver_register+0x74/0x77
> [<c02063ea>] __pci_register_driver+0x6b/0x7a
> [<c04146c3>] qla1280_init+0xc/0xf
> [<c04007ff>] do_initcalls+0x55/0xe8
> [<c0184095>] proc_mkdir+0x12/0x16
> [<c0135136>] init_irq_proc+0x21/0x2f
> [<c01003b8>] init+0x0/0x148
> [<c010040d>] init+0x55/0x148
> [<c01033c7>] kernel_thread_helper+0x7/0x10

2006-10-09 17:20:16

by Badari Pulavarty

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
> Not sure if you've seen this already ... catching up on test results.
>
> This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
> problem.
>
> Full logs:
>
> mm2 - http://test.kernel.org/abat/50727/debug/console.log
> mm3 - http://test.kernel.org/abat/51442/debug/console.log
>
> config - http://test.kernel.org/abat/51442/build/dotconfig
>
> I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
> is failing because bus->sysdata is NULL. The disassembly and
> structure offsets seem to line up for that.
>
> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>
> struct pci_sysdata {
> int domain; /* PCI domain */
> int node; /* NUMA node */
> };
>

Martin,

Jeff moved "node" to a proper field in sysdata, instead
of overloading sysdata itself. I think this is causing the
problem. I guess we could end up with sysdata = NULL in some
cases? Since you are the NUMA-Q expert, where does sysdata
get set for NUMA-Q? :)

-mm2 changed:

#define pcibus_to_node(bus) ((long) (bus)->sysdata)

to

#define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
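
To make the difference between the two forms concrete, here is a small user-space sketch. The struct pci_bus below is a cut-down stand-in with only the sysdata field, the _old/_new/_checked names are invented for this example, and the guarded accessor is just one possible defensive shape, not a proposed patch. It assumes -1 is an acceptable "no node" answer.

```c
#include <assert.h>
#include <stddef.h>

struct pci_sysdata {
    int domain; /* PCI domain */
    int node;   /* NUMA node */
};

/* Cut-down stand-in for the kernel structure; only sysdata matters here. */
struct pci_bus {
    void *sysdata;
};

/* Old form: the pointer value itself is the node, so NULL reads as node 0. */
#define pcibus_to_node_old(bus) ((long)(bus)->sysdata)

/* -mm2 form: sysdata must point at a real struct pci_sysdata. */
#define pcibus_to_node_new(bus) \
    (((struct pci_sysdata *)((bus)->sysdata))->node)

/* Hypothetical guarded accessor: -1 when sysdata was never set. */
static int pcibus_to_node_checked(const struct pci_bus *bus)
{
    assert(bus != NULL);
    if (bus->sysdata == NULL)
        return -1;
    return pcibus_to_node_new(bus);
}
```

With sysdata left NULL, the old macro quietly yields 0, while the new one would dereference address 0 plus the offset of node, which is the fault reported above.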


Thanks,
Badari


> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
> printing eip:
> c02060d4
> *pde = 0042c001
> *pte = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU: 2
> EIP: 0060:[<c02060d4>] Not tainted VLI
> EFLAGS: 00010286 (2.6.18-mm2-autokern1 #1)
> EIP is at pci_call_probe+0x19/0xb5
> eax: 00000000 ebx: e778b400 ecx: e7400030 edx: c03b89a0
> esi: e778b400 edi: 0000ffff ebp: e69a2fa0 esp: e740dea4
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 1, ti=e740c000 task=e7400030 task.ti=e740c000)
> Stack: ffffffed e778b400 c03b89a0 c02061a3 c03b89a0 e778b400 c03b873c c03b89a0
>        e778b400 c03b89d4 c02061d6 c03b89a0 e778b400 e778b400 c03b89d4 e778b448
>        c0224800 e778b448 c03b89d4 e778b448 00000000 c03b89d4 c0224936 e69a2fa0
> Call Trace:
> [<c02061a3>] __pci_device_probe+0x33/0x47
> [<c02061d6>] pci_device_probe+0x1f/0x34
> [<c0224800>] really_probe+0x31/0xb9
> [<c0224936>] driver_probe_device+0x93/0x9c
> [<c02249b5>] __driver_attach+0x0/0x7c
> [<c02249fc>] __driver_attach+0x47/0x7c
> [<c0223e98>] bus_for_each_dev+0x47/0x6d
> [<c01fd05a>] kobject_add+0xa9/0xf2
> [<c0224a45>] driver_attach+0x14/0x18
> [<c02249b5>] __driver_attach+0x0/0x7c
> [<c022437b>] bus_add_driver+0x53/0xd0
> [<c0224d99>] driver_register+0x74/0x77
> [<c02063ea>] __pci_register_driver+0x6b/0x7a
> [<c04146c3>] qla1280_init+0xc/0xf
> [<c04007ff>] do_initcalls+0x55/0xe8
> [<c0184095>] proc_mkdir+0x12/0x16
> [<c0135136>] init_irq_proc+0x21/0x2f
> [<c01003b8>] init+0x0/0x148
> [<c010040d>] init+0x55/0x148
> [<c01033c7>] kernel_thread_helper+0x7/0x10

2006-10-09 17:24:43

by Martin Bligh

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

Badari Pulavarty wrote:
> On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
>
>>Not sure if you've seen this already ... catching up on test results.
>>
>>This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
>>problem.
>>
>>Full logs:
>>
>>mm2 - http://test.kernel.org/abat/50727/debug/console.log
>>mm3 - http://test.kernel.org/abat/51442/debug/console.log
>>
>>config - http://test.kernel.org/abat/51442/build/dotconfig
>>
>>I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
>>is failing because bus->sysdata is NULL. The disassembly and
>>structure offsets seem to line up for that.
>>
>>#define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>>
>>struct pci_sysdata {
>> int domain; /* PCI domain */
>> int node; /* NUMA node */
>>};
>>
>
>
> Martin,
>
> Jeff moved "node" to a proper field in sysdata, instead
> of overloading sysdata itself. I think this is causing the
> problem. I guess we could end up with sysdata = NULL in some
> cases? Since you are the NUMA-Q expert, where does sysdata
> get set for NUMA-Q? :)
>
> -mm2 changed:
>
> #define pcibus_to_node(bus) ((long) (bus)->sysdata)
>
> to
>
> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node

Buggered if I know, that's some strange pci thing ;-)

But can we revert whatever patch that was until it gets fixed, please?

Thanks,

M.

2006-10-09 17:34:38

by Jeff Garzik

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

Martin Bligh wrote:
> Badari Pulavarty wrote:
>> On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
>>
>>> Not sure if you've seen this already ... catching up on test results.
>>>
>>> This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
>>> problem.
>>>
>>> Full logs:
>>>
>>> mm2 - http://test.kernel.org/abat/50727/debug/console.log
>>> mm3 - http://test.kernel.org/abat/51442/debug/console.log
>>>
>>> config - http://test.kernel.org/abat/51442/build/dotconfig
>>>
>>> I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
>>> is failing because bus->sysdata is NULL. The disassembly and
>>> structure offsets seem to line up for that.
>>>
>>> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>>>
>>> struct pci_sysdata {
>>> int domain; /* PCI domain */
>>> int node; /* NUMA node */
>>> };
>>>
>>
>>
>> Martin,
>>
>> Jeff moved "node" to a proper field in sysdata, instead
>> of overloading sysdata itself. I think this is causing the
>> problem. I guess we could end up with sysdata = NULL in some
>> cases? Since you are the NUMA-Q expert, where does sysdata get set
>> for NUMA-Q? :)
>>
>> -mm2 changed:
>>
>> #define pcibus_to_node(bus) ((long) (bus)->sysdata)
>>
>> to
>> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>
> Buggered if I know, that's some strange pci thing ;-)
>
> But can we revert whatever patch that was until it gets fixed, please?

It needs to get fixed, otherwise whole buses of PCI devices disappear on
some machines.

Can you turn on PCI debugging?

Jeff



2006-10-09 18:46:59

by Sukadev Bhattiprolu

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

Jeff Garzik wrote:
| Martin Bligh wrote:
| >Badari Pulavarty wrote:
| >>On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
| >>
| >>>Not sure if you've seen this already ... catching up on test results.
| >>>
| >>>This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
| >>>problem.
| >>>
| >>>Full logs:
| >>>
| >>>mm2 - http://test.kernel.org/abat/50727/debug/console.log
| >>>mm3 - http://test.kernel.org/abat/51442/debug/console.log
| >>>
| >>>config - http://test.kernel.org/abat/51442/build/dotconfig
| >>>
| >>>I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
| >>>is failing because bus->sysdata is NULL. The disassembly and
| >>>structure offsets seem to line up for that.
| >>>
| >>>#define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
| >>>
| >>>struct pci_sysdata {
| >>> int domain; /* PCI domain */
| >>> int node; /* NUMA node */
| >>>};
| >>>
| >>
| >>
| >>Martin,
| >>
| >>Jeff moved "node" to a proper field in sysdata, instead
| >>of overloading sysdata itself. I think this is causing the
| >>problem. I guess we could end up with sysdata = NULL in some
| >>cases? Since you are the NUMA-Q expert, where does sysdata get set
| >>for NUMA-Q? :)
| >>
| >>-mm2 changed:
| >>
| >>#define pcibus_to_node(bus) ((long) (bus)->sysdata)
| >>
| >>to
| >>#define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
| >
| >Buggered if I know, that's some strange pci thing ;-)
| >
| >But can we revert whatever patch that was until it gets fixed, please?
|
| It needs to get fixed, otherwise whole buses of PCI devices disappear on
| some machines.
|
| Can you turn on PCI debugging?
|
| Jeff

I turned on CONFIG_PCI_DEBUG; I already had CONFIG_KERNEL_DEBUG and
CONFIG_DEBUG_INFO. Booted with the "debug" parameter.

Here are the last few messages. Complete dmesg attached.

Let me know if there are other config tokens or boot options that may
provide more info.

Suka
---

PCI: Calling quirk c024da00 for 0000:00:0a.0
PCI: Calling quirk c031ce30 for 0000:00:0a.0
PCI: Calling quirk c024da00 for 0000:00:0b.0
PCI: Calling quirk c031ce30 for 0000:00:0b.0
PCI: Calling quirk c024da00 for 0000:00:0e.0
PCI: Calling quirk c031ce30 for 0000:00:0e.0
PCI: Calling quirk c024da00 for 0000:00:0e.1
PCI: Calling quirk c031ce30 for 0000:00:0e.1
PCI: Calling quirk c024da00 for 0000:00:0e.2
PCI: Calling quirk c031ce30 for 0000:00:0e.2
PCI: Calling quirk c024da00 for 0000:00:0e.3
PCI: Calling quirk c031ce30 for 0000:00:0e.3
PCI: Calling quirk c024da00 for 0000:00:10.0
PCI: Calling quirk c031ce30 for 0000:00:10.0
PCI: Calling quirk c024da00 for 0000:00:12.0
PCI: Calling quirk c05093b0 for 0000:00:12.0
PCI: Calling quirk c031ce30 for 0000:00:12.0
PCI: Calling quirk c024da00 for 0000:00:14.0
PCI: Calling quirk c05093b0 for 0000:00:14.0
PCI: Calling quirk c031ce30 for 0000:00:14.0
PCI: Calling quirk c024da00 for 0000:01:0c.0
PCI: Calling quirk c031ce30 for 0000:01:0c.0
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.2.9-k2
Copyright (c) 1999-2006 Intel Corporation.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c024e92c
*pde = 00529001
*pte = 00000000
Oops: 0000 [#1]
SMP

last sysfs file:
Modules linked in:
CPU: 0
EIP: 0060:[<c024e92c>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-mm3 #3)
EIP is at pci_call_probe+0x1c/0xe0
eax: 00000000 ebx: dfe63c00 ecx: c10fca90 edx: c048ad60
esi: dfe63c00 edi: 0000000f ebp: dfc3fd00 esp: c10ffe70
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, ti=c10fe000 task=c10fca90 task.ti=c10fe000)
Stack: dfe63c00 ffffffed ffffffed dfe63c00 c048b1c0 c024ea55 c048b1c0 dfe63c00
c048ad60 c048b1c0 dfe63c00 c048b1f4 c024ea9f c048b1c0 dfe63c00 dfe63c48
dfc3fd00 c02761f9 dfe63c48 00000286 dfff9060 000000d0 c048b1f4 dfc3fd00
Call Trace:
[<c024ea55>] __pci_device_probe+0x65/0x80
[<c024ea9f>] pci_device_probe+0x2f/0x50
[<c02761f9>] really_probe+0xf9/0x100
[<c02762e8>] driver_probe_device+0xc8/0xe0
[<c03ae92d>] klist_next+0x5d/0xa0
[<c02763a0>] __driver_attach+0x0/0xa0
[<c0276430>] __driver_attach+0x90/0xa0
[<c0275409>] bus_for_each_dev+0x69/0x80
[<c0276465>] driver_attach+0x25/0x30
[<c02763a0>] __driver_attach+0x0/0xa0
[<c0275ad3>] bus_add_driver+0x73/0x140
[<c024edc4>] __pci_register_driver+0x74/0x90
[<c050c6f9>] tulip_init+0x29/0x30
[<c04f2a62>] do_initcalls+0x42/0x140
[<c01433eb>] register_irq_proc+0xab/0xd0
[<c01003f0>] init+0x0/0x1a0
[<c0143479>] init_irq_proc+0x39/0x50
[<c01003f0>] init+0x0/0x1a0
[<c0100451>] init+0x61/0x1a0
[<c0103c0b>] kernel_thread_helper+0x7/0x1c
=======================
Code: 74 92 eb 8e 8d 74 26 00 8d bc 27 00 00 00 00 57 56 53 83 ec 08 8b 5c 24 1c 89 e0 25 00 e0 ff ff 8b 08 8b 43 10 8b 79 5c 8b 40 44 <8b> 50 04 85 d2 78 11 0f a3 15 c0 9f 4e c0 19 c0 85 c0 0f 85 8c
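
The marked bytes in the Code: line, <8b> 50 04, can be decoded by hand. 0x8b is the MOV r32, r/m32 opcode, and the ModRM byte 0x50 selects edx as the destination and eax plus an 8-bit displacement as the source, so the faulting instruction reads as mov 0x4(%eax),%edx. With eax: 00000000 in the register dump, that load touches address 00000004. The preceding 8b 40 44 decodes the same way as mov 0x44(%eax),%eax, which would be consistent with loading bus->sysdata, although the 0x44 offset has not been checked against the structure layout here. A minimal sketch of the ModRM split follows; the names are illustrative only.

```c
#include <assert.h>

/* The three fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod; /* addressing mode; 1 means base register + 8-bit displacement */
    unsigned reg; /* register operand (the destination for opcode 0x8b) */
    unsigned rm;  /* base register when mod is 1 and rm is not 4 */
};

/* 32-bit general-purpose register names, indexed by their encoding. */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Split a ModRM byte into its mod (bits 7-6), reg (5-3) and rm (2-0) fields. */
static struct modrm decode_modrm(unsigned char byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm = byte & 0x7;
    return m;
}

/* Map a 3-bit register encoding to its 32-bit register name. */
static const char *reg32_name(unsigned index)
{
    assert(index < 8);
    return reg32_names[index];
}
```

For 0x50 this gives mod 1, reg 2 (edx) and rm 0 (eax); the byte after it, 04, is the displacement.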


Attachments:
(No filename) (5.02 kB)
dmesg-3.txt (13.48 kB)

2006-10-20 17:08:37

by Andy Whitcroft

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

Martin Bligh wrote:
> Badari Pulavarty wrote:
>> On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
>>
>>> Not sure if you've seen this already ... catching up on test results.
>>>
>>> This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
>>> problem.
>>>
>>> Full logs:
>>>
>>> mm2 - http://test.kernel.org/abat/50727/debug/console.log
>>> mm3 - http://test.kernel.org/abat/51442/debug/console.log
>>>
>>> config - http://test.kernel.org/abat/51442/build/dotconfig
>>>
>>> I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
>>> is failing because bus->sysdata is NULL. The disassembly and
>>> structure offsets seem to line up for that.
>>>
>>> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>>>
>>> struct pci_sysdata {
>>> int domain; /* PCI domain */
>>> int node; /* NUMA node */
>>> };
>>>
>>
>>
>> Martin,
>>
>> Jeff moved "node" to a proper field in sysdata, instead
>> of overloading sysdata itself. I think this is causing the
>> problem. I guess we could end up with sysdata = NULL in some
>> cases? Since you are the NUMA-Q expert, where does sysdata get set
>> for NUMA-Q? :)
>>
>> -mm2 changed:
>>
>> #define pcibus_to_node(bus) ((long) (bus)->sysdata)
>>
>> to
>> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
>
> Buggered if I know, that's some strange pci thing ;-)
>
> But can we revert whatever patch that was until it gets fixed, please?

Unless I am going very very mad, this has come up once before, some
months ago. We went through lots of pain finding the cause of this for
NUMA-Q and fixing it. Something about not having a sysdata and needing
to initialise it.
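
As a loose illustration of what "not having a sysdata and needing to initialise it" could mean in practice, the sketch below attaches a fallback structure to any bus whose sysdata is still NULL. Everything in it is an assumption made for this example: the cut-down struct pci_bus, the default_sysdata object, the bus_ensure_sysdata name, and the choice of -1 as the node value. It is not the patch from the December 2005 discussion.

```c
#include <assert.h>
#include <stddef.h>

struct pci_sysdata {
    int domain; /* PCI domain */
    int node;   /* NUMA node */
};

/* Cut-down stand-in for the kernel structure; only sysdata matters here. */
struct pci_bus {
    void *sysdata;
};

/* Hypothetical fallback used when platform code supplied nothing. */
static struct pci_sysdata default_sysdata = { .domain = 0, .node = -1 };

/* Hypothetical helper: never let a bus leave setup with sysdata == NULL. */
static void bus_ensure_sysdata(struct pci_bus *bus)
{
    assert(bus != NULL);
    if (bus->sysdata == NULL)
        bus->sysdata = &default_sysdata;
}
```

After such a call, a macro of the form ((struct pci_sysdata *)((bus)->sysdata))->node always has something valid to read, at the cost of every unconfigured bus sharing one default node value.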

Thought so, this was all discussed back in December 2005.

http://lkml.org/lkml/2005/12/20/226

I'll go see if I can forward port the patch and address the remaining
issues with it.

-apw

2006-10-20 17:38:14

by Greg KH

Subject: Re: Panic in pci_call_probe from 2.6.18-mm2 and 2.6.18-mm3

On Fri, Oct 20, 2006 at 06:07:39PM +0100, Andy Whitcroft wrote:
> Martin Bligh wrote:
> > Badari Pulavarty wrote:
> >> On Sun, 2006-10-08 at 00:02 -0700, Martin J. Bligh wrote:
> >>
> >>> Not sure if you've seen this already ... catching up on test results.
> >>>
> >>> This was on NUMA-Q, on both -mm2 and -mm3. -mm1 didn't suffer from this
> >>> problem.
> >>>
> >>> Full logs:
> >>>
> >>> mm2 - http://test.kernel.org/abat/50727/debug/console.log
> >>> mm3 - http://test.kernel.org/abat/51442/debug/console.log
> >>>
> >>> config - http://test.kernel.org/abat/51442/build/dotconfig
> >>>
> >>> I'm guessing from the 00000004 that the pcibus_to_node(dev->bus)
> >>> is failing because bus->sysdata is NULL. The disassembly and
> >>> structure offsets seem to line up for that.
> >>>
> >>> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
> >>>
> >>> struct pci_sysdata {
> >>> int domain; /* PCI domain */
> >>> int node; /* NUMA node */
> >>> };
> >>>
> >>
> >>
> >> Martin,
> >>
> >> Jeff moved "node" to a proper field in sysdata, instead
> >> of overloading sysdata itself. I think this is causing the
> >> problem. I guess we could end up with sysdata = NULL in some
> >> cases? Since you are the NUMA-Q expert, where does sysdata get set
> >> for NUMA-Q? :)
> >>
> >> -mm2 changed:
> >>
> >> #define pcibus_to_node(bus) ((long) (bus)->sysdata)
> >>
> >> to
> >> #define pcibus_to_node(bus) ((struct pci_sysdata *)((bus)->sysdata))->node
> >
> > Buggered if I know, that's some strange pci thing ;-)
> >
> > But can we revert whatever patch that was until it gets fixed, please?
>
> Unless I am going very very mad, this has come up once before, some
> months ago. We went through lots of pain finding the cause of this for
> NUMA-Q and fixing it. Something about not having a sysdata and needing
> to initialise it.
>
> Thought so, this was all discussed back in December 2005.
>
> http://lkml.org/lkml/2005/12/20/226
>
> I'll go see if I can forward port the patch and address the remaining
> issues with it.

Yes, and I explicitly asked if this issue had been addressed again in
these patches. That is why I rejected them oh so long ago...

bleah.

greg k-h