2006-10-02 20:10:50

by Martin Bligh

Subject: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

Panics on boot.

http://test.kernel.org/abat/50728/debug/console.log

Unable to handle kernel NULL pointer dereference at 0000000000000500 RIP:
[<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 14, comm: events/0 Not tainted 2.6.18-mm2-autokern1 #1
RIP: 0010:[<ffffffff803fa9af>] [<ffffffff803fa9af>]
mptspi_dv_renegotiate_work+0x10/0x4a
RSP: 0000:ffff8101000e1e20 EFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff810001fea8c0 RCX: 000000000000001f
RDX: 0000000000000000 RSI: ffff810001fea8c0 RDI: 0000000000001fea
RBP: ffff8101000e1e30 R08: ffff8101000e0000 R09: 0000000000000011
R10: ffff810001014820 R11: ffff810001014820 R12: 0000000000000500
R13: ffff810001ef1640 R14: 0000000000000202 R15: ffff810001fea8c0
FS: 0000000000000000(0000) GS:ffffffff80582000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000500 CR3: 0000000000201000 CR4: 00000000000006e0
Process events/0 (pid: 14, threadinfo ffff8101000e0000, task
ffff8100816b1040)
Stack: ffff810001fea8c0 ffff810001fea8c8 ffff8101000e1e70 ffffffff802387e3
ffffffff803fa99f ffff810001ef1640 ffff810001f0dd40 ffffffff80238827
00000000fffffffc ffffffff804b0298 ffff8101000e1f00 ffffffff8023891a
Call Trace:
[<ffffffff802387e3>] run_workqueue+0xa2/0xe6
[<ffffffff803fa99f>] mptspi_dv_renegotiate_work+0x0/0x4a
[<ffffffff80238827>] worker_thread+0x0/0x126
[<ffffffff8023891a>] worker_thread+0xf3/0x126
[<ffffffff80224498>] default_wake_function+0x0/0xf
[<ffffffff80224498>] default_wake_function+0x0/0xf
[<ffffffff80238827>] worker_thread+0x0/0x126
[<ffffffff8023b984>] kthread+0xd0/0xfc
[<ffffffff8020a658>] child_rip+0xa/0x12
[<ffffffff8023b8b4>] kthread+0x0/0xfc
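
A note on reading the trace above: the faulting address in CR2 (0x500) is far too small to be a real kernel pointer, which is the usual signature of a member access through a NULL base pointer, where the fault address equals the member's offset. The sketch below illustrates that arithmetic only. The struct layout is hypothetical, padded so that a member lands at 0x500; it is not taken from the mptspi sources, and the trace alone does not say which structure was involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding is chosen only so that `field`
 * lands at offset 0x500, the address reported in CR2 above. */
struct layout_sketch {
	char padding[0x500];
	long field;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct layout_sketch, field) == 0x500,
	      "field must sit at offset 0x500");

/* Address the CPU would touch when reading `field` through `base`. */
static uintptr_t fault_address_sketch(uintptr_t base)
{
	return base + offsetof(struct layout_sketch, field);
}
```

With a base of 0 (a NULL pointer) the computed address is just the member offset, which is why a small CR2 value usually points at a NULL structure pointer rather than at a wild one.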


2006-10-02 20:39:55

by Andrew Morton

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

On Mon, 02 Oct 2006 13:10:26 -0700
Martin Bligh <[email protected]> wrote:

> Panics on boot.
>
> http://test.kernel.org/abat/50728/debug/console.log
>
> Unable to handle kernel NULL pointer dereference at 0000000000000500 RIP:
> [<ffffffff803fa9af>] mptspi_dv_renegotiate_work+0x10/0x4a
> PGD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
> Pid: 14, comm: events/0 Not tainted 2.6.18-mm2-autokern1 #1
> RIP: 0010:[<ffffffff803fa9af>] [<ffffffff803fa9af>]
> mptspi_dv_renegotiate_work+0x10/0x4a
> RSP: 0000:ffff8101000e1e20 EFLAGS: 00010286
> RAX: 0000000000000001 RBX: ffff810001fea8c0 RCX: 000000000000001f
> RDX: 0000000000000000 RSI: ffff810001fea8c0 RDI: 0000000000001fea
> RBP: ffff8101000e1e30 R08: ffff8101000e0000 R09: 0000000000000011
> R10: ffff810001014820 R11: ffff810001014820 R12: 0000000000000500
> R13: ffff810001ef1640 R14: 0000000000000202 R15: ffff810001fea8c0
> FS: 0000000000000000(0000) GS:ffffffff80582000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000500 CR3: 0000000000201000 CR4: 00000000000006e0
> Process events/0 (pid: 14, threadinfo ffff8101000e0000, task
> ffff8100816b1040)
> Stack: ffff810001fea8c0 ffff810001fea8c8 ffff8101000e1e70 ffffffff802387e3
> ffffffff803fa99f ffff810001ef1640 ffff810001f0dd40 ffffffff80238827
> 00000000fffffffc ffffffff804b0298 ffff8101000e1f00 ffffffff8023891a
> Call Trace:
> [<ffffffff802387e3>] run_workqueue+0xa2/0xe6
> [<ffffffff803fa99f>] mptspi_dv_renegotiate_work+0x0/0x4a
> [<ffffffff80238827>] worker_thread+0x0/0x126
> [<ffffffff8023891a>] worker_thread+0xf3/0x126
> [<ffffffff80224498>] default_wake_function+0x0/0xf
> [<ffffffff80224498>] default_wake_function+0x0/0xf
> [<ffffffff80238827>] worker_thread+0x0/0x126
> [<ffffffff8023b984>] kthread+0xd0/0xfc
> [<ffffffff8020a658>] child_rip+0xa/0x12
> [<ffffffff8023b8b4>] kthread+0x0/0xfc

Yeah, Bryce@osdl is hitting this. Apparently it can be worked around
by compiling the driver as a module.

2006-10-02 23:21:23

by Eric Moore

Subject: RE: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote:

>
> Yeah, Bryce@osdl is hitting this. Apparently it can be worked around
> by compiling the driver as a module.
>

What I saw in Bryce's trace was that the driver did not receive an
interrupt for the first command sent after interrupts were enabled.
That command was a config page request for the SPI port pages. Since
the command timed out, an internal timeout handler was called, and we
issued an internal host reset. The host reset called the callback
handlers of each driver (mptspi, mptfc, mptsas). That ended with a
panic in mptspi, because we assume ioc->hd is a valid pointer. We
don't allocate ioc->hd until well after mpt_attach, which is where the
config page request timed out. We could prevent the panic in mptspi,
but that doesn't fix the underlying problem: why we are not getting
interrupts.
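
The "prevent the panic" option mentioned above amounts to checking the pointer before the reset callback uses it. A minimal sketch of that guard follows, assuming nothing about the real driver: the struct and function names are hypothetical stand-ins, not the MPT Fusion definitions, and only the ioc->hd relationship comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures; these are not
 * the real MPT Fusion definitions. */
struct scsi_host_sketch {
	int renegotiate_pending;
};

struct adapter_sketch {
	struct scsi_host_sketch *hd;	/* NULL until host setup has run */
};

/*
 * Reset callback in the style described above. Returns 1 if the
 * per-host work was flagged and 0 if it was skipped because the
 * host data has not been allocated yet.
 */
static int reset_callback_sketch(struct adapter_sketch *ioc)
{
	assert(ioc != NULL);	/* the adapter itself must exist */

	if (ioc->hd == NULL)
		return 0;	/* reset arrived before hd was allocated */

	ioc->hd->renegotiate_pending = 1;
	return 1;
}
```

A guard like this would only hide the symptom. The missing interrupt that caused the timeout and the reset in the first place would still need its own fix.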

I have a 2.6.18 gold kernel, and that works fine with modules. There
are no changes in the mpt stack since 2.6.18 that would affect
interrupts. Do you know of any changes in the kernel affecting
interrupts? I suspect that building the drivers as modules versus
linking them into the kernel would matter, or would it?

I've been busy with SAS issues today, and have not had time to
replicate this.

Eric

2006-10-02 23:37:46

by Andrew Morton

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

On Mon, 2 Oct 2006 17:21:08 -0600
"Moore, Eric" <[email protected]> wrote:

> On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote:
>
> >
> > Yeah, Bryce@osdl is hitting this. Apparently it can be worked around
> > by compiling the driver as a module.
> >
>
> What I saw in Bryce's trace was that the driver did not receive an
> interrupt for the first command sent after interrupts were enabled.
> That command was a config page request for the SPI port pages. Since
> the command timed out, an internal timeout handler was called, and we
> issued an internal host reset. The host reset called the callback
> handlers of each driver (mptspi, mptfc, mptsas). That ended with a
> panic in mptspi, because we assume ioc->hd is a valid pointer. We
> don't allocate ioc->hd until well after mpt_attach, which is where the
> config page request timed out. We could prevent the panic in mptspi,
> but that doesn't fix the underlying problem: why we are not getting
> interrupts.
>
> I have a 2.6.18 gold kernel, and that works fine with modules. There
> are no changes in the mpt stack since 2.6.18 that would affect
> interrupts. Do you know of any changes in the kernel affecting
> interrupts? I suspect that building the drivers as modules versus
> linking them into the kernel would matter, or would it?

There are lots and lots of interrupt changes, some now in mainline, some
not.

There's a known-problematic PCI resource allocation bug now in mainline
too. It appears that this can cause devices to not get assigned an
interrupt.

So yes, this is probably the trigger. But as a secondary thing, it appears
that the driver will crash if something goes wrong with the interrupt
setup?

2006-10-03 00:32:25

by Jeff Garzik

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

Andrew Morton wrote:
> On Mon, 2 Oct 2006 17:21:08 -0600
> "Moore, Eric" <[email protected]> wrote:
>
>> On Monday, October 02, 2006 2:40 PM, Andrew Morton wrote:
>>
>>> Yeah, Bryce@osdl is hitting this. Apparently it can be worked around
>>> by compiling the driver as a module.
>>>
>> What I saw in Bryce's trace was that the driver did not receive an
>> interrupt for the first command sent after interrupts were enabled.
>> That command was a config page request for the SPI port pages. Since
>> the command timed out, an internal timeout handler was called, and we
>> issued an internal host reset. The host reset called the callback
>> handlers of each driver (mptspi, mptfc, mptsas). That ended with a
>> panic in mptspi, because we assume ioc->hd is a valid pointer. We
>> don't allocate ioc->hd until well after mpt_attach, which is where the
>> config page request timed out. We could prevent the panic in mptspi,
>> but that doesn't fix the underlying problem: why we are not getting
>> interrupts.
>>
>> I have a 2.6.18 gold kernel, and that works fine with modules. There
>> are no changes in the mpt stack since 2.6.18 that would affect
>> interrupts. Do you know of any changes in the kernel affecting
>> interrupts? I suspect that building the drivers as modules versus
>> linking them into the kernel would matter, or would it?
>
> There are lots and lots of interrupt changes, some now in mainline, some
> not.
>
> There's a known-problematic PCI resource allocation bug now in mainline
> too. It appears that this can cause devices to not get assigned an
> interrupt.
>
> So yes, this is probably the trigger. But as a secondary thing, it appears
> that the driver will crash if something goes wrong with the interrupt
> setup?

FWIW, I am seeing precisely this problem, in the latest -git.

Jeff



2006-10-03 00:42:03

by Andrew Morton

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

On Mon, 02 Oct 2006 20:32:13 -0400
Jeff Garzik <[email protected]> wrote:

> FWIW, I am seeing precisely this problem, in the latest -git.

I just sent this to Linus. Fingers crossed, it'll fix...

From: Andrew Morton <[email protected]>

54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

 arch/x86_64/kernel/apic.c |   54 ------------------------------------
 1 file changed, 54 deletions(-)

diff -puN arch/x86_64/kernel/apic.c~revert-insert-ioapics-and-local-apic-into-resource-map arch/x86_64/kernel/apic.c
--- a/arch/x86_64/kernel/apic.c~revert-insert-ioapics-and-local-apic-into-resource-map
+++ a/arch/x86_64/kernel/apic.c
@@ -25,7 +25,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/sysdev.h>
 #include <linux/module.h>
-#include <linux/ioport.h>

 #include <asm/atomic.h>
 #include <asm/smp.h>
@@ -46,11 +45,6 @@ int apic_calibrate_pmtmr __initdata;

 int disable_apic_timer __initdata;

-static struct resource lapic_resource = {
-	.name = "Local APIC",
-	.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
-};
-
 /*
  * cpu_mask that denotes the CPUs that needs timer interrupt coming in as
  * IPIs in place of local APIC timers
@@ -591,40 +585,6 @@ static int __init detect_init_APIC (void
 	return 0;
 }

-#ifdef CONFIG_X86_IO_APIC
-static struct resource * __init ioapic_setup_resources(void)
-{
-#define IOAPIC_RESOURCE_NAME_SIZE 11
-	unsigned long n;
-	struct resource *res;
-	char *mem;
-	int i;
-
-	if (nr_ioapics <= 0)
-		return NULL;
-
-	n = IOAPIC_RESOURCE_NAME_SIZE + sizeof(struct resource);
-	n *= nr_ioapics;
-
-	res = alloc_bootmem(n);
-
-	if (!res)
-		return NULL;
-
-	memset(res, 0, n);
-	mem = (void *)&res[nr_ioapics];
-
-	for (i = 0; i < nr_ioapics; i++) {
-		res[i].name = mem;
-		res[i].flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-		snprintf(mem, IOAPIC_RESOURCE_NAME_SIZE, "IOAPIC %u", i);
-		mem += IOAPIC_RESOURCE_NAME_SIZE;
-	}
-
-	return res;
-}
-#endif
-
 void __init init_apic_mappings(void)
 {
 	unsigned long apic_phys;
@@ -644,11 +604,6 @@ void __init init_apic_mappings(void)
 	apic_mapped = 1;
 	apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys);

-	/* Put local APIC into the resource map. */
-	lapic_resource.start = apic_phys;
-	lapic_resource.end = lapic_resource.start + PAGE_SIZE - 1;
-	insert_resource(&iomem_resource, &lapic_resource);
-
 	/*
 	 * Fetch the APIC ID of the BSP in case we have a
 	 * default configuration (or the MP table is broken).
@@ -658,9 +613,7 @@ void __init init_apic_mappings(void)
 	{
 		unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
 		int i;
-		struct resource *ioapic_res;

-		ioapic_res = ioapic_setup_resources();
 		for (i = 0; i < nr_ioapics; i++) {
 			if (smp_found_config) {
 				ioapic_phys = mp_ioapics[i].mpc_apicaddr;
@@ -672,13 +625,6 @@ void __init init_apic_mappings(void)
 			apic_printk(APIC_VERBOSE,"mapped IOAPIC to %016lx (%016lx)\n",
 					__fix_to_virt(idx), ioapic_phys);
 			idx++;
-
-			if (ioapic_res) {
-				ioapic_res->start = ioapic_phys;
-				ioapic_res->end = ioapic_phys + (4 * 1024) - 1;
-				insert_resource(&iomem_resource, ioapic_res);
-				ioapic_res++;
-			}
 		}
 	}
 }
_
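
The changelog above does not spell out how the reverted commit made PCI resource mapping fail. One plausible mechanism, stated here purely as an assumption, is that a range marked busy in the resource map blocks any later claim that overlaps it. The sketch below shows only that overlap rule on a flat array. The type and function names are hypothetical, it is not the kernel's resource tree code, and the addresses in the usage note are example values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flat stand-in for the kernel's resource tree. */
struct range_sketch {
	unsigned long start;
	unsigned long end;	/* inclusive, as in struct resource */
};

/* Two inclusive ranges overlap if each starts before the other ends. */
static int ranges_overlap(const struct range_sketch *a,
			  const struct range_sketch *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Returns 1 if `want` can be claimed, 0 if it hits a busy range. */
static int can_claim(const struct range_sketch *busy, size_t nbusy,
		     const struct range_sketch *want)
{
	size_t i;

	assert(busy != NULL || nbusy == 0);
	for (i = 0; i < nbusy; i++)
		if (ranges_overlap(&busy[i], want))
			return 0;
	return 1;
}
```

For example, with one busy page at 0xfee00000-0xfee00fff, a request for 0xfed00000-0xfeefffff is refused under this rule, while a request for 0xd0000000-0xd00fffff is accepted.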

2006-10-03 00:51:22

by Jeff Garzik

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

Andrew Morton wrote:
> On Mon, 02 Oct 2006 20:32:13 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> FWIW, I am seeing precisely this problem, in the latest -git.
>
> I just sent this to Linus. Fingers crossed, it'll fix...
>
> From: Andrew Morton <[email protected]>
>
> 54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
> to fail to map PCI resources.
>
> Revert it in preparation for addressing the show-APICs-in-/proc/iomem
> requirement in a different manner.
>
> Cc: Aaron Durbin <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>

I'll give it a good test. My sata_mv (requires PCI domains) also died
with a bunch of timeouts. Lack of interrupts, or lack of PCI resources,
is definitely indicative of a cause.

Jeff



2006-10-03 01:35:48

by Jeff Garzik

Subject: Re: Panic from mptspi_dv_renegotiate_work in 2.6.18-mm2

Andrew Morton wrote:
> On Mon, 02 Oct 2006 20:32:13 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> FWIW, I am seeing precisely this problem, in the latest -git.
>
> I just sent this to Linus. Fingers crossed, it'll fix...
>
> From: Andrew Morton <[email protected]>
>
> 54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various people's machines
> to fail to map PCI resources.
>
> Revert it in preparation for addressing the show-APICs-in-/proc/iomem
> requirement in a different manner.
>
> Cc: Aaron Durbin <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>

ACK, this fixes sata_mv timeouts and mptsas oopsen here.

FWIW, both sata_mv and mptsas are only accessible on this machine after
applying my PCI domains patchset.

Jeff