[ 604.800358] ib_mthca: Initializing 0000:c4:00.0
[ 604.803593] ib_mthca 0000:c4:00.0: PCI INT A -> GSI 98 (level, low) -> IRQ 98
[ 604.820587] ib_mthca 0000:c4:00.0: setting latency timer to 64
[ 604.824369] ib_mthca 0000:c4:00.0: using 64bit DMA mask
[ 604.840003] ib_mthca 0000:c4:00.0: using 64bit consistent DMA mask
[ 604.845961] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 604.863794] free_memtype request 0xfbef0000-0xfbef1000
[ 605.882922] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.179925] reserve_memtype added 0xfa800000-0xfa801000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.222053] ib_mthca 0000:c4:00.0: HCA FW version 4.6.002 is old
(4.8.200 is current).
[ 606.229839] ib_mthca 0000:c4:00.0: If you have problems, try
updating your HCA FW.
[ 606.239838] reserve_memtype added 0xfbe82000-0xfbe83000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.259063] alloc irq_desc for 232 on node 0
[ 606.263133] alloc kstat_irqs on node 0
[ 606.277766] ib_mthca 0000:c4:00.0: irq 232 for MSI/MSI-X
[ 606.282177] alloc irq_desc for 233 on node 0
[ 606.286160] alloc kstat_irqs on node 0
[ 606.299309] ib_mthca 0000:c4:00.0: irq 233 for MSI/MSI-X
[ 606.303756] alloc irq_desc for 234 on node 0
[ 606.319225] alloc kstat_irqs on node 0
[ 606.322687] ib_mthca 0000:c4:00.0: irq 234 for MSI/MSI-X
[ 606.337147] reserve_memtype added 0xfa802000-0xfa803000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.357274] reserve_memtype added 0xe5800000-0xe6000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.366082] reserve_memtype added 0xe0000000-0xe4000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.389823] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.401964] Overlap at 0xfbe80000-0xfbe81000
[ 606.418913] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.538429] ib_mthca 0000:c4:00.0: NOP command failed to generate
interrupt (IRQ 234).
[ 606.543348] ib_mthca 0000:c4:00.0: Trying again with MSI-X disabled.
[ 606.560852] free_memtype request 0xfbe80000-0xfbe81000
[ 606.564779] free_memtype request 0xfbef0000-0xfbef1000
[ 606.578039] free_memtype request 0xe0000000-0xe4000000
[ 606.581770] free_memtype request 0xe5800000-0xe6000000
[ 606.597795] free_memtype request 0xfa802000-0xfa803000
[ 606.600688] free_memtype request 0xfbe82000-0xfbe83000
[ 606.616891] reserve_memtype added 0xfa802000-0xfa803000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.639208] reserve_memtype added 0xe5800000-0xe6000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.647364] reserve_memtype added 0xe0000000-0xe4000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.671960] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.686785] Overlap at 0xfbe80000-0xfbe81000
[ 606.696978] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 606.818317] ib_mthca 0000:c4:00.0: NOP command failed to generate
interrupt (IRQ 98), aborting.
[ 606.826709] ib_mthca 0000:c4:00.0: BIOS or ACPI interrupt routing problem?
[ 606.839353] ------------[ cut here ]------------
[ 606.841560] WARNING: at kernel/irq/manage.c:740 __free_irq+0xad/0x17f()
[ 606.859197] Hardware name: System Name
[ 606.861049] Trying to free already-free IRQ 234
[ 606.877055] Modules linked in:
[ 606.881444] Pid: 10719, comm: work_for_cpu Not tainted
2.6.30-tip-03093-gff58544-dirty #380
[ 606.897083] Call Trace:
[ 606.900366] [<ffffffff810cb58a>] ? __free_irq+0xad/0x17f
[ 606.904024] [<ffffffff8107d612>] warn_slowpath_common+0x8d/0xd0
[ 606.918689] [<ffffffff8107d6e2>] warn_slowpath_fmt+0x50/0x66
[ 606.924606] [<ffffffff810cb55d>] ? __free_irq+0x80/0x17f
[ 606.939724] [<ffffffff810cb58a>] __free_irq+0xad/0x17f
[ 606.943368] [<ffffffff810ac63d>] ? trace_hardirqs_on+0x20/0x36
[ 606.960565] [<ffffffff810cb67d>] free_irq+0x21/0x3f
[ 606.962793] [<ffffffff8190acbe>] mthca_free_irqs+0x5a/0xad
[ 606.979683] [<ffffffff8190af71>] mthca_cleanup_eq_table+0x2b/0xda
[ 606.983649] [<ffffffff81904308>] mthca_setup_hca+0x4f7/0x558
[ 606.998302] [<ffffffff81905909>] __mthca_init_one+0x5f6/0x708
[ 607.003948] [<ffffffff81b4dd66>] mthca_init_one+0x94/0xbf
[ 607.019447] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.023666] [<ffffffff814621ab>] local_pci_probe+0x2a/0x42
[ 607.039153] [<ffffffff81094252>] do_work_for_cpu+0x27/0x50
[ 607.041672] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.059565] [<ffffffff8109958d>] kthread+0x6e/0xb1
[ 607.063474] [<ffffffff81035fda>] child_rip+0xa/0x20
[ 607.079096] [<ffffffff8103597c>] ? restore_args+0x0/0x30
[ 607.083027] [<ffffffff8109951f>] ? kthread+0x0/0xb1
[ 607.098768] [<ffffffff81035fd0>] ? child_rip+0x0/0x20
[ 607.101323] ---[ end trace e6652b295472e0d7 ]---
[ 607.117066] ------------[ cut here ]------------
[ 607.119830] WARNING: at kernel/irq/manage.c:740 __free_irq+0xad/0x17f()
[ 607.137557] Hardware name: Libra
[ 607.140825] Trying to free already-free IRQ 233
[ 607.145850] Modules linked in:
[ 607.157651] Pid: 10719, comm: work_for_cpu Tainted: G W
2.6.30-tip-03093-gff58544-dirty #380
[ 607.176018] Call Trace:
[ 607.178982] [<ffffffff810cb58a>] ? __free_irq+0xad/0x17f
[ 607.182636] [<ffffffff8107d612>] warn_slowpath_common+0x8d/0xd0
[ 607.198688] [<ffffffff8107d6e2>] warn_slowpath_fmt+0x50/0x66
[ 607.203487] [<ffffffff810cb55d>] ? __free_irq+0x80/0x17f
[ 607.219523] [<ffffffff810cb58a>] __free_irq+0xad/0x17f
[ 607.222882] [<ffffffff810ac63d>] ? trace_hardirqs_on+0x20/0x36
[ 607.238103] [<ffffffff810cb67d>] free_irq+0x21/0x3f
[ 607.241173] [<ffffffff8190acda>] mthca_free_irqs+0x76/0xad
[ 607.257516] [<ffffffff8190af71>] mthca_cleanup_eq_table+0x2b/0xda
[ 607.264306] [<ffffffff81904308>] mthca_setup_hca+0x4f7/0x558
[ 607.277819] [<ffffffff81905909>] __mthca_init_one+0x5f6/0x708
[ 607.283174] [<ffffffff81b4dd66>] mthca_init_one+0x94/0xbf
[ 607.297829] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.301208] [<ffffffff814621ab>] local_pci_probe+0x2a/0x42
[ 607.318688] [<ffffffff81094252>] do_work_for_cpu+0x27/0x50
[ 607.322902] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.339817] [<ffffffff8109958d>] kthread+0x6e/0xb1
[ 607.344286] [<ffffffff81035fda>] child_rip+0xa/0x20
[ 607.357248] [<ffffffff8103597c>] ? restore_args+0x0/0x30
[ 607.360923] [<ffffffff8109951f>] ? kthread+0x0/0xb1
[ 607.376587] [<ffffffff81035fd0>] ? child_rip+0x0/0x20
[ 607.383884] ---[ end trace e6652b295472e0d8 ]---
[ 607.396025] ------------[ cut here ]------------
[ 607.400016] WARNING: at kernel/irq/manage.c:740 __free_irq+0xad/0x17f()
[ 607.416804] Hardware name: Libra
[ 607.419503] Trying to free already-free IRQ 232
[ 607.424234] Modules linked in:
[ 607.436788] Pid: 10719, comm: work_for_cpu Tainted: G W
2.6.30-tip-03093-gff58544-dirty #380
[ 607.442583] Call Trace:
[ 607.456233] [<ffffffff810cb58a>] ? __free_irq+0xad/0x17f
[ 607.459883] [<ffffffff8107d612>] warn_slowpath_common+0x8d/0xd0
[ 607.477933] [<ffffffff8107d6e2>] warn_slowpath_fmt+0x50/0x66
[ 607.481881] [<ffffffff810cb55d>] ? __free_irq+0x80/0x17f
[ 607.497082] [<ffffffff810cb58a>] __free_irq+0xad/0x17f
[ 607.500157] [<ffffffff810ac63d>] ? trace_hardirqs_on+0x20/0x36
[ 607.516507] [<ffffffff810cb67d>] free_irq+0x21/0x3f
[ 607.520978] [<ffffffff8190acf6>] mthca_free_irqs+0x92/0xad
[ 607.536204] [<ffffffff8190af71>] mthca_cleanup_eq_table+0x2b/0xda
[ 607.542430] [<ffffffff81904308>] mthca_setup_hca+0x4f7/0x558
[ 607.557645] [<ffffffff81905909>] __mthca_init_one+0x5f6/0x708
[ 607.561602] [<ffffffff81b4dd66>] mthca_init_one+0x94/0xbf
[ 607.578214] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.581882] [<ffffffff814621ab>] local_pci_probe+0x2a/0x42
[ 607.597365] [<ffffffff81094252>] do_work_for_cpu+0x27/0x50
[ 607.604409] [<ffffffff8109422b>] ? do_work_for_cpu+0x0/0x50
[ 607.617720] [<ffffffff8109958d>] kthread+0x6e/0xb1
[ 607.620780] [<ffffffff81035fda>] child_rip+0xa/0x20
[ 607.637011] [<ffffffff8103597c>] ? restore_args+0x0/0x30
[ 607.642633] [<ffffffff8109951f>] ? kthread+0x0/0xb1
[ 607.656116] [<ffffffff81035fd0>] ? child_rip+0x0/0x20
[ 607.659478] ---[ end trace e6652b295472e0d9 ]---
[ 607.662779] free_memtype request 0xfbe80000-0xfbe81000
[ 607.678284] free_memtype request 0xfbef0000-0xfbef1000
[ 607.682522] free_memtype request 0xe0000000-0xe4000000
[ 607.697438] free_memtype request 0xe5800000-0xe6000000
[ 607.701085] free_memtype request 0xfa802000-0xfa803000
[ 607.718191] free_memtype request 0xfbe80000-0xfbe81000
[ 607.721542] free_memtype request 0xfa800000-0xfa801000
[ 607.737028] ib_mthca 0000:c4:00.0: PCI INT A disabled
[ 607.741860] ib_mthca: probe of 0000:c4:00.0 failed with error -16
Thanks, will take a look. It does seem this is on an error path:
> [ 606.818317] ib_mthca 0000:c4:00.0: NOP command failed to generate
> interrupt (IRQ 98), aborting.
> [ 606.826709] ib_mthca 0000:c4:00.0: BIOS or ACPI interrupt routing problem?
So your system already had problems -- the driver didn't get an
expected interrupt from the device.
- R.
By the way, can you try this with the ib_mthca module option
"msi_x=0" -- because I see
> [ 606.538429] ib_mthca 0000:c4:00.0: NOP command failed to generate
> interrupt (IRQ 234).
> [ 606.543348] ib_mthca 0000:c4:00.0: Trying again with MSI-X disabled.
But perhaps the fallback to non-MSI-X in the driver is messing things
up. Also you have:
> [ 606.222053] ib_mthca 0000:c4:00.0: HCA FW version 4.6.002 is old
> (4.8.200 is current).
So updating the HCA firmware likely wouldn't hurt.
- R.
On Fri, 12 Jun 2009 07:33:30 -0700
Roland Dreier <[email protected]> wrote:
> By the way, can you try this with the ib_mthca module option
> "msi_x=0" -- because I see
>
> > [ 606.538429] ib_mthca 0000:c4:00.0: NOP command failed to
> > generate interrupt (IRQ 234).
> > [ 606.543348] ib_mthca 0000:c4:00.0: Trying again with MSI-X
> > disabled.
>
> But perhaps the fallback to non-MSI-X in the driver is messing things
> up. Also you have:
My linux-next branch has a change to the return value from the MSI-X
allocation function: pci_enable_msix now returns the number of
available entries rather than -EINVAL if the allocation failed. Would
that cause problems in the mthca driver?
--
Jesse Barnes, Intel Open Source Technology Center
> My linux-next branch has a change to the return value from the MSI-X
> allocation function: pci_enable_msix now returns the number of
> available entries rather than -EINVAL if the allocation failed. Would
> that cause problems in the mthca driver?
I don't think so in this case ... the driver is getting past enabling
MSI-X and actually testing the interrupt before it fails. I'll audit
the code just to make sure we handle that case though. But I thought
pci_enable_msix() always returned the number of entries actually
available if the allocation failed?
- R.
Yeah, the code as it is should be OK:
err = pci_enable_msix(mdev->pdev, entries, ARRAY_SIZE(entries));
if (err) {
if (err > 0)
mthca_info(mdev, "Only %d MSI-X vectors available, "
"not using MSI-X\n", err);
return err;
}
- R.
On Fri, Jun 12, 2009 at 7:31 AM, Roland Dreier<[email protected]> wrote:
> Thanks, will take a look. ?It does seem this is on an error path:
>
> ?> [ ?606.818317] ib_mthca 0000:c4:00.0: NOP command failed to generate
> ?> interrupt (IRQ 98), aborting.
> ?> [ ?606.826709] ib_mthca 0000:c4:00.0: BIOS or ACPI interrupt routing problem?
>
> So your system already had problems -- the driver didn't get an
> expected interrupt from the device.
I knew that. but we should not get that double freeing.
BIOS doesn't allocate proper MMIO range for that card.
YH
Something like this should help -- can you verify this fixes the double
free of irq?
drivers/infiniband/hw/mthca/mthca_eq.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c
index 28f0e0c..90e4e45 100644
--- a/drivers/infiniband/hw/mthca/mthca_eq.c
+++ b/drivers/infiniband/hw/mthca/mthca_eq.c
@@ -641,9 +641,11 @@ static void mthca_free_irqs(struct mthca_dev *dev)
if (dev->eq_table.have_irq)
free_irq(dev->pdev->irq, dev);
for (i = 0; i < MTHCA_NUM_EQ; ++i)
- if (dev->eq_table.eq[i].have_irq)
+ if (dev->eq_table.eq[i].have_irq) {
free_irq(dev->eq_table.eq[i].msi_x_vector,
dev->eq_table.eq + i);
+ dev->eq_table.eq[i].have_irq = 0;
+ }
}
static int mthca_map_reg(struct mthca_dev *dev,
On Fri, Jun 12, 2009 at 9:30 PM, Roland Dreier<[email protected]> wrote:
> Something like this should help -- can you verify this fixes the double
> free of irq?
>
> ?drivers/infiniband/hw/mthca/mthca_eq.c | ? ?4 +++-
> ?1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c
> index 28f0e0c..90e4e45 100644
> --- a/drivers/infiniband/hw/mthca/mthca_eq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_eq.c
> @@ -641,9 +641,11 @@ static void mthca_free_irqs(struct mthca_dev *dev)
> ? ? ? ?if (dev->eq_table.have_irq)
> ? ? ? ? ? ? ? ?free_irq(dev->pdev->irq, dev);
> ? ? ? ?for (i = 0; i < MTHCA_NUM_EQ; ++i)
> - ? ? ? ? ? ? ? if (dev->eq_table.eq[i].have_irq)
> + ? ? ? ? ? ? ? if (dev->eq_table.eq[i].have_irq) {
> ? ? ? ? ? ? ? ? ? ? ? ?free_irq(dev->eq_table.eq[i].msi_x_vector,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? dev->eq_table.eq + i);
> + ? ? ? ? ? ? ? ? ? ? ? dev->eq_table.eq[i].have_irq = 0;
> + ? ? ? ? ? ? ? }
> ?}
>
> ?static int mthca_map_reg(struct mthca_dev *dev,
>
thanks, that fixes the problem.
got
[ 688.153102] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[ 688.164128] ib_mthca: Initializing 0000:c4:00.0
[ 688.170696] ib_mthca 0000:c4:00.0: PCI INT A -> GSI 98 (level, low) -> IRQ 98
[ 688.184033] ib_mthca 0000:c4:00.0: setting latency timer to 64
[ 688.188594] ib_mthca 0000:c4:00.0: using 64bit DMA mask
[ 688.208718] ib_mthca 0000:c4:00.0: using 64bit consistent DMA mask
[ 688.217191] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 688.232259] free_memtype request 0xfbef0000-0xfbef1000
[ 689.243254] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.537336] reserve_memtype added 0xfa800000-0xfa801000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.587286] ib_mthca 0000:c4:00.0: HCA FW version 4.7.400 is old
(4.8.200 is current).
[ 689.592067] ib_mthca 0000:c4:00.0: If you have problems, try
updating your HCA FW.
[ 689.603264] reserve_memtype added 0xfbe82000-0xfbe83000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.622765] alloc irq_desc for 230 on node 0
[ 689.629521] alloc kstat_irqs on node 0
[ 689.641401] ib_mthca 0000:c4:00.0: irq 230 for MSI/MSI-X
[ 689.645155] alloc irq_desc for 231 on node 0
[ 689.647072] alloc kstat_irqs on node 0
[ 689.663348] ib_mthca 0000:c4:00.0: irq 231 for MSI/MSI-X
[ 689.666000] alloc irq_desc for 232 on node 0
[ 689.682655] alloc kstat_irqs on node 0
[ 689.685768] ib_mthca 0000:c4:00.0: irq 232 for MSI/MSI-X
[ 689.701511] reserve_memtype added 0xfa802000-0xfa803000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.721752] reserve_memtype added 0xe5800000-0xe6000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.731141] reserve_memtype added 0xe0000000-0xe4000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.753919] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.770375] Overlap at 0xfbe80000-0xfbe81000
[ 689.782317] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 689.902837] ib_mthca 0000:c4:00.0: NOP command failed to generate
interrupt (IRQ 232).
[ 689.907900] ib_mthca 0000:c4:00.0: Trying again with MSI-X disabled.
[ 689.923292] free_memtype request 0xfbe80000-0xfbe81000
[ 689.926430] free_memtype request 0xfbef0000-0xfbef1000
[ 689.942964] free_memtype request 0xe0000000-0xe4000000
[ 689.947059] free_memtype request 0xe5800000-0xe6000000
[ 689.962869] free_memtype request 0xfa802000-0xfa803000
[ 689.967293] free_memtype request 0xfbe82000-0xfbe83000
[ 689.980961] reserve_memtype added 0xfa802000-0xfa803000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 690.003659] reserve_memtype added 0xe5800000-0xe6000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 690.015144] reserve_memtype added 0xe0000000-0xe4000000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 690.033814] reserve_memtype added 0xfbef0000-0xfbef1000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 690.046945] Overlap at 0xfbe80000-0xfbe81000
[ 690.062803] reserve_memtype added 0xfbe80000-0xfbe81000, track
uncached-minus, req uncached-minus, ret uncached-minus
[ 690.182718] ib_mthca 0000:c4:00.0: NOP command failed to generate
interrupt (IRQ 98), aborting.
[ 690.188676] ib_mthca 0000:c4:00.0: BIOS or ACPI interrupt routing problem?
[ 690.203724] free_memtype request 0xfbe80000-0xfbe81000
[ 690.205768] free_memtype request 0xfbef0000-0xfbef1000
[ 690.224080] free_memtype request 0xe0000000-0xe4000000
[ 690.228558] free_memtype request 0xe5800000-0xe6000000
[ 690.241523] free_memtype request 0xfa802000-0xfa803000
[ 690.246196] free_memtype request 0xfbe80000-0xfbe81000
[ 690.263133] free_memtype request 0xfa800000-0xfa801000
[ 690.268628] ib_mthca 0000:c4:00.0: PCI INT A disabled
[ 690.281553] ib_mthca: probe of 0000:c4:00.0 failed with error -16
as expected.
YH
> thanks, that fixes the problem.
Thanks, will merge with your reported-by/tested-by included.
- R.