2009-10-02 09:46:22

by Alexander Beregalov

[permalink] [raw]
Subject: 2.6.32-rc2: Intel(R) I/OAT DMA Engine init failed

Hi
The kerneil is 2.6.32-rc2-00196-g0efe5e3.
Should I bisect?

ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ioatdma 0000:00:08.0: setting latency timer to 64
alloc irq_desc for 33 on node -1
alloc kstat_irqs on node -1
ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
ioatdma 0000:00:08.0: Self-test copy timed out, disabling
ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine init failed
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:254 debug_print_object+0x64/0x90()
Hardware name: PowerEdge 1950
ODEBUG: free active object type: timer_list
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.32-rc2-00196-g0efe5e3 #1
Call Trace:
[<c10395b8>] warn_slowpath_common+0x78/0xc0
[<c12475b4>] ? debug_print_object+0x64/0x90
[<c12475b4>] ? debug_print_object+0x64/0x90
[<c1039671>] warn_slowpath_fmt+0x31/0x50
[<c12475b4>] debug_print_object+0x64/0x90
[<c1247b34>] debug_check_no_obj_freed+0x104/0x1b0
[<c1069511>] ? debug_check_no_locks_freed+0xc1/0x140
[<c10af922>] kfree+0xf2/0x140
[<c107b913>] ? free_irq+0x43/0x90
[<c12cd443>] ? release_nodes+0x143/0x1c0
[<c12cd443>] release_nodes+0x143/0x1c0
[<c12cd5df>] devres_release_all+0x3f/0x70
[<c12ca4f7>] driver_probe_device+0xe7/0x290
[<c140d8e7>] ? _spin_unlock_irqrestore+0x47/0x90
[<c12ca739>] __driver_attach+0x99/0xa0
[<c12ca6a0>] ? __driver_attach+0x0/0xa0
[<c12c9748>] bus_for_each_dev+0x58/0x90
[<c12ca234>] driver_attach+0x24/0x40
[<c12ca6a0>] ? __driver_attach+0x0/0xa0
[<c12c9be7>] bus_add_driver+0x1e7/0x2b0
[<c125b7e0>] ? pci_device_remove+0x0/0x60
[<c12caab0>] driver_register+0x70/0x140
[<c1246ebf>] ? __spin_lock_init+0x3f/0x80
[<c125bb7d>] __pci_register_driver+0x5d/0xe0
[<c15e7997>] ioat_init_module+0x6d/0x96
[<c1001068>] do_one_initcall+0x38/0x1b0
[<c15e792a>] ? ioat_init_module+0x0/0x96
[<c15c2497>] kernel_init+0x140/0x1a8
[<c15c2357>] ? kernel_init+0x0/0x1a8
[<c1003edb>] kernel_thread_helper+0x7/0x7c
---[ end trace 42ae668ed52462d6 ]---
ioatdma 0000:00:08.0: PCI INT A disabled


2009-10-05 22:01:44

by Dan Williams

[permalink] [raw]
Subject: Re: 2.6.32-rc2: Intel(R) I/OAT DMA Engine init failed

On Fri, 2009-10-02 at 02:46 -0700, Alexander Beregalov wrote:
> Hi
> The kerneil is 2.6.32-rc2-00196-g0efe5e3.
> Should I bisect?
>
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> ioatdma 0000:00:08.0: setting latency timer to 64
> alloc irq_desc for 33 on node -1
> alloc kstat_irqs on node -1
> ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
> ioatdma 0000:00:08.0: Self-test copy timed out, disabling
> ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine init failed

I may ask for a bisect later, but for now let's try and see why it is
failing. It is something platform specific as 2.6.32-rc3 loads fine on
my v1.2 platform. Please reproduce this with the patch below.

Thanks,
Dan

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index c524d36..8f807b4 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -24,6 +24,7 @@
* This driver supports an Intel I/OAT DMA engine, which does asynchronous
* copy operations.
*/
+#define DEBUG

#include <linux/init.h>
#include <linux/module.h>
@@ -815,6 +816,7 @@ int __devinit ioat_dma_self_test(struct ioatdma_device *device)
struct completion cmp;
unsigned long tmo;
unsigned long flags;
+ enum dma_status stat;

src = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, GFP_KERNEL);
if (!src)
@@ -864,10 +866,10 @@ int __devinit ioat_dma_self_test(struct ioatdma_device *device)

tmo = wait_for_completion_timeout(&cmp, msecs_to_jiffies(3000));

- if (tmo == 0 ||
- dma->device_is_tx_complete(dma_chan, cookie, NULL, NULL)
- != DMA_SUCCESS) {
- dev_err(dev, "Self-test copy timed out, disabling\n");
+ stat = dma->device_is_tx_complete(dma_chan, cookie, NULL, NULL);
+ if (tmo == 0 || stat != DMA_SUCCESS) {
+ dev_err(dev, "Self-test copy timed out, disabling %ld:%d\n",
+ tmo, stat);
err = -ENODEV;
goto free_resources;
}

2009-10-06 10:54:42

by Alexander Beregalov

[permalink] [raw]
Subject: Re: 2.6.32-rc2: Intel(R) I/OAT DMA Engine init failed

2009/10/6 Dan Williams <[email protected]>:
> On Fri, 2009-10-02 at 02:46 -0700, Alexander Beregalov wrote:
>> Hi
>> The kerneil is 2.6.32-rc2-00196-g0efe5e3.
>> Should I bisect?
>>
>> ioatdma: Intel(R) QuickData Technology Driver 4.00
>> ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>> ioatdma 0000:00:08.0: setting latency timer to 64
>>   alloc irq_desc for 33 on node -1
>>   alloc kstat_irqs on node -1
>> ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
>> ioatdma 0000:00:08.0: Self-test copy timed out, disabling
>> ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine init failed
>
> I may ask for a bisect later, but for now let's try and see why it is
> failing.  It is something platform specific as 2.6.32-rc3 loads fine on
> my v1.2 platform.  Please reproduce this with the patch below.

I can't reproduce it with you patch
(do not see "Self-test copy timed out" line):


ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ioatdma 0000:00:08.0: setting latency timer to 64
ioatdma 0000:00:08.0: ioat1_enumerate_channels: xfercap = 4096
ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 0
ioatdma 0000:00:08.0: desc[0]: (0x36b6b000->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat1_dma_alloc_chan_resources: allocated 256 descriptors
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 1
ioatdma 0000:00:08.0: desc[1]: (0x36b6b040->0x0) cookie: 0 flags: 0x31
ctl: 0x9 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat1_tx_submit: cookie: 1
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x36b6b000
ioatdma 0000:00:08.0: desc[0]: (0x36b6b000->0x36b6b040) cookie: 0
flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: desc[1]: (0x36b6b040->0x0) cookie: 1 flags: 0x33
ctl: 0x9 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: __ioat1_dma_memcpy_issue_pending: pending: 1
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x36b6b040
ioatdma 0000:00:08.0: ioat1_timer_event: state: 1
ioatdma 0000:00:08.0: __cleanup: phys_complete: 36b6b040
ioatdma 0000:00:08.0: desc[0]: (0x36b6b000->0x36b6b040) cookie: 0
flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: desc[1]: (0x36b6b040->0x0) cookie: 1 flags: 0x33
ctl: 0x9 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: __cleanup cancel completion timeout
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x36b6b040
ioatdma 0000:00:08.0: ioat1_dma_free_chan_resources: freeing 1 from used list
ioatdma 0000:00:08.0: desc[1]: (0x36b6b040->0x0) cookie: 0 flags: 0x33
ctl: 0x9 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: DCA is disabled in BIOS
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 0
ioatdma 0000:00:08.0: desc[0]: (0x36b770c0->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x36b770c0
ioatdma 0000:00:08.0: __cleanup: phys_complete: 36b770c0
ioatdma 0000:00:08.0: desc[0]: (0x36b770c0->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: __cleanup cancel completion timeout
ioatdma 0000:00:08.0: ioat1_timer_event: state: 0
ioatdma 0000:00:08.0: ioat1_dma_alloc_chan_resources: allocated 256 descriptors
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 0
ioatdma 0000:00:08.0: desc[0]: (0x36b6bec0->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x36b6bec0
ioatdma 0000:00:08.0: ioat1_dma_alloc_chan_resources: allocated 256 descriptors
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 0
ioatdma 0000:00:08.0: desc[0]: (0x3609b200->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat1_dma_alloc_chan_resources: allocated 256 descriptors
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x3609b200
ioatdma 0000:00:08.0: __cleanup: phys_complete: 3609b200
ioatdma 0000:00:08.0: desc[0]: (0x3609b200->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: __cleanup cancel completion timeout
ioatdma 0000:00:08.0: ioat1_dma_get_next_descriptor: allocated: 0
ioatdma 0000:00:08.0: desc[0]: (0x360d3300->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat1_dma_alloc_chan_resources: allocated 256 descriptors
ioatdma 0000:00:08.0: ioat_get_current_completion: phys_complete: 0x360d3300
ioatdma 0000:00:08.0: __cleanup: phys_complete: 360d3300
ioatdma 0000:00:08.0: desc[0]: (0x360d3300->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: __cleanup cancel completion timeout
ioatdma 0000:00:08.0: ioat1_timer_event: state: 0
ioatdma 0000:00:08.0: ioat1_timer_event: state: 0
ioatdma 0000:00:08.0: __cleanup: phys_complete: 36b6bec0
ioatdma 0000:00:08.0: desc[0]: (0x36b6bec0->0x0) cookie: 0 flags: 0x2
ctl: 0x29 (op: 0 int_en: 1 compl: 1)
ioatdma 0000:00:08.0: ioat1_timer_event: state: 0
ioatdma 0000:00:08.0: __cleanup cancel completion timeout

but when I commented DEBUG definition, it failed again:

ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ioatdma 0000:00:08.0: setting latency timer to 64
alloc irq_desc for 33 on node -1
alloc kstat_irqs on node -1
ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
ioatdma 0000:00:08.0: Self-test copy timed out, disabling 0:0
ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine init failed
ioatdma 0000:00:08.0: PCI INT A disabled

2009-10-13 01:49:12

by Dan Williams

[permalink] [raw]
Subject: Re: 2.6.32-rc2: Intel(R) I/OAT DMA Engine init failed

On Tue, Oct 6, 2009 at 3:54 AM, Alexander Beregalov
<[email protected]> wrote:
> but when I commented DEBUG definition, it failed again:
>
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:08.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> ioatdma 0000:00:08.0: setting latency timer to 64
> ?alloc irq_desc for 33 on node -1
> ?alloc kstat_irqs on node -1
> ioatdma 0000:00:08.0: irq 33 for MSI/MSI-X
> ioatdma 0000:00:08.0: Self-test copy timed out, disabling 0:0
> ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine init failed
> ioatdma 0000:00:08.0: PCI INT A disabled

Odd, it looks like neither a dma channel interrupt nor the timer fired
while this operation was in flight. However, when you turn on debug
we see timer events firing as expected (ioat1_timer_event). The
operation completes successfully ( dma->device_is_tx_complete returns
DMA_SUCCESS), but its not clear what is causing us to violate the
timeout. Before bisecting can you try commit 1f6672d4? This is the
state of the aysnc_tx.git tree before it was merged for 2.6.32. If
that fails then a full bisect is the only recourse I can see to get
closer to root cause.

Thanks,
Dan