2010-08-02 10:51:26

by Subrata Modak

Subject: 2.6.35-stable/ppc64/p7: Badness at lib/dma-debug.c:902, Call Trace & Instruction Dump during boot

Hi,

On boot, a "Badness at lib/dma-debug.c:902" warning, together with a
call trace and instruction dump, is logged to /var/log/messages:

================================================================
udev: starting version 151
ses 0:8:0:0: Attached Enclosure device
IBM eHEA ethernet device driver (Release EHEA_0105)
mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
mlx4_core: Initializing 0002:01:00.0
mlx4_core 0002:01:00.0: enabling device (0140 -> 0142)
ehea: eth0: Jumbo frames are enabled
ehea: eth0 -> logical port id #1
ehea: eth1: Jumbo frames are enabled
ehea: eth1 -> logical port id #2
mlx4_core 0002:01:00.0: Requested 17 vectors, but only 8 MSI-X vectors
available, trying again
mlx4_core 0002:01:00.0: DMA-API: device driver tries to sync DMA memory it has
not allocated [device address=0x0000000060f22000] [size=4096 bytes]
------------[ cut here ]------------
Badness at lib/dma-debug.c:902
NIP: c0000000003fdfa0 LR: c0000000003fdf9c CTR: 0000000000000001
REGS: c000000f35bfad00 TRAP: 0700 Not tainted (2.6.35)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 48002482 XER: 20000010
TASK = c000000f35c08000[502] 'modprobe' THREAD: c000000f35bf8000 CPU: 8
GPR00: c0000000003fdf9c c000000f35bfaf80 c000000001464c48 0000000000000096
GPR04: 0000000000000001 c0000000000c19b0 0000000000000000 0000000000000002
GPR08: 0000000000000000 c000000f35c08000 00000000000043b3 0000000000000001
GPR12: 7669636520616464 c000000003fa5800 0000000000000000 0000000010016628
GPR16: c000000f32a02000 c000000f32402000 c000000f33604648 c000000f35bfb1c0
GPR20: c000000f38002120 0000000060f22000 0000000000000200 c000000001f4b000
GPR24: 0000000000000001 0000000000000001 c000000001f47900 c000000f35bfb0b0
GPR28: 0000000000000000 c000000f38002120 c0000000013eaea8 c000000f35bfaf80
NIP [c0000000003fdfa0] .check_sync+0x108/0x52c
LR [c0000000003fdf9c] .check_sync+0x104/0x52c
Call Trace:
[c000000f35bfaf80] [c0000000003fdf9c] .check_sync+0x104/0x52c (unreliable)
[c000000f35bfb040] [c0000000003fe8d4] .debug_dma_sync_single_for_cpu+0x58/0x70
[c000000f35bfb150] [d0000000082bdbdc] .mlx4_write_mtt+0x16c/0x290 [mlx4_core]
[c000000f35bfb250] [d0000000082b58c8] .mlx4_create_eq+0x368/0x5a4 [mlx4_core]
[c000000f35bfb350] [d0000000082b5d08] .mlx4_init_eq_table+0x204/0x52c
[mlx4_core]
[c000000f35bfb410] [d0000000082bb5d0] .mlx4_setup_hca+0x1e8/0x6b0 [mlx4_core]
[c000000f35bfb4f0] [d0000000082bc438] .__mlx4_init_one+0x7e8/0xab8 [mlx4_core]
[c000000f35bfb5b0] [d0000000082c2758] .mlx4_init_one+0x70/0x2708 [mlx4_core]
[c000000f35bfb650] [c00000000040afd0] .local_pci_probe+0x4c/0x68
[c000000f35bfb6e0] [c00000000040cb1c] .pci_device_probe+0xfc/0x148
[c000000f35bfb7a0] [c00000000051db68] .driver_probe_device+0x1a8/0x32c
[c000000f35bfb840] [c00000000051dda0] .__driver_attach+0xb4/0xfc
[c000000f35bfb8e0] [c00000000051cbc8] .bus_for_each_dev+0x98/0x108
[c000000f35bfb9a0] [c00000000051d720] .driver_attach+0x40/0x60
[c000000f35bfba30] [c00000000051c048] .bus_add_driver+0x190/0x380
[c000000f35bfbaf0] [c00000000051e310] .driver_register+0xe8/0x1ac
[c000000f35bfbba0] [c00000000040ce6c] .__pci_register_driver+0x88/0x138
[c000000f35bfbc40] [d0000000082c2618] .mlx4_init+0x88/0x100 [mlx4_core]
[c000000f35bfbcd0] [c00000000000a0f4] .do_one_initcall+0xb0/0x208
[c000000f35bfbd80] [c0000000001187e8] .SyS_init_module+0xe8/0x25c
[c000000f35bfbe30] [c000000000008df0] syscall_exit+0x0/0x40
Instruction dump:
48118b09 60000000 e8bd0050 7c641b78 2fa50000 409e0008 e8bd0010 e87e8078
e8fb0030 e8db0028 483ca985 60000000 <0fe00000> 480003b4 e93b0030 e81c0030
================================================================

Please note that a similar issue, "Badness at lib/dma-debug.c:820"
(https://bugzilla.redhat.com/show_bug.cgi?id=579454),
was reported here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-April/081541.html
A fix for it was posted at:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-April/081545.html
and was merged into the kernel as:
commit a71fa1fc43a29133f13ae6ada1a389ca298c0934

However, this report concerns a different warning:
Badness at lib/dma-debug.c:902

Regards--
Subrata


Attachments:
config-2.6.35-ppc64-p7 (102.11 kB)

2010-08-02 11:55:36

by FUJITA Tomonori

Subject: Re: 2.6.35-stable/ppc64/p7: Badness at lib/dma-debug.c:902, Call Trace & Instruction Dump during boot

CC'ing the dma-debug maintainer.

On Mon, 02 Aug 2010 16:21:09 +0530
Subrata Modak <[email protected]> wrote:

> Hi,
>
> On boot, Badness at lib/dma-debug.c:902, Call Trace & Instruction Dump
> are recorded at /var/log/messages:
>
> ================================================================
> udev: starting version 151
> ses 0:8:0:0: Attached Enclosure device
> IBM eHEA ethernet device driver (Release EHEA_0105)
> mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
> mlx4_core: Initializing 0002:01:00.0
> mlx4_core 0002:01:00.0: enabling device (0140 -> 0142)
> ehea: eth0: Jumbo frames are enabled
> ehea: eth0 -> logical port id #1
> ehea: eth1: Jumbo frames are enabled
> ehea: eth1 -> logical port id #2
> mlx4_core 0002:01:00.0: Requested 17 vectors, but only 8 MSI-X vectors
> available, trying again
> mlx4_core 0002:01:00.0: DMA-API: device driver tries to sync DMA memory it has
> not allocated [device address=0x0000000060f22000] [size=4096 bytes]
> ------------[ cut here ]------------
> Badness at lib/dma-debug.c:902
> NIP: c0000000003fdfa0 LR: c0000000003fdf9c CTR: 0000000000000001
> REGS: c000000f35bfad00 TRAP: 0700 Not tainted (2.6.35)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 48002482 XER: 20000010
> TASK = c000000f35c08000[502] 'modprobe' THREAD: c000000f35bf8000 CPU: 8
> GPR00: c0000000003fdf9c c000000f35bfaf80 c000000001464c48 0000000000000096
> GPR04: 0000000000000001 c0000000000c19b0 0000000000000000 0000000000000002
> GPR08: 0000000000000000 c000000f35c08000 00000000000043b3 0000000000000001
> GPR12: 7669636520616464 c000000003fa5800 0000000000000000 0000000010016628
> GPR16: c000000f32a02000 c000000f32402000 c000000f33604648 c000000f35bfb1c0
> GPR20: c000000f38002120 0000000060f22000 0000000000000200 c000000001f4b000
> GPR24: 0000000000000001 0000000000000001 c000000001f47900 c000000f35bfb0b0
> GPR28: 0000000000000000 c000000f38002120 c0000000013eaea8 c000000f35bfaf80
> NIP [c0000000003fdfa0] .check_sync+0x108/0x52c
> LR [c0000000003fdf9c] .check_sync+0x104/0x52c
> Call Trace:
> [c000000f35bfaf80] [c0000000003fdf9c] .check_sync+0x104/0x52c (unreliable)
> [c000000f35bfb040] [c0000000003fe8d4] .debug_dma_sync_single_for_cpu+0x58/0x70

I guess that this driver does a partial sync with
dma_sync_single_for_* API. dma-debug can't handle it properly. It's
likely that this is a false warning.
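
As a rough illustration of the suspected failure mode, here is a small
userspace model. It is only a sketch and not the code in lib/dma-debug.c:
the struct, the function name, and the mapping base and size used in the
usage note are hypothetical. It assumes the checker looks up a sync
request by comparing its device address with the start address recorded
at map time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapping, loosely modelled on what a DMA
 * debugging layer might keep. Not the real struct from lib/dma-debug.c. */
struct mapping {
    uint64_t dev_addr; /* device address returned at map time */
    uint64_t size;     /* length of the mapping in bytes */
};

/* Exact-match lookup: succeeds only when addr is the start of a
 * recorded mapping. A sync that begins inside a mapping is not found. */
static bool find_exact(const struct mapping *tbl, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dev_addr == addr)
            return true;
    }
    return false;
}
```

If a driver had mapped, say, 16 KiB at 0x60f20000 (made-up values) and
then synced only the 4096 bytes at 0x60f22000 (the address in the
warning), find_exact() would not find the mapping, and the checker would
report memory "not allocated" although the range lies inside a valid
mapping.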

2010-08-04 13:13:41

by Joerg Roedel

Subject: Re: 2.6.35-stable/ppc64/p7: Badness at lib/dma-debug.c:902, Call Trace & Instruction Dump during boot

On Mon, Aug 02, 2010 at 07:55:03AM -0400, FUJITA Tomonori wrote:
> I guess that this driver does a partial sync with
> dma_sync_single_for_* API. dma-debug can't handle it properly. It's
> likely that this is a false warning.

If this turns out to be true, it is not trivial to fix. I will prepare
a patch for you to test.

Joerg

--
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

2010-08-04 14:19:09

by FUJITA Tomonori

Subject: Re: 2.6.35-stable/ppc64/p7: Badness at lib/dma-debug.c:902, Call Trace & Instruction Dump during boot

On Wed, 4 Aug 2010 15:16:34 +0200
"Roedel, Joerg" <[email protected]> wrote:

> On Mon, Aug 02, 2010 at 07:55:03AM -0400, FUJITA Tomonori wrote:
> > I guess that this driver does a partial sync with
> > dma_sync_single_for_* API. dma-debug can't handle it properly. It's
> > likely that this is a false warning.
>
> If this turns out to be true, it is not trivial to fix. I will prepare
> a patch for you to test.

I've not looked at the details of this driver, but there are drivers
that do this. So dma-debug needs to be fixed anyway; you can't assume
that the DMA address returned by dma_map_single is the one passed to
the dma_sync_single_for_* API.
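
One possible direction, sketched here as a userspace model, is to accept
a sync when its range lies inside any recorded mapping instead of
requiring the start addresses to match. This is an assumption for
illustration only, not the patch mentioned above and not the actual
dma-debug code; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapping; not the real dma-debug entry. */
struct mapping {
    uint64_t dev_addr; /* device address returned at map time */
    uint64_t size;     /* length of the mapping in bytes */
};

/* Return the mapping that fully contains [addr, addr + len), or NULL.
 * The comparison is written without computing addr + len so that it
 * cannot overflow for addresses near the top of the 64-bit range. */
static const struct mapping *find_containing(const struct mapping *tbl,
                                             size_t n,
                                             uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t start = tbl[i].dev_addr;

        if (addr >= start &&
            len <= tbl[i].size &&
            addr - start <= tbl[i].size - len)
            return &tbl[i];
    }
    return NULL;
}
```

With a lookup like this, a sync that starts at the mapped address still
matches, a sync of a sub-range matches as well, and a sync that runs
past the end of the mapping is still rejected. A linear scan is used
only to keep the sketch short.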