2018-07-27 13:13:09

by Krzysztof Kozlowski

Subject: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

Hi,

On today's next, bisect pointed to commit
ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as the cause of my boot failures
with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).

Author: Robin Murphy <[email protected]>
Date: Mon Jul 23 23:16:12 2018 +0100
OF: Don't set default coherent DMA mask

Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
with DMA) on Iris Carrier.

It looks like a problem with the Freescale Ethernet driver:
[ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
[ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
[ 15.472086] Root-NFS: no NFS server address
[ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
[ 15.484228] VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
[ 15.491664] Please append a correct "root=" boot option; here are the available partitions:
[ 15.500188] 0100 16384 ram0
[ 15.500200] (driver?)
[ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
[ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) ]---

Attached - defconfig and full boot log.

Any hints?
Let me know if you need any more information.

Best regards,
Krzysztof


Attachments:
vf-boot-log.txt (28.89 kB)
defconfig (6.49 kB)

2018-07-27 13:19:57

by Krzysztof Kozlowski

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
> Hi,
>
> On today's next, the bisect pointed commit
> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>
> Author: Robin Murphy <[email protected]>
> Date: Mon Jul 23 23:16:12 2018 +0100
> OF: Don't set default coherent DMA mask
>
> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
> with DMA) on Iris Carrier.
>
> It looks like problem with Freescale Ethernet driver:
> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
> [ 15.472086] Root-NFS: no NFS server address
> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
> [ 15.484228] VFS: Cannot open root device "nfs" or
> unknown-block(2,0): error -6
> [ 15.491664] Please append a correct "root=" boot option; here are
> the available partitions:
> [ 15.500188] 0100 16384 ram0
> [ 15.500200] (driver?)
> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(2,0)
> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
> mount root fs on unknown-block(2,0) ]---
>
> Attached - defconfig and full boot log.
>
> Any hints?
> Let me know if you need any more information.

My Exynos boards also fail to boot because the network is missing:
https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0

As expected, there are plenty of "DMA mask not set" warnings, and
later the dwc3 driver fails with:
dwc3: probe of 12400000.dwc3 failed with error -12
which probably explains why the USB-attached LAN is not present.

Best regards,
Krzysztof
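
The numeric codes in these logs are negated errno values. A minimal sketch of decoding them, assuming Linux errno numbering (the helper name errno_name is made up for this illustration and is not a kernel or libc function):

```c
#include <assert.h>
#include <errno.h>

/* This sketch assumes Linux errno numbering, as used in the logs above. */
static_assert(ENXIO == 6 && ENOMEM == 12,
              "sketch assumes Linux errno numbering");

/*
 * Hypothetical helper: map a negative kernel-style return value to its
 * symbolic name. Only the two codes seen in this thread are handled.
 */
const char *errno_name(int ret)
{
    switch (-ret) {
    case ENXIO:
        return "ENXIO";   /* "error -6" from the root device open */
    case ENOMEM:
        return "ENOMEM";  /* "error -12" from the dwc3 probe */
    default:
        return "unknown";
    }
}
```

Read this way, "error -12" from dwc3 is an out-of-memory style failure, which is consistent with a coherent DMA allocation being refused, and "error -6" is the usual "no such device or address" result when the root device cannot be opened.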

2018-07-27 14:02:10

by Christoph Hellwig

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
> > Hi,
> >
> > On today's next, the bisect pointed commit
> > ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
> > with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
> >
> > Author: Robin Murphy <[email protected]>
> > Date: Mon Jul 23 23:16:12 2018 +0100
> > OF: Don't set default coherent DMA mask
> >
> > Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
> > with DMA) on Iris Carrier.
> >
> > It looks like problem with Freescale Ethernet driver:
> > [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
> > [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
> > [ 15.472086] Root-NFS: no NFS server address
> > [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
> > [ 15.484228] VFS: Cannot open root device "nfs" or
> > unknown-block(2,0): error -6
> > [ 15.491664] Please append a correct "root=" boot option; here are
> > the available partitions:
> > [ 15.500188] 0100 16384 ram0
> > [ 15.500200] (driver?)
> > [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
> > fs on unknown-block(2,0)
> > [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
> > mount root fs on unknown-block(2,0) ]---
> >
> > Attached - defconfig and full boot log.
> >
> > Any hints?
> > Let me know if you need any more information.
>
> My Exynos boards also fail to boot on missing network:
> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>
> As expected there are plenty of "DMA mask not set" warnings... and
> later dwc3 driver fails with:
> dwc3: probe of 12400000.dwc3 failed with error -12
> which is probably the answer why LAN attached to USB is not present.

Looks like all the drivers failed to set a dma mask and were lucky.
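
To illustrate the point, here is a userspace model, not kernel code, of a probe routine that relies on someone else having set the coherent mask versus one that sets it itself. In a real driver the call would be dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) followed by dma_alloc_coherent(); struct model_device and every model_* and probe_* name below are hypothetical stand-ins, and the rule "allocation fails when the mask is zero" is a simplification of what the warnings in the logs suggest.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as the kernel macro: the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the coherent-mask field of struct device. */
struct model_device {
    uint64_t coherent_dma_mask;   /* 0 means "never set" */
};

/* Model of a driver declaring which addresses its hardware can reach. */
int model_set_coherent_mask(struct model_device *dev, uint64_t mask)
{
    assert(dev != NULL);
    if (mask == 0)
        return -EIO;              /* model only: reject an empty mask */
    dev->coherent_dma_mask = mask;
    return 0;
}

/* Model of a coherent allocation: refused when no mask has been set. */
void *model_alloc_coherent(struct model_device *dev, size_t size)
{
    assert(dev != NULL);
    if (dev->coherent_dma_mask == 0)
        return NULL;
    return malloc(size);
}

/* Probe that silently depends on a default mask set elsewhere. */
int probe_relying_on_default(struct model_device *dev)
{
    void *buf = model_alloc_coherent(dev, 64);

    if (!buf)
        return -ENOMEM;
    free(buf);
    return 0;
}

/* Probe that states its own requirement before allocating. */
int probe_setting_mask(struct model_device *dev)
{
    void *buf;
    int err = model_set_coherent_mask(dev, DMA_BIT_MASK(32));

    if (err)
        return err;
    buf = model_alloc_coherent(dev, 64);
    if (!buf)
        return -ENOMEM;
    free(buf);
    return 0;
}
```

With a zeroed device, the first probe fails with -ENOMEM (the same code dwc3 reported) while the second succeeds regardless of what the bus code did beforehand.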

2018-07-28 16:59:28

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
> > On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
> > > Hi,
> > >
> > > On today's next, the bisect pointed commit
> > > ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
> > > with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
> > >
> > > Author: Robin Murphy <[email protected]>
> > > Date: Mon Jul 23 23:16:12 2018 +0100
> > > OF: Don't set default coherent DMA mask
> > >
> > > Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
> > > with DMA) on Iris Carrier.
> > >
> > > It looks like problem with Freescale Ethernet driver:
> > > [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
> > > [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
> > > [ 15.472086] Root-NFS: no NFS server address
> > > [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
> > > [ 15.484228] VFS: Cannot open root device "nfs" or
> > > unknown-block(2,0): error -6
> > > [ 15.491664] Please append a correct "root=" boot option; here are
> > > the available partitions:
> > > [ 15.500188] 0100 16384 ram0
> > > [ 15.500200] (driver?)
> > > [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
> > > fs on unknown-block(2,0)
> > > [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
> > > mount root fs on unknown-block(2,0) ]---
> > >
> > > Attached - defconfig and full boot log.
> > >
> > > Any hints?
> > > Let me know if you need any more information.
> >
> > My Exynos boards also fail to boot on missing network:
> > https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
> >
> > As expected there are plenty of "DMA mask not set" warnings... and
> > later dwc3 driver fails with:
> > dwc3: probe of 12400000.dwc3 failed with error -12
> > which is probably the answer why LAN attached to USB is not present.
>
> Looks like all the drivers failed to set a dma mask and were lucky.

I would call it a serious regression. Also, no longer setting a default
coherent DMA mask is quite a substantial behavioral change, especially
since the code worked just fine up to now.

Crash when booting sam460ex attached below, as is a bisect log.

Guenter

---
irq: type mismatch, failed to map hwirq-0 for interrupt-controller3!
WARNING: CPU: 0 PID: 1 at ppc4xx_msi_probe+0x2dc/0x3b8
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-rc6-00010-gff33d1030a6c #1
NIP: c001c460 LR: c001c29c CTR: 00000000
REGS: cf82db60 TRAP: 0700 Not tainted (4.18.0-rc6-00010-gff33d1030a6c)
MSR: 00029000 <CE,EE,ME> CR: 24002028 XER: 00000000

GPR00: c001c29c cf82dc10 cf828000 d1021000 d1021000 cf882108 cf82db78 00000000
GPR08: 00000000 c0377ae4 00000000 1000051b 24002028 00000000 c00025e8 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
GPR24: 00029000 0000000c 10000000 cf8de410 c0494d60 00029000 cf8bebc0 cf8de400
NIP [c001c460] ppc4xx_msi_probe+0x2dc/0x3b8
LR [c001c29c] ppc4xx_msi_probe+0x118/0x3b8
Call Trace:
[cf82dc10] [c001c29c] ppc4xx_msi_probe+0x118/0x3b8 (unreliable)
[cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
[cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
[cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
[cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
[cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
[cf82dd40] [c02050c8] device_add+0x404/0x5c4
[cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
[cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
[cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
[cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
[cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
[cf82dea0] [c0002404] do_one_initcall+0x40/0x188
[cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
[cf82df30] [c0002600] kernel_init+0x18/0x104
[cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
3860000e 4bffa2a5 3860000f 7f44d378 4bffa299 4bfffe30 3860000e 4bffa28d
3860000f 7f24cb78 4bffa281 4bfffde4 <0fe00000> 81290000 2f890000 409efe6c
---[ end trace 8cf551077ecfc429 ]---
ppc4xx-msi c10000000.ppc4xx-msi: coherent DMA mask is unset
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc001bff0
Oops: Kernel access of bad area, sig: 11 [#1]
BE Canyonlands
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc6-00010-gff33d1030a6c #1
NIP: c001bff0 LR: c001c418 CTR: c01faa7c
REGS: cf82db40 TRAP: 0300 Tainted: G W (4.18.0-rc6-00010-gff33d1030a6c)
MSR: 00029000 <CE,EE,ME> CR: 28002024 XER: 00000000
DEAR: 00000000 ESR: 00000000
GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4
GPR08: c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
GPR24: 00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001
NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
Call Trace:
[cf82dbf0] [00029000] 0x29000 (unreliable)
[cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
[cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
[cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
[cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
[cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
[cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
[cf82dd40] [c02050c8] device_add+0x404/0x5c4
[cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
[cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
[cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
[cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
[cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
[cf82dea0] [c0002404] do_one_initcall+0x40/0x188
[cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
[cf82df30] [c0002600] kernel_init+0x18/0x104
[cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800
409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f830000 419effe0 4803b26d
---[ end trace 8cf551077ecfc42a ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

---
# bad: [639d109b21f1413c54ca7042e40a57856e7679bb] Add linux-next specific files for 20180727
# good: [d72e90f33aa4709ebecc5005562f52335e106a60] Linux 4.18-rc6
git bisect start 'HEAD' 'v4.18-rc6'
# bad: [7bc81125a936a25af28f2172b593bca390b0c539] Merge remote-tracking branch 'spi-nor/spi-nor/next'
git bisect bad 7bc81125a936a25af28f2172b593bca390b0c539
# bad: [659868e6488dbad1181ad21888521ff41ae45f65] Merge remote-tracking branch 'vfs/for-next'
git bisect bad 659868e6488dbad1181ad21888521ff41ae45f65
# bad: [453ff4bb24c3fa4af40995f2615ec22176e71500] Merge remote-tracking branch 'mvebu/for-next'
git bisect bad 453ff4bb24c3fa4af40995f2615ec22176e71500
# good: [ebc949ee3c7e28b6554f00fcdaf2c0c8aae54d90] Merge branch 'next/soc' into for-next
git bisect good ebc949ee3c7e28b6554f00fcdaf2c0c8aae54d90
# good: [fef31ecbe2ecbb518ad1db37282eb97ca6dd29b8] Merge remote-tracking branch 'leaks/leaks-next'
git bisect good fef31ecbe2ecbb518ad1db37282eb97ca6dd29b8
# good: [53b9c41f0d9c35e41ea884bae6ad4b6fadc59035] Merge branch 'next/drivers' into for-next
git bisect good 53b9c41f0d9c35e41ea884bae6ad4b6fadc59035
# bad: [cd67b2d4c0ca61f7e93e622dba0164fb176975b4] Merge remote-tracking branch 'arm-soc/for-next'
git bisect bad cd67b2d4c0ca61f7e93e622dba0164fb176975b4
# good: [a0c166140d2e63a069263b6d3c39a42c61749d96] Merge branch 'next/drivers' into for-next
git bisect good a0c166140d2e63a069263b6d3c39a42c61749d96
# bad: [e5e08751da47170e6a05c09364595ec1abad7cec] Merge remote-tracking branch 'arm/for-next'
git bisect bad e5e08751da47170e6a05c09364595ec1abad7cec
# good: [52e19c3c1eaf103c2eb4f764825136abcfea1538] Merge branches 'clkdev', 'fixes', 'misc' and 'spectre' into for-next
git bisect good 52e19c3c1eaf103c2eb4f764825136abcfea1538
# good: [e8d4162413ecbf3b3d1451808bdbd212cec8b70c] ACPI/IORT: Set bus DMA mask as appropriate
git bisect good e8d4162413ecbf3b3d1451808bdbd212cec8b70c
# good: [186e2e8cc462aed36cc6845c938547833377582f] ACPI/IORT: Don't set default coherent DMA mask
git bisect good 186e2e8cc462aed36cc6845c938547833377582f
# bad: [deff076d4ce359c2d83983a75765b4ac8f635d2f] Merge remote-tracking branch 'dma-mapping/for-next'
git bisect bad deff076d4ce359c2d83983a75765b4ac8f635d2f
# bad: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask
git bisect bad ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d
# first bad commit: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask

2018-07-30 09:20:30

by Christoph Hellwig

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Sat, Jul 28, 2018 at 09:58:20AM -0700, Guenter Roeck wrote:
> I would call it a serious regression. Also, no longer setting a default
> coherent DMA mask is a quite substantial behavioral change, especially
> if and since the code worked just fine up to now.
>
> Crash when booting sam460ex attached below, as is a bisect log.

Things should be ok again with:

commit a5516219b10218a87abb3352c82248ce3088e94a
Author: Robin Murphy <[email protected]>
Date: Fri Jul 27 15:14:15 2018 +0100

of/platform: Initialise default DMA masks

from the latest dma-mapping tree as of Friday. It isn't in the
latest linux-next tree yet but will be in the next one.
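
As a rough illustration of what "initialising default DMA masks" at device-creation time amounts to, here is a userspace model. It is a sketch under assumptions and not the text of the patch named above: struct model_device and model_init_default_masks are hypothetical names, and the choice of a 32-bit default is assumed for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel macro: the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the two DMA mask fields of struct device. */
struct model_device {
    uint64_t *dma_mask;           /* streaming mask, a pointer that may be NULL */
    uint64_t coherent_dma_mask;   /* coherent mask, 0 when never set */
};

/* Model of bus code giving a newly created device conservative defaults. */
void model_init_default_masks(struct model_device *dev)
{
    assert(dev != NULL);
    dev->coherent_dma_mask = DMA_BIT_MASK(32);
    /* Leave an existing streaming mask alone; otherwise share the coherent one. */
    if (!dev->dma_mask)
        dev->dma_mask = &dev->coherent_dma_mask;
}
```

Drivers that never set a mask of their own then see a usable 32-bit default again, which matches the behaviour they had been relying on.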

2018-07-30 14:40:16

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 28/07/18 17:58, Guenter Roeck wrote:
> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> On today's next, the bisect pointed commit
>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>
>>>> Author: Robin Murphy <[email protected]>
>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>> OF: Don't set default coherent DMA mask
>>>>
>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>> with DMA) on Iris Carrier.
>>>>
>>>> It looks like problem with Freescale Ethernet driver:
>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>> [ 15.472086] Root-NFS: no NFS server address
>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>> unknown-block(2,0): error -6
>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>> the available partitions:
>>>> [ 15.500188] 0100 16384 ram0
>>>> [ 15.500200] (driver?)
>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>> fs on unknown-block(2,0)
>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>> mount root fs on unknown-block(2,0) ]---
>>>>
>>>> Attached - defconfig and full boot log.
>>>>
>>>> Any hints?
>>>> Let me know if you need any more information.
>>>
>>> My Exynos boards also fail to boot on missing network:
>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>
>>> As expected there are plenty of "DMA mask not set" warnings... and
>>> later dwc3 driver fails with:
>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>> which is probably the answer why LAN attached to USB is not present.
>>
>> Looks like all the drivers failed to set a dma mask and were lucky.
>
> I would call it a serious regression. Also, no longer setting a default
> coherent DMA mask is a quite substantial behavioral change, especially
> if and since the code worked just fine up to now.

To reiterate, that particular side-effect was an unintentional
oversight, and I was simply (un)lucky enough that none of the drivers I
did test depended on that default mask. Sorry for the blip; please check
whether it's now fixed in next-20180730 as it should be.

> Crash when booting sam460ex attached below, as is a bisect log.

Nevertheless, like most of the others that came out of the woodwork,
that appears to be a crash due to a broken cleanup path down the line
from dma_alloc_coherent() returning NULL - that warrants fixing (or just
removing) in its own right, because cleanup code which has never been
tested and doesn't actually work is little more than a pointless waste
of space.

Robin.
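
A minimal sketch of an error path that survives an allocation returning NULL, written as a userspace model with malloc() standing in for ioremap() and dma_alloc_coherent(). All names here (struct model_msi, model_probe, model_remove, the fail_alloc switch) are hypothetical and do not reproduce the ppc4xx MSI driver; the sketch only shows the usual goto-unwind pattern, where the failure path undoes exactly what has succeeded so far instead of running a full teardown on half-initialised state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_msi {
    void *regs;      /* stands in for a mapped register window */
    void *msg_buf;   /* stands in for the coherent DMA buffer */
};

/* fail_alloc lets a caller force the allocation-failure path for testing. */
int model_probe(struct model_msi *msi, int fail_alloc)
{
    int err;

    assert(msi != NULL);

    msi->regs = malloc(16);
    if (!msi->regs)
        return -ENOMEM;

    msi->msg_buf = fail_alloc ? NULL : malloc(64);
    if (!msi->msg_buf) {
        err = -ENOMEM;
        goto err_free_regs;   /* undo only the step that succeeded */
    }

    return 0;

err_free_regs:
    free(msi->regs);
    msi->regs = NULL;
    return err;
}

/* Full teardown, valid only after model_probe() returned 0. */
void model_remove(struct model_msi *msi)
{
    assert(msi != NULL);
    free(msi->msg_buf);
    free(msi->regs);
    msi->msg_buf = NULL;
    msi->regs = NULL;
}
```

Forcing the failure (fail_alloc set to 1) is also a cheap way to exercise the cleanup code at least once, which addresses the "never been tested" part of the complaint.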


2018-07-31 00:52:18

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 07/30/2018 07:38 AM, Robin Murphy wrote:
> On 28/07/18 17:58, Guenter Roeck wrote:
>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> On today's next, the bisect pointed commit
>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>
>>>>> Author: Robin Murphy <[email protected]>
>>>>> Date:   Mon Jul 23 23:16:12 2018 +0100
>>>>>      OF: Don't set default coherent DMA mask
>>>>>
>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>> with DMA) on Iris Carrier.
>>>>>
>>>>> It looks like problem with Freescale Ethernet driver:
>>>>> [   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>> [   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>> [   15.472086] Root-NFS: no NFS server address
>>>>> [   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>> [   15.484228] VFS: Cannot open root device "nfs" or
>>>>> unknown-block(2,0): error -6
>>>>> [   15.491664] Please append a correct "root=" boot option; here are
>>>>> the available partitions:
>>>>> [   15.500188] 0100           16384 ram0
>>>>> [   15.500200]  (driver?)
>>>>> [   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>> fs on unknown-block(2,0)
>>>>> [   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>
>>>>> Attached - defconfig and full boot log.
>>>>>
>>>>> Any hints?
>>>>> Let me know if you need any more information.
>>>>
>>>> My Exynos boards also fail to boot on missing network:
>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>
>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>> later dwc3 driver fails with:
>>>>      dwc3: probe of 12400000.dwc3 failed with error -12
>>>> which is probably the answer why LAN attached to USB is not present.
>>>
>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>
>> I would call it a serious regression. Also, no longer setting a default
>> coherent DMA mask is a quite substantial behavioral change, especially
>> if and since the code worked just fine up to now.
>
> To reiterate, that particular side-effect was an unintentional
> oversight, and I was simply (un)lucky enough that none of the drivers I
> did test depended on that default mask. Sorry for the blip; please check
> whether it's now fixed in next-20180730 as it should be.
>

Yes, I don't see the warnings and crashes anymore.

>> Crash when booting sam460ex attached below, as is a bisect log.
>
> Nevertheless, like most of the others that came out of the woodwork,
> that appears to be a crash due to a broken cleanup path down the line
> from dma_alloc_coherent() returning NULL - that warrants fixing (or just
> removing) in its own right, because cleanup code which has never been
> tested and doesn't actually work is little more than a pointless waste
> of space.
>

I had a quick look into the code. I agree, the error path in
ppc4xx_msi_probe() is completely messed up. It will crash for all
kinds of errors (and in many cases erroneously return -EPERM
as error, but that doesn't really matter since it crashes anyway).

Guenter
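
One common way a function ends up "returning -EPERM" by accident is a bare "return -1", since EPERM is 1 on Linux. The sketch below shows that effect with two hypothetical helpers; it is an assumption that this is what happens in ppc4xx_msi_probe(), as the thread does not quote the code.

```c
#include <assert.h>
#include <errno.h>

/* This sketch assumes Linux errno numbering, where EPERM is 1. */
static_assert(EPERM == 1, "sketch assumes Linux errno numbering");

/* Anti-pattern: callers that expect a negative errno read -1 as -EPERM. */
int setup_returning_bare_minus_one(int ok)
{
    return ok ? 0 : -1;
}

/* Preferred: name the actual failure, here an allocation failure. */
int setup_returning_errno(int ok)
{
    return ok ? 0 : -ENOMEM;
}
```

A caller logging the first function's result would report "Operation not permitted" for what is really an allocation failure, which is misleading even when nothing crashes.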

>> # bad: [cd67b2d4c0ca61f7e93e622dba0164fb176975b4] Merge remote-tracking branch 'arm-soc/for-next'
>> git bisect bad cd67b2d4c0ca61f7e93e622dba0164fb176975b4
>> # good: [a0c166140d2e63a069263b6d3c39a42c61749d96] Merge branch 'next/drivers' into for-next
>> git bisect good a0c166140d2e63a069263b6d3c39a42c61749d96
>> # bad: [e5e08751da47170e6a05c09364595ec1abad7cec] Merge remote-tracking branch 'arm/for-next'
>> git bisect bad e5e08751da47170e6a05c09364595ec1abad7cec
>> # good: [52e19c3c1eaf103c2eb4f764825136abcfea1538] Merge branches 'clkdev', 'fixes', 'misc' and 'spectre' into for-next
>> git bisect good 52e19c3c1eaf103c2eb4f764825136abcfea1538
>> # good: [e8d4162413ecbf3b3d1451808bdbd212cec8b70c] ACPI/IORT: Set bus DMA mask as appropriate
>> git bisect good e8d4162413ecbf3b3d1451808bdbd212cec8b70c
>> # good: [186e2e8cc462aed36cc6845c938547833377582f] ACPI/IORT: Don't set default coherent DMA mask
>> git bisect good 186e2e8cc462aed36cc6845c938547833377582f
>> # bad: [deff076d4ce359c2d83983a75765b4ac8f635d2f] Merge remote-tracking branch 'dma-mapping/for-next'
>> git bisect bad deff076d4ce359c2d83983a75765b4ac8f635d2f
>> # bad: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask
>> git bisect bad ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d
>> # first bad commit: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask
>>
>


2018-07-31 08:25:52

by Stefan Agner

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 30.07.2018 16:38, Robin Murphy wrote:
> On 28/07/18 17:58, Guenter Roeck wrote:
>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> On today's next, the bisect pointed commit
>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>
>>>>> Author: Robin Murphy <[email protected]>
>>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>>> OF: Don't set default coherent DMA mask
>>>>>
>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>> with DMA) on Iris Carrier.
>>>>>
>>>>> It looks like problem with Freescale Ethernet driver:
>>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>> [ 15.472086] Root-NFS: no NFS server address
>>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>>> unknown-block(2,0): error -6
>>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>>> the available partitions:
>>>>> [ 15.500188] 0100 16384 ram0
>>>>> [ 15.500200] (driver?)
>>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>> fs on unknown-block(2,0)
>>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>
>>>>> Attached - defconfig and full boot log.
>>>>>
>>>>> Any hints?
>>>>> Let me know if you need any more information.
>>>>
>>>> My Exynos boards also fail to boot on missing network:
>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>
>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>> later dwc3 driver fails with:
>>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>>> which is probably the answer why LAN attached to USB is not present.
>>>
>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>
>> I would call it a serious regression. Also, no longer setting a default
>> coherent DMA mask is a quite substantial behavioral change, especially
>> if and since the code worked just fine up to now.
>
> To reiterate, that particular side-effect was an unintentional
> oversight, and I was simply (un)lucky enough that none of the drivers
> I did test depended on that default mask. Sorry for the blip; please
> check whether it's now fixed in next-20180730 as it should be.
>

Just for my understanding:

Your first patch ("OF: Don't set default coherent DMA mask") sounded
as if *not* setting a default coherent DMA mask was intentional.
The commit message reads: "...the bus code has not initialised any
default value", which assumes that all bus code sets a default DMA
mask, and that wasn't the case for "simple-bus".

So I guess that is what ("of/platform: Initialise default DMA masks")
makes up for in the typical device tree case ("simple-bus")?

Now, since almost all drivers are inside a soc "simple-bus" and the DMA
mask is set again, can/should we rely on the coherent DMA mask being set?

Or is the expectation still that this is set at driver level too?

It seems that many drivers were affected in the vf610 case (according to
the log in Krzysztof's initial message), e.g.
[ 0.237851] gpio-vf610 4004d000.gpio: DMA mask not set
[ 0.240304] fsl-ftm-pwm 40038000.pwm: DMA mask not set
[ 0.886031] fsl-lpuart 40028000.serial: DMA mask not set
[ 0.958600] vf610_nfc 400e0000.nand: DMA mask not set
[ 1.055900] fsl-dspi 4002d000.dspi1: DMA mask not set
[ 1.393539] fec 400d1000.ethernet: DMA mask not set

--
Stefan


>> Crash when booting sam460ex attached below, as is a bisect log.
>
> Nevertheless, like most of the others that came out of the woodwork,
> that appears to be a crash due to a broken cleanup path down the line
> from dma_alloc_coherent() returning NULL - that warrants fixing (or
> just removing) in its own right, because cleanup code which has never
> been tested and doesn't actually work is little more than a pointless
> waste of space.
>
> Robin.
>
>>
>> Guenter
>>
>> ---
>> irq: type mismatch, failed to map hwirq-0 for interrupt-controller3!
>> WARNING: CPU: 0 PID: 1 at ppc4xx_msi_probe+0x2dc/0x3b8
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-rc6-00010-gff33d1030a6c #1
>> NIP: c001c460 LR: c001c29c CTR: 00000000
>> REGS: cf82db60 TRAP: 0700 Not tainted (4.18.0-rc6-00010-gff33d1030a6c)
>> MSR: 00029000 <CE,EE,ME> CR: 24002028 XER: 00000000
>>
>> GPR00: c001c29c cf82dc10 cf828000 d1021000 d1021000 cf882108 cf82db78 00000000
>> GPR08: 00000000 c0377ae4 00000000 1000051b 24002028 00000000 c00025e8 00000000
>> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
>> GPR24: 00029000 0000000c 10000000 cf8de410 c0494d60 00029000 cf8bebc0 cf8de400
>> NIP [c001c460] ppc4xx_msi_probe+0x2dc/0x3b8
>> LR [c001c29c] ppc4xx_msi_probe+0x118/0x3b8
>> Call Trace:
>> [cf82dc10] [c001c29c] ppc4xx_msi_probe+0x118/0x3b8 (unreliable)
>> [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
>> [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
>> [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
>> [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
>> [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
>> [cf82dd40] [c02050c8] device_add+0x404/0x5c4
>> [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
>> [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
>> [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
>> [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
>> [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
>> [cf82dea0] [c0002404] do_one_initcall+0x40/0x188
>> [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
>> [cf82df30] [c0002600] kernel_init+0x18/0x104
>> [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
>> Instruction dump:
>> 3860000e 4bffa2a5 3860000f 7f44d378 4bffa299 4bfffe30 3860000e 4bffa28d
>> 3860000f 7f24cb78 4bffa281 4bfffde4 <0fe00000> 81290000 2f890000 409efe6c
>> ---[ end trace 8cf551077ecfc429 ]---
>> ppc4xx-msi c10000000.ppc4xx-msi: coherent DMA mask is unset
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc001bff0
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> BE Canyonlands
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc6-00010-gff33d1030a6c #1
>> NIP: c001bff0 LR: c001c418 CTR: c01faa7c
>> REGS: cf82db40 TRAP: 0300 Tainted: G W (4.18.0-rc6-00010-gff33d1030a6c)
>> MSR: 00029000 <CE,EE,ME> CR: 28002024 XER: 00000000
>> DEAR: 00000000 ESR: 00000000
>> GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4
>> GPR08: c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000
>> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
>> GPR24: 00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001
>> NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
>> LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
>> Call Trace:
>> [cf82dbf0] [00029000] 0x29000 (unreliable)
>> [cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
>> [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
>> [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
>> [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
>> [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
>> [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
>> [cf82dd40] [c02050c8] device_add+0x404/0x5c4
>> [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
>> [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
>> [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
>> [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
>> [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
>> [cf82dea0] [c0002404] do_one_initcall+0x40/0x188
>> [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
>> [cf82df30] [c0002600] kernel_init+0x18/0x104
>> [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
>> Instruction dump:
>> 90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800
>> 409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f830000 419effe0 4803b26d
>> ---[ end trace 8cf551077ecfc42a ]---
>>
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>
>> ---
>> # bad: [639d109b21f1413c54ca7042e40a57856e7679bb] Add linux-next specific files for 20180727
>> # good: [d72e90f33aa4709ebecc5005562f52335e106a60] Linux 4.18-rc6
>> git bisect start 'HEAD' 'v4.18-rc6'
>> # bad: [7bc81125a936a25af28f2172b593bca390b0c539] Merge remote-tracking branch 'spi-nor/spi-nor/next'
>> git bisect bad 7bc81125a936a25af28f2172b593bca390b0c539
>> # bad: [659868e6488dbad1181ad21888521ff41ae45f65] Merge remote-tracking branch 'vfs/for-next'
>> git bisect bad 659868e6488dbad1181ad21888521ff41ae45f65
>> # bad: [453ff4bb24c3fa4af40995f2615ec22176e71500] Merge remote-tracking branch 'mvebu/for-next'
>> git bisect bad 453ff4bb24c3fa4af40995f2615ec22176e71500
>> # good: [ebc949ee3c7e28b6554f00fcdaf2c0c8aae54d90] Merge branch 'next/soc' into for-next
>> git bisect good ebc949ee3c7e28b6554f00fcdaf2c0c8aae54d90
>> # good: [fef31ecbe2ecbb518ad1db37282eb97ca6dd29b8] Merge remote-tracking branch 'leaks/leaks-next'
>> git bisect good fef31ecbe2ecbb518ad1db37282eb97ca6dd29b8
>> # good: [53b9c41f0d9c35e41ea884bae6ad4b6fadc59035] Merge branch 'next/drivers' into for-next
>> git bisect good 53b9c41f0d9c35e41ea884bae6ad4b6fadc59035
>> # bad: [cd67b2d4c0ca61f7e93e622dba0164fb176975b4] Merge remote-tracking branch 'arm-soc/for-next'
>> git bisect bad cd67b2d4c0ca61f7e93e622dba0164fb176975b4
>> # good: [a0c166140d2e63a069263b6d3c39a42c61749d96] Merge branch 'next/drivers' into for-next
>> git bisect good a0c166140d2e63a069263b6d3c39a42c61749d96
>> # bad: [e5e08751da47170e6a05c09364595ec1abad7cec] Merge remote-tracking branch 'arm/for-next'
>> git bisect bad e5e08751da47170e6a05c09364595ec1abad7cec
>> # good: [52e19c3c1eaf103c2eb4f764825136abcfea1538] Merge branches 'clkdev', 'fixes', 'misc' and 'spectre' into for-next
>> git bisect good 52e19c3c1eaf103c2eb4f764825136abcfea1538
>> # good: [e8d4162413ecbf3b3d1451808bdbd212cec8b70c] ACPI/IORT: Set bus DMA mask as appropriate
>> git bisect good e8d4162413ecbf3b3d1451808bdbd212cec8b70c
>> # good: [186e2e8cc462aed36cc6845c938547833377582f] ACPI/IORT: Don't set default coherent DMA mask
>> git bisect good 186e2e8cc462aed36cc6845c938547833377582f
>> # bad: [deff076d4ce359c2d83983a75765b4ac8f635d2f] Merge remote-tracking branch 'dma-mapping/for-next'
>> git bisect bad deff076d4ce359c2d83983a75765b4ac8f635d2f
>> # bad: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask
>> git bisect bad ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d
>> # first bad commit: [ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d] OF: Don't set default coherent DMA mask
>>

2018-07-31 12:34:12

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31/07/18 09:19, Stefan Agner wrote:
> On 30.07.2018 16:38, Robin Murphy wrote:
>> On 28/07/18 17:58, Guenter Roeck wrote:
>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On today's next, the bisect pointed commit
>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>
>>>>>> Author: Robin Murphy <[email protected]>
>>>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>>>> OF: Don't set default coherent DMA mask
>>>>>>
>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>> with DMA) on Iris Carrier.
>>>>>>
>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>> [ 15.472086] Root-NFS: no NFS server address
>>>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>>>> unknown-block(2,0): error -6
>>>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>>>> the available partitions:
>>>>>> [ 15.500188] 0100 16384 ram0
>>>>>> [ 15.500200] (driver?)
>>>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>> fs on unknown-block(2,0)
>>>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>
>>>>>> Attached - defconfig and full boot log.
>>>>>>
>>>>>> Any hints?
>>>>>> Let me know if you need any more information.
>>>>>
>>>>> My Exynos boards also fail to boot on missing network:
>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>
>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>> later dwc3 driver fails with:
>>>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>
>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>
>>> I would call it a serious regression. Also, no longer setting a default
>>> coherent DMA mask is a quite substantial behavioral change, especially
>>> if and since the code worked just fine up to now.
>>
>> To reiterate, that particular side-effect was an unintentional
>> oversight, and I was simply (un)lucky enough that none of the drivers
>> I did test depended on that default mask. Sorry for the blip; please
>> check whether it's now fixed in next-20180730 as it should be.
>>
>
> Just for my understanding:
>
> Your first patch ("OF: Don't set default coherent DMA mask") sounded
> like that *not* setting default coherent DMA mask was intentionally.
> Since the commit message reads: "...the bus code has not initialised any
> default value" that was assuming that all bus code sets a default DMA
> mask which wasn't the case for "simple-bus".

Yes, reading the patches in the order they were written is perhaps a
little unclear, but hopefully the order in which they are now applied
makes more sense.

> So I guess that is what ("of/platform: Initialise default DMA masks")
> makes up for in the typical device tree case ("simple-bus")?

Indeed, I'd missed the fact that the now-out-of-place-looking
initialisation in of_dma_configure() still actually belonged to
of_platform_device_create_pdata() - that patch should make the
assumptions of "OF: Don't set default coherent DMA mask" true again,
even for OF-platform devices.

> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
> is set again, can/should we rely on the coherent DMA mask set?
>
> Or is the expectation still that this is set on driver level too?

Ideally, we'd like all drivers to explicitly request their masks as the
documentation in DMA-API-HOWTO.txt recommends, if only to ensure DMA is
actually possible - there can be systems where even the default 32-bit
mask is no good - but clearly we're a little way off trying to enforce
that just yet.
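
As a rough sketch of what "explicitly request their masks" means in
practice: a real driver's probe routine would call
dma_set_mask_and_coherent() from <linux/dma-mapping.h> and bail out if it
fails. The block below is only a userspace model of that flow, so that it
compiles standalone. struct model_device and every model_* helper are
hypothetical stand-ins rather than kernel API; only the DMA_BIT_MASK()
definition mirrors the kernel macro.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(): a mask with the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the two DMA mask fields a device carries. */
struct model_device {
	uint64_t dma_mask;          /* streaming DMA mask, 0 = unset */
	uint64_t coherent_dma_mask; /* coherent DMA mask, 0 = unset */
};

/*
 * Model of dma_set_mask_and_coherent(): record the mask for both
 * streaming and coherent DMA. The real call can also fail when the
 * platform cannot satisfy the mask; this model only rejects 0.
 */
static int model_set_mask_and_coherent(struct model_device *dev, uint64_t mask)
{
	if (mask == 0)
		return -EIO;
	dev->dma_mask = mask;
	dev->coherent_dma_mask = mask;
	return 0;
}

/* Model of the allocator's precondition: no coherent mask, no buffer. */
static bool model_can_alloc_coherent(const struct model_device *dev)
{
	return dev->coherent_dma_mask != 0;
}

/* Probe that requests its mask itself instead of trusting a default. */
static int model_probe(struct model_device *dev)
{
	int ret = model_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

	if (ret)
		return ret;
	if (!model_can_alloc_coherent(dev))
		return -ENOMEM;
	return 0;
}
```

In this model, model_probe() succeeds even on a device whose masks start
out as zero, because it sets them before anything is allocated. Without
that call the allocation check would return -ENOMEM, which on Linux is
the same -12 seen in the dwc3 probe failure quoted above.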

Robin.

2018-07-31 13:27:37

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 07/31/2018 05:32 AM, Robin Murphy wrote:
> On 31/07/18 09:19, Stefan Agner wrote:
>> On 30.07.2018 16:38, Robin Murphy wrote:
>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On today's next, the bisect pointed commit
>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>
>>>>>>> Author: Robin Murphy <[email protected]>
>>>>>>> Date:   Mon Jul 23 23:16:12 2018 +0100
>>>>>>>       OF: Don't set default coherent DMA mask
>>>>>>>
>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>>> with DMA) on Iris Carrier.
>>>>>>>
>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>> [   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>>> [   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>>> [   15.472086] Root-NFS: no NFS server address
>>>>>>> [   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>> [   15.484228] VFS: Cannot open root device "nfs" or
>>>>>>> unknown-block(2,0): error -6
>>>>>>> [   15.491664] Please append a correct "root=" boot option; here are
>>>>>>> the available partitions:
>>>>>>> [   15.500188] 0100           16384 ram0
>>>>>>> [   15.500200]  (driver?)
>>>>>>> [   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>>> fs on unknown-block(2,0)
>>>>>>> [   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>
>>>>>>> Attached - defconfig and full boot log.
>>>>>>>
>>>>>>> Any hints?
>>>>>>> Let me know if you need any more information.
>>>>>>
>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>
>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>> later dwc3 driver fails with:
>>>>>>       dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>
>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>
>>>> I would call it a serious regression. Also, no longer setting a default
>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>> if and since the code worked just fine up to now.
>>>
>>> To reiterate, that particular side-effect was an unintentional
>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>> I did test depended on that default mask. Sorry for the blip; please
>>> check whether it's now fixed in next-20180730 as it should be.
>>>
>>
>> Just for my understanding:
>>
>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>> like that *not* setting default coherent DMA mask was intentionally.
>> Since the commit message reads: "...the bus code has not initialised any
>> default value" that was assuming that all bus code sets a default DMA
>> mask which wasn't the case for "simple-bus".
>
> Yes, reading the patches in the order they were written is perhaps a little unclear, but hopefully the order in which they are now applied makes more sense.
>
>> So I guess that is what ("of/platform: Initialise default DMA masks")
>> makes up for in the typical device tree case ("simple-bus")?
>
> Indeed, I'd missed the fact that the now-out-of-place-looking initialisation in of_dma_configure() still actually belonged to of_platform_device_create_pdata() - that patch should make the assumptions of "OF: Don't set default coherent DMA mask" true again, even for OF-platform devices.
>
>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>> is set again, can/should we rely on the coherent DMA mask set?
>>
>> Or is the expectation still that this is set on driver level too?
>
> Ideally, we'd like all drivers to explicitly request their masks as the documentation in DMA-API-HOWTO.txt recommends, if only to ensure DMA is actually possible - there can be systems where even the default 32-bit mask is no good - but clearly we're a little way off trying to enforce that just yet.
>
> Robin.
>

Please note that sparc images still generate the warning (next-20180731).

sunlance ffd35110: DMA mask not set
sunlance.c:v2.02 8/24/03 Miguel de Icaza ([email protected])
ioremap: done with statics, switching to malloc
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 sparc_lance_probe_one+0x428/0x4f4

esp ffd38e90: DMA mask not set
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 esp_sbus_probe+0x408/0x6e8

Guenter


2018-07-31 14:11:09

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31/07/18 14:26, Guenter Roeck wrote:
> On 07/31/2018 05:32 AM, Robin Murphy wrote:
>> On 31/07/18 09:19, Stefan Agner wrote:
>>> On 30.07.2018 16:38, Robin Murphy wrote:
>>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On today's next, the bisect pointed commit
>>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot
>>>>>>>> failures
>>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>>
>>>>>>>> Author: Robin Murphy <[email protected]>
>>>>>>>> Date:   Mon Jul 23 23:16:12 2018 +0100
>>>>>>>>       OF: Don't set default coherent DMA mask
>>>>>>>>
>>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial
>>>>>>>> configured
>>>>>>>> with DMA) on Iris Carrier.
>>>>>>>>
>>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>>> [   15.458477] fsl-edma 40018000.dma-controller: coherent DMA
>>>>>>>> mask is unset
>>>>>>>> [   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic
>>>>>>>> DMA
>>>>>>>> [   15.472086] Root-NFS: no NFS server address
>>>>>>>> [   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>>> [   15.484228] VFS: Cannot open root device "nfs" or
>>>>>>>> unknown-block(2,0): error -6
>>>>>>>> [   15.491664] Please append a correct "root=" boot option; here
>>>>>>>> are
>>>>>>>> the available partitions:
>>>>>>>> [   15.500188] 0100           16384 ram0
>>>>>>>> [   15.500200]  (driver?)
>>>>>>>> [   15.506406] Kernel panic - not syncing: VFS: Unable to mount
>>>>>>>> root
>>>>>>>> fs on unknown-block(2,0)
>>>>>>>> [   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>>
>>>>>>>> Attached - defconfig and full boot log.
>>>>>>>>
>>>>>>>> Any hints?
>>>>>>>> Let me know if you need any more information.
>>>>>>>
>>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>>
>>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>>> later dwc3 driver fails with:
>>>>>>>       dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>>
>>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>>
>>>>> I would call it a serious regression. Also, no longer setting a
>>>>> default
>>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>>> if and since the code worked just fine up to now.
>>>>
>>>> To reiterate, that particular side-effect was an unintentional
>>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>>> I did test depended on that default mask. Sorry for the blip; please
>>>> check whether it's now fixed in next-20180730 as it should be.
>>>>
>>>
>>> Just for my understanding:
>>>
>>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>>> like that *not* setting default coherent DMA mask was intentionally.
>>> Since the commit message reads: "...the bus code has not initialised any
>>> default value" that was assuming that all bus code sets a default DMA
>>> mask which wasn't the case for "simple-bus".
>>
>> Yes, reading the patches in the order they were written is perhaps a
>> little unclear, but hopefully the order in which they are now applied
>> makes more sense.
>>
>>> So I guess that is what ("of/platform: Initialise default DMA masks")
>>> makes up for in the typical device tree case ("simple-bus")?
>>
>> Indeed, I'd missed the fact that the now-out-of-place-looking
>> initialisation in of_dma_configure() still actually belonged to
>> of_platform_device_create_pdata() - that patch should make the
>> assumptions of "OF: Don't set default coherent DMA mask" true again,
>> even for OF-platform devices.
>>
>>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>>> is set again, can/should we rely on the coherent DMA mask set?
>>>
>>> Or is the expectation still that this is set on driver level too?
>>
>> Ideally, we'd like all drivers to explicitly request their masks as
>> the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
>> DMA is actually possible - there can be systems where even the default
>> 32-bit mask is no good - but clearly we're a little way off trying to
>> enforce that just yet.
>>
>> Robin.
>>
>
> Please note that sparc images still generate the warning (next-20180731).

Ugh, OK, any ideas what sparc does to create these platform devices that
isn't of_platform_device_create_pdata() and has somehow grown an
implicit dependency on of_dma_configure() since 4.12? I'm looking, but
nothing jumps out...

Robin.

> sunlance ffd35110: DMA mask not set
> sunlance.c:v2.02 8/24/03 Miguel de Icaza ([email protected])
> ioremap: done with statics, switching to malloc
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516
> sparc_lance_probe_one+0x428/0x4f4
>
> esp ffd38e90: DMA mask not set
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516
> esp_sbus_probe+0x408/0x6e8
>
> Guenter
>

2018-07-31 15:45:03

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:
> >Please note that sparc images still generate the warning (next-20180731).
>
> Ugh, OK, any ideas what sparc does to create these platform devices that
> isn't of_platform_device_create_pdata() and has somehow grown an implicit
> dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
> out...
>

I suspect it might be of_device_register(), called from
arch/sparc/kernel/of_device_64.c:scan_one_device()
arch/sparc/kernel/of_device_32.c:scan_one_device()

Guenter

2018-07-31 15:55:16

by Stefan Agner

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31.07.2018 14:32, Robin Murphy wrote:
> On 31/07/18 09:19, Stefan Agner wrote:
>> On 30.07.2018 16:38, Robin Murphy wrote:
>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On today's next, the bisect pointed commit
>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>
>>>>>>> Author: Robin Murphy <[email protected]>
>>>>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>>>>> OF: Don't set default coherent DMA mask
>>>>>>>
>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>>> with DMA) on Iris Carrier.
>>>>>>>
>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>>> [ 15.472086] Root-NFS: no NFS server address
>>>>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>>>>> unknown-block(2,0): error -6
>>>>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>>>>> the available partitions:
>>>>>>> [ 15.500188] 0100 16384 ram0
>>>>>>> [ 15.500200] (driver?)
>>>>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>>> fs on unknown-block(2,0)
>>>>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>
>>>>>>> Attached - defconfig and full boot log.
>>>>>>>
>>>>>>> Any hints?
>>>>>>> Let me know if you need any more information.
>>>>>>
>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>
>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>> later dwc3 driver fails with:
>>>>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>
>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>
>>>> I would call it a serious regression. Also, no longer setting a default
>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>> if and since the code worked just fine up to now.
>>>
>>> To reiterate, that particular side-effect was an unintentional
>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>> I did test depended on that default mask. Sorry for the blip; please
>>> check whether it's now fixed in next-20180730 as it should be.
>>>
>>
>> Just for my understanding:
>>
>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>> like that *not* setting default coherent DMA mask was intentionally.
>> Since the commit message reads: "...the bus code has not initialised any
>> default value" that was assuming that all bus code sets a default DMA
>> mask which wasn't the case for "simple-bus".
>
> Yes, reading the patches in the order they were written is perhaps a
> little unclear, but hopefully the order in which they are now applied
> makes more sense.
>
>> So I guess that is what ("of/platform: Initialise default DMA masks")
>> makes up for in the typical device tree case ("simple-bus")?
>
> Indeed, I'd missed the fact that the now-out-of-place-looking
> initialisation in of_dma_configure() still actually belonged to
> of_platform_device_create_pdata() - that patch should make the
> assumptions of "OF: Don't set default coherent DMA mask" true again,
> even for OF-platform devices.
>
>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>> is set again, can/should we rely on the coherent DMA mask set?
>>
>> Or is the expectation still that this is set on driver level too?
>
> Ideally, we'd like all drivers to explicitly request their masks as
> the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
> DMA is actually possible - there can be systems where even the default
> 32-bit mask is no good - but clearly we're a little way off trying to
> enforce that just yet.

In the FEC driver case, there is an integrated DMA (uDMA). It has
alignment restrictions, but can otherwise address the full 32-bit range.

So something like this should do it, right?

    if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
        dev_warn(dev, "No suitable DMA available\n");
        return -ENODEV;
    }

However, as far as I understand, that still requires the bus to have
set up dma_mask properly.

Should I be using dma_coerce_mask_and_coherent() instead?

--
Stefan

>
> Robin.

2018-07-31 15:59:47

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31/07/18 16:43, Guenter Roeck wrote:
> On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:
>>> Please note that sparc images still generate the warning (next-20180731).
>>
>> Ugh, OK, any ideas what sparc does to create these platform devices that
>> isn't of_platform_device_create_pdata() and has somehow grown an implicit
>> dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
>> out...
>>
>
> I suspect it might be of_device_register(), called from
> arch/sparc/kernel/of_device_64.c:scan_one_device()
> arch/sparc/kernel/of_device_32.c:scan_one_device()

Right, that's as far as I got as well, so I'm struggling to see how
these things ever got DMA masks set before the of_dma_configure() call
moved out of of_platform_device_create_pdata(), or why it wasn't a
problem prior to the generic dma_ops rework if they didn't :/

Robin.

2018-07-31 17:27:02

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31/07/18 16:53, Stefan Agner wrote:
> On 31.07.2018 14:32, Robin Murphy wrote:
>> On 31/07/18 09:19, Stefan Agner wrote:
>>> On 30.07.2018 16:38, Robin Murphy wrote:
>>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On today's next, the bisect pointed commit
>>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>>
>>>>>>>> Author: Robin Murphy <[email protected]>
>>>>>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>>>>>> OF: Don't set default coherent DMA mask
>>>>>>>>
>>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>>>> with DMA) on Iris Carrier.
>>>>>>>>
>>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>>>> [ 15.472086] Root-NFS: no NFS server address
>>>>>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>>>>>> unknown-block(2,0): error -6
>>>>>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>>>>>> the available partitions:
>>>>>>>> [ 15.500188] 0100 16384 ram0
>>>>>>>> [ 15.500200] (driver?)
>>>>>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>>>> fs on unknown-block(2,0)
>>>>>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>>
>>>>>>>> Attached - defconfig and full boot log.
>>>>>>>>
>>>>>>>> Any hints?
>>>>>>>> Let me know if you need any more information.
>>>>>>>
>>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>>
>>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>>> later dwc3 driver fails with:
>>>>>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>>
>>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>>
>>>>> I would call it a serious regression. Also, no longer setting a default
>>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>>> if and since the code worked just fine up to now.
>>>>
>>>> To reiterate, that particular side-effect was an unintentional
>>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>>> I did test depended on that default mask. Sorry for the blip; please
>>>> check whether it's now fixed in next-20180730 as it should be.
>>>>
>>>
>>> Just for my understanding:
>>>
>>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>>> like that *not* setting default coherent DMA mask was intentionally.
>>> Since the commit message reads: "...the bus code has not initialised any
>>> default value" that was assuming that all bus code sets a default DMA
>>> mask which wasn't the case for "simple-bus".
>>
>> Yes, reading the patches in the order they were written is perhaps a
>> little unclear, but hopefully the order in which they are now applied
>> makes more sense.
>>
>>> So I guess that is what ("of/platform: Initialise default DMA masks")
>>> makes up for in the typical device tree case ("simple-bus")?
>>
>> Indeed, I'd missed the fact that the now-out-of-place-looking
>> initialisation in of_dma_configure() still actually belonged to
>> of_platform_device_create_pdata() - that patch should make the
>> assumptions of "OF: Don't set default coherent DMA mask" true again,
>> even for OF-platform devices.
>>
>>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>>> is set again, can/should we rely on the coherent DMA mask set?
>>>
>>> Or is the expectation still that this is set on driver level too?
>>
>> Ideally, we'd like all drivers to explicitly request their masks as
>> the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
>> DMA is actually possible - there can be systems where even the default
>> 32-bit mask is no good - but clearly we're a little way off trying to
>> enforce that just yet.
>
> In the FEC driver case, there is an integrated DMA (uDMA). It has
> alignment restrictions, but can otherwise address the full 32-bit range.
>
> So something like this should do it right?
>
> if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
> dev_warn(dev, "No suitable DMA available\n");
> return -ENODEV;
> }
>

Yup, precisely.

> However, that, as far as I understand, still requires that the bus set
> up dma_mask properly.
>
> Should I be using dma_coerce_mask_and_coherent?

AFAICS for FEC, the ColdFire instances have statically-set masks, the
i.MX boardfiles get them set via platform_device_register_full(), and
now that the bug-which-never-should-have-been is fixed the DT-based
instances should be fine too, so you should be good to go. In general
I'd say that the dma_coerce_mask*() routines are only really for generic
interface drivers like *HCI where they don't really know what the
underlying device is and it may be on any old random bus. Drivers for
specific IP blocks normally only have one or two known buses to deal
with, so in most cases it's more reasonable to make the bus code
well-behaved if it isn't already.
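
To illustrate the distinction, here is a minimal user-space model, not
kernel code. The model_* names are made up for illustration, and the
dma_supported() check that the real helpers perform is left out as a
simplifying assumption. It only shows why the plain setter depends on
the bus having pointed dma_mask at some storage, while the coerce
variant supplies that storage itself:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct device: only the two mask fields. */
struct model_device {
	uint64_t *dma_mask;         /* pointer; bus code must point it at storage */
	uint64_t coherent_dma_mask; /* stored by value inside the device */
};

/* Same idea as the kernel's DMA_BIT_MASK(n). */
#define MODEL_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Models dma_set_mask_and_coherent(): if the bus never set up the
 * dma_mask pointer, the streaming mask cannot be stored and the call
 * fails before the coherent mask is touched.
 */
static int model_set_mask_and_coherent(struct model_device *dev, uint64_t mask)
{
	if (!dev->dma_mask)
		return -EIO;
	*dev->dma_mask = mask;
	dev->coherent_dma_mask = mask;
	return 0;
}

/*
 * Models dma_coerce_mask_and_coherent(): first aliases dma_mask to the
 * coherent mask, then behaves like the plain setter.
 */
static int model_coerce_mask_and_coherent(struct model_device *dev, uint64_t mask)
{
	dev->dma_mask = &dev->coherent_dma_mask;
	return model_set_mask_and_coherent(dev, mask);
}
```

On a device whose bus left dma_mask as NULL, the first helper returns
an error and the second succeeds, which is why the coerce variant is a
workaround for unknown buses rather than something a driver for a
specific IP block should normally need.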

Robin.

2018-07-31 17:40:54

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Tue, Jul 31, 2018 at 04:58:41PM +0100, Robin Murphy wrote:
> On 31/07/18 16:43, Guenter Roeck wrote:
> >On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:
> >>>Please note that sparc images still generate the warning (next-20180731).
> >>
> >>Ugh, OK, any ideas what sparc does to create these platform devices that
> >>isn't of_platform_device_create_pdata() and has somehow grown an implicit
> >>dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
> >>out...
> >>
> >
> >I suspect it might be of_device_register(), called from
> > arch/sparc/kernel/of_device_64.c:scan_one_device()
> > arch/sparc/kernel/of_device_32.c:scan_one_device()
>
> Right, that's as far as I got as well, so I'm struggling to see how these
> things ever got DMA masks set before the of_dma_configure() call moved out
> of of_platform_device_create_pdata(), or why it wasn't a problem prior to
> the generic dma_ops rework if they didn't :/
>
Ah, ok. No idea, sorry. All I know is that the messages were first seen
with next-20180727.

Guenter

2018-07-31 19:03:59

by Stefan Agner

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31.07.2018 18:29, Robin Murphy wrote:
> On 31/07/18 16:53, Stefan Agner wrote:
>> On 31.07.2018 14:32, Robin Murphy wrote:
>>> On 31/07/18 09:19, Stefan Agner wrote:
>>>> On 30.07.2018 16:38, Robin Murphy wrote:
>>>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On today's next, the bisect pointed commit
>>>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>>>
>>>>>>>>> Author: Robin Murphy <[email protected]>
>>>>>>>>> Date: Mon Jul 23 23:16:12 2018 +0100
>>>>>>>>> OF: Don't set default coherent DMA mask
>>>>>>>>>
>>>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>>>>> with DMA) on Iris Carrier.
>>>>>>>>>
>>>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>>>> [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>>>>> [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>>>>> [ 15.472086] Root-NFS: no NFS server address
>>>>>>>>> [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>>>> [ 15.484228] VFS: Cannot open root device "nfs" or
>>>>>>>>> unknown-block(2,0): error -6
>>>>>>>>> [ 15.491664] Please append a correct "root=" boot option; here are
>>>>>>>>> the available partitions:
>>>>>>>>> [ 15.500188] 0100 16384 ram0
>>>>>>>>> [ 15.500200] (driver?)
>>>>>>>>> [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>>>>> fs on unknown-block(2,0)
>>>>>>>>> [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>>>
>>>>>>>>> Attached - defconfig and full boot log.
>>>>>>>>>
>>>>>>>>> Any hints?
>>>>>>>>> Let me know if you need any more information.
>>>>>>>>
>>>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>>>
>>>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>>>> later dwc3 driver fails with:
>>>>>>>> dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>>>
>>>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>>>
>>>>>> I would call it a serious regression. Also, no longer setting a default
>>>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>>>> if and since the code worked just fine up to now.
>>>>>
>>>>> To reiterate, that particular side-effect was an unintentional
>>>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>>>> I did test depended on that default mask. Sorry for the blip; please
>>>>> check whether it's now fixed in next-20180730 as it should be.
>>>>>
>>>>
>>>> Just for my understanding:
>>>>
>>>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>>>> like that *not* setting default coherent DMA mask was intentionally.
>>>> Since the commit message reads: "...the bus code has not initialised any
>>>> default value" that was assuming that all bus code sets a default DMA
>>>> mask which wasn't the case for "simple-bus".
>>>
>>> Yes, reading the patches in the order they were written is perhaps a
>>> little unclear, but hopefully the order in which they are now applied
>>> makes more sense.
>>>
>>>> So I guess that is what ("of/platform: Initialise default DMA masks")
>>>> makes up for in the typical device tree case ("simple-bus")?
>>>
>>> Indeed, I'd missed the fact that the now-out-of-place-looking
>>> initialisation in of_dma_configure() still actually belonged to
>>> of_platform_device_create_pdata() - that patch should make the
>>> assumptions of "OF: Don't set default coherent DMA mask" true again,
>>> even for OF-platform devices.
>>>
>>>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>>>> is set again, can/should we rely on the coherent DMA mask set?
>>>>
>>>> Or is the expectation still that this is set on driver level too?
>>>
>>> Ideally, we'd like all drivers to explicitly request their masks as
>>> the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
>>> DMA is actually possible - there can be systems where even the default
>>> 32-bit mask is no good - but clearly we're a little way off trying to
>>> enforce that just yet.
>>
>> In the FEC driver case, there is an integrated DMA (uDMA). It has
>> alignment restrictions, but can otherwise address the full 32-bit range.
>>
>> So something like this should do it right?
>>
>> if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
>> dev_warn(dev, "No suitable DMA available\n");
>> return -ENODEV;
>> }
>>
>
> Yup, precisely.
>
>> However, that, as far as I understand, still requires that the bus set
>> up dma_mask properly.
>>
>> Should I be using dma_coerce_mask_and_coherent?
>
> AFAICS for FEC, the ColdFire instances have statically-set masks, the
> i.MX boardfiles get them set via platform+device_register_full(), and
> now that the bug-which-never-should-have-been is fixed the DT-based
> instances should be fine too, so you should be good to go. In general
> I'd say that the dma_coerce_mask*() routines are only really for
> generic interface drivers like *HCI where they don't really know what
> the underlying device is and it may be on any old random bus. Drivers
> for specific IP blocks normally only have one or two known buses to
> deal with, so in most cases it's more reasonable to make the bus code
> well-behaved if it isn't already.

Got it: with your patch, the underlying bus in the DT case is
well-behaved again.

I will send a patch that makes use of dma_set_mask_and_coherent().

Thanks for your clarification!

--
Stefan

2018-08-01 16:37:58

by Robin Murphy

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On 31/07/18 18:38, Guenter Roeck wrote:
> On Tue, Jul 31, 2018 at 04:58:41PM +0100, Robin Murphy wrote:
>> On 31/07/18 16:43, Guenter Roeck wrote:
>>> On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:
>>>>> Please note that sparc images still generate the warning (next-20180731).
>>>>
>>>> Ugh, OK, any ideas what sparc does to create these platform devices that
>>>> isn't of_platform_device_create_pdata() and has somehow grown an implicit
>>>> dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
>>>> out...
>>>>
>>>
>>> I suspect it might be of_device_register(), called from
>>> arch/sparc/kernel/of_device_64.c:scan_one_device()
>>> arch/sparc/kernel/of_device_32.c:scan_one_device()
>>
>> Right, that's as far as I got as well, so I'm struggling to see how these
>> things ever got DMA masks set before the of_dma_configure() call moved out
>> of of_platform_device_create_pdata(), or why it wasn't a problem prior to
>> the generic dma_ops rework if they didn't :/
>>
> Ah, ok. No idea, sorry. All I know is that the messages were first seen
> with next-20180727.

OK, I spent this afternoon wrangling toolchains and QEMU to boot an
instrumented 4.11 kernel, and the answer is that the warnings are
arguably correct. These masks have indeed never been set where they
should have been, but then the sbus_dma_ops don't reference them anyway.

The coherent mask WARN_ON *should* have started appearing in 4.16 with
205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask"),
but happened to be hidden by the inadvertent side-effect of the prior
dma_configure() change. Since there's seemingly no actual regression of
functionality, I'm inclined to leave this in the hands of whoever cares
about sparc32.
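
As a rough user-space model of that situation (not kernel code; every
model_* name is hypothetical, and a flag stands in for the real
WARN_ON): the generic allocation entry point complains when the
coherent mask is zero, but an ops implementation that never reads the
mask still allocates successfully, so the warning does not indicate a
functional failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device with a per-bus allocator hook. */
struct model_device {
	uint64_t coherent_dma_mask;
	void *(*alloc)(struct model_device *dev, size_t size);
	bool warned; /* records that the generic layer would have warned */
};

/* Models bus ops that never look at the device's masks. */
static void *model_sbus_alloc(struct model_device *dev, size_t size)
{
	(void)dev; /* the mask is deliberately ignored here */
	return malloc(size);
}

/* Models the generic entry point: warn on an unset mask, then call the ops. */
static void *model_dma_alloc(struct model_device *dev, size_t size)
{
	if (!dev->coherent_dma_mask)
		dev->warned = true;
	return dev->alloc(dev, size);
}
```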

Robin.

2018-08-01 18:25:06

by Guenter Roeck

Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

On Wed, Aug 01, 2018 at 05:33:30PM +0100, Robin Murphy wrote:
> On 31/07/18 18:38, Guenter Roeck wrote:
> >On Tue, Jul 31, 2018 at 04:58:41PM +0100, Robin Murphy wrote:
> >>On 31/07/18 16:43, Guenter Roeck wrote:
> >>>On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:
> >>>>>Please note that sparc images still generate the warning (next-20180731).
> >>>>
> >>>>Ugh, OK, any ideas what sparc does to create these platform devices that
> >>>>isn't of_platform_device_create_pdata() and has somehow grown an implicit
> >>>>dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
> >>>>out...
> >>>>
> >>>
> >>>I suspect it might be of_device_register(), called from
> >>> arch/sparc/kernel/of_device_64.c:scan_one_device()
> >>> arch/sparc/kernel/of_device_32.c:scan_one_device()
> >>
> >>Right, that's as far as I got as well, so I'm struggling to see how these
> >>things ever got DMA masks set before the of_dma_configure() call moved out
> >>of of_platform_device_create_pdata(), or why it wasn't a problem prior to
> >>the generic dma_ops rework if they didn't :/
> >>
> >Ah, ok. No idea, sorry. All I know is that the messages were first seen
> >with next-20180727.
>
> OK, I spent this afternoon wrangling toolchains and QEMU to boot an
> instrumented 4.11 kernel, and the answer is that the warnings are arguably
> correct. These masks have indeed never been set where they should have been,
> but then the sbus_dma_ops don't reference them anyway.
>
> The coherent mask WARN_ON *should* have started appearing in 4.16 with
> 205e1b7f51e4("dma-mapping: warn when there is no coherent_dma_mask"), but
> happened to be hidden by the inadvertent side-effect of the prior
> dma_configure() change. Since there's seemingly no actual regression of
> functionality, I'm inclined to leave this in the hands of whoever cares
> about sparc32.
>

You mean there is now a warning stating that coherent_dma_mask is not set,
but coherent_dma_mask isn't used by sparc32 in the first place?

Guenter