2020-03-10 06:43:26

by Greg Ungerer

Subject: Re: [PATCH] crypto: caam - select DMA address size at runtime

Hi Andrey,

I am tracking down a caam driver problem, where it is dumping on startup
on a Layerscape 1046 based hardware platform. The dump typically looks
something like this:

------------[ cut here ]------------
kernel BUG at drivers/crypto/caam/jr.c:218!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-ac0 #1
Hardware name: Digi AnywhereUSB-8 (DT)
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : caam_jr_dequeue+0x3f8/0x420
lr : tasklet_action_common.isra.17+0x144/0x180
sp : ffffffc010003df0
x29: ffffffc010003df0 x28: 0000000000000001
x27: 0000000000000000 x26: 0000000000000000
x25: ffffff8020aeba80 x24: 0000000000000000
x23: 0000000000000000 x22: ffffffc010ab4e51
x21: 0000000000000001 x20: ffffffc010ab4000
x19: ffffff8020a2ec10 x18: 0000000000000004
x17: 0000000000000001 x16: 6800f1f100000000
x15: ffffffc010de5000 x14: 0000000000000000
x13: ffffffc010de5000 x12: ffffffc010de5000
x11: 0000000000000000 x10: ffffff8073018080
x9 : 0000000000000028 x8 : 0000000000000000
x7 : 0000000000000000 x6 : ffffffc010a11140
x5 : ffffffc06b070000 x4 : 0000000000000008
x3 : ffffff8073018080 x2 : 0000000000000000
x1 : 0000000000000001 x0 : 0000000000000000

Call trace:
caam_jr_dequeue+0x3f8/0x420
tasklet_action_common.isra.17+0x144/0x180
tasklet_action+0x24/0x30
_stext+0x114/0x228
irq_exit+0x64/0x70
__handle_domain_irq+0x64/0xb8
gic_handle_irq+0x50/0xa0
el1_irq+0xb8/0x140
arch_cpu_idle+0x10/0x18
do_idle+0xf0/0x118
cpu_startup_entry+0x24/0x60
rest_init+0xb0/0xbc
arch_call_rest_init+0xc/0x14
start_kernel+0x3d0/0x3fc
Code: d3607c21 2a020002 aa010041 17ffff4d (d4210000)
---[ end trace ce2c4c37d2c89a99 ]---
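
For context on the BUG itself: as far as can be told from the v5.5 source, caam_jr_dequeue() walks the software job ring looking for the entry whose descriptor address matches the one the hardware reported as completed, and hits BUG_ON() when nothing matches. The userspace sketch below mirrors that lookup under stated assumptions; RING_DEPTH, struct sw_ring and find_completed() are hypothetical names for illustration, not driver code. CIRC_CNT follows the kernel's definition in include/linux/circ_buf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's CIRC_CNT() in include/linux/circ_buf.h. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))

#define RING_DEPTH 8 /* hypothetical depth; the mask trick needs a power of two */
static_assert((RING_DEPTH & (RING_DEPTH - 1)) == 0, "depth must be a power of two");

/* Hypothetical stand-in for the driver's software ring bookkeeping. */
struct sw_ring {
	uint64_t desc_addr[RING_DEPTH]; /* bus address of each submitted job */
	int head;                       /* next slot to be filled on submit */
	int tail;                       /* oldest job not yet completed */
};

/*
 * Search the outstanding jobs (tail up to, but not including, head) for the
 * descriptor address reported as completed. Returns the matching slot, or -1
 * in the situation where the driver would hit its BUG_ON().
 */
static int find_completed(const struct sw_ring *r, uint64_t completed_addr)
{
	int i;

	for (i = 0; CIRC_CNT(r->head, r->tail + i, RING_DEPTH) >= 1; i++) {
		int idx = (r->tail + i) & (RING_DEPTH - 1);

		if (r->desc_addr[idx] == completed_addr)
			return idx; /* found */
	}
	return -1; /* no outstanding job has this address */
}
```

With three jobs outstanding at 0x1000, 0x2000 and 0x3000 (head = 3, tail = 0), find_completed(&r, 0x2000) returns 1. An address that was never submitted, such as a stale or corrupted value read back from the output ring, returns -1, which is the case the driver treats as fatal.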


Git bisecting this led me to commit a1cf573ee95d ("crypto: caam -
select DMA address size at runtime") as the culprit.

I came across a commit by Iuliana, 7278fa25aa0e ("crypto: caam -
do not reset pointer size from MCFGR register"). However that
doesn't fix this dumping problem for me (it does seem to occur
less often though). [NOTE: dump above generated with this
change applied].

I initially hit this dump on linux-5.4, and it also occurs on
linux-5.5 for me.

Any thoughts?

Regards
Greg


2020-03-10 07:16:29

by Greg Ungerer

Subject: Re: [PATCH] crypto: caam - select DMA address size at runtime

Hi Andrey,

In further testing I am still getting dumps even after reverting this change.
So maybe this is not really the problem at all... It is happening much less
often though: it used to be pretty much every boot (with this change in place),
and with the change reverted it is maybe 1 in 10 or so.

Regards
Greg



On 10/3/20 4:42 pm, Greg Ungerer wrote:
> Hi Andrey,
>
> I am tracking down a caam driver problem, where it is dumping on startup
> on a Layerscape 1046 based hardware platform. The dump typically looks
> something like this:
>
> ------------[ cut here ]------------
> kernel BUG at drivers/crypto/caam/jr.c:218!
> Internal error: Oops - BUG: 0 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-ac0 #1
> Hardware name: Digi AnywhereUSB-8 (DT)
> pstate: 40000005 (nZcv daif -PAN -UAO)
> pc : caam_jr_dequeue+0x3f8/0x420
> lr : tasklet_action_common.isra.17+0x144/0x180
> sp : ffffffc010003df0
> x29: ffffffc010003df0 x28: 0000000000000001
> x27: 0000000000000000 x26: 0000000000000000
> x25: ffffff8020aeba80 x24: 0000000000000000
> x23: 0000000000000000 x22: ffffffc010ab4e51
> x21: 0000000000000001 x20: ffffffc010ab4000
> x19: ffffff8020a2ec10 x18: 0000000000000004
> x17: 0000000000000001 x16: 6800f1f100000000
> x15: ffffffc010de5000 x14: 0000000000000000
> x13: ffffffc010de5000 x12: ffffffc010de5000
> x11: 0000000000000000 x10: ffffff8073018080
> x9 : 0000000000000028 x8 : 0000000000000000
> x7 : 0000000000000000 x6 : ffffffc010a11140
> x5 : ffffffc06b070000 x4 : 0000000000000008
> x3 : ffffff8073018080 x2 : 0000000000000000
> x1 : 0000000000000001 x0 : 0000000000000000
>
> Call trace:
>  caam_jr_dequeue+0x3f8/0x420
>  tasklet_action_common.isra.17+0x144/0x180
>  tasklet_action+0x24/0x30
>  _stext+0x114/0x228
>  irq_exit+0x64/0x70
>  __handle_domain_irq+0x64/0xb8
>  gic_handle_irq+0x50/0xa0
>  el1_irq+0xb8/0x140
>  arch_cpu_idle+0x10/0x18
>  do_idle+0xf0/0x118
>  cpu_startup_entry+0x24/0x60
>  rest_init+0xb0/0xbc
>  arch_call_rest_init+0xc/0x14
>  start_kernel+0x3d0/0x3fc
> Code: d3607c21 2a020002 aa010041 17ffff4d (d4210000)
> ---[ end trace ce2c4c37d2c89a99 ]---
>
>
> Git bisecting this led me to commit a1cf573ee95d ("crypto: caam -
> select DMA address size at runtime") as the culprit.
>
> I came across a commit by Iuliana, 7278fa25aa0e ("crypto: caam -
> do not reset pointer size from MCFGR register"). However that
> doesn't fix this dumping problem for me (it does seem to occur
> less often though). [NOTE: dump above generated with this
> change applied].
>
> I initially hit this dump on a linux-5.4, and it also occurs on
> linux-5.5 for me.
>
> Any thoughts?
>
> Regards
> Greg

2020-03-10 12:01:20

by Horia Geanta

Subject: Re: [PATCH] crypto: caam - select DMA address size at runtime

On 3/10/2020 8:43 AM, Greg Ungerer wrote:
> Hi Andrey,
>
> I am tracking down a caam driver problem, where it is dumping on startup
> on a Layerscape 1046 based hardware platform. The dump typically looks
> something like this:
>
> ------------[ cut here ]------------
> kernel BUG at drivers/crypto/caam/jr.c:218!
> Internal error: Oops - BUG: 0 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-ac0 #1
> Hardware name: Digi AnywhereUSB-8 (DT)
> pstate: 40000005 (nZcv daif -PAN -UAO)
> pc : caam_jr_dequeue+0x3f8/0x420
> lr : tasklet_action_common.isra.17+0x144/0x180
> sp : ffffffc010003df0
> x29: ffffffc010003df0 x28: 0000000000000001
> x27: 0000000000000000 x26: 0000000000000000
> x25: ffffff8020aeba80 x24: 0000000000000000
> x23: 0000000000000000 x22: ffffffc010ab4e51
> x21: 0000000000000001 x20: ffffffc010ab4000
> x19: ffffff8020a2ec10 x18: 0000000000000004
> x17: 0000000000000001 x16: 6800f1f100000000
> x15: ffffffc010de5000 x14: 0000000000000000
> x13: ffffffc010de5000 x12: ffffffc010de5000
> x11: 0000000000000000 x10: ffffff8073018080
> x9 : 0000000000000028 x8 : 0000000000000000
> x7 : 0000000000000000 x6 : ffffffc010a11140
> x5 : ffffffc06b070000 x4 : 0000000000000008
> x3 : ffffff8073018080 x2 : 0000000000000000
> x1 : 0000000000000001 x0 : 0000000000000000
>
> Call trace:
> caam_jr_dequeue+0x3f8/0x420
> tasklet_action_common.isra.17+0x144/0x180
> tasklet_action+0x24/0x30
> _stext+0x114/0x228
> irq_exit+0x64/0x70
> __handle_domain_irq+0x64/0xb8
> gic_handle_irq+0x50/0xa0
> el1_irq+0xb8/0x140
> arch_cpu_idle+0x10/0x18
> do_idle+0xf0/0x118
> cpu_startup_entry+0x24/0x60
> rest_init+0xb0/0xbc
> arch_call_rest_init+0xc/0x14
> start_kernel+0x3d0/0x3fc
> Code: d3607c21 2a020002 aa010041 17ffff4d (d4210000)
> ---[ end trace ce2c4c37d2c89a99 ]---
>
>
> Git bisecting this led me to commit a1cf573ee95d ("crypto: caam -
> select DMA address size at runtime") as the culprit.
>
> I came across a commit by Iuliana, 7278fa25aa0e ("crypto: caam -
> do not reset pointer size from MCFGR register"). However that
> doesn't fix this dumping problem for me (it does seem to occur
> less often though). [NOTE: dump above generated with this
> change applied].
>
> I initially hit this dump on a linux-5.4, and it also occurs on
> linux-5.5 for me.
>
> Any thoughts?
>
Could you try the following patch?
It worked on my side.

Unfortunately I don't think it fixes the root cause,
the device should work fine (though slower) without the property.
DMA API violations (e.g. cacheline sharing) are a good candidate.
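
To illustrate what cacheline sharing means here: with non-coherent DMA, cache maintenance works on whole cache lines, so CPU-owned data that sits in the same line as a device-written buffer can be clobbered or can clobber the device's data. The userspace sketch below only checks struct layouts for that overlap. CACHELINE, the two structs and shares_line() are hypothetical names, the 64-byte line size is an assumption, and the check assumes the struct itself starts on a line boundary. None of this is taken from the caam driver.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed line size; the real value depends on the CPU */

/* Hazard: CPU-owned state and the device-written buffer share one line. */
struct job_shared {
	unsigned int cpu_state;
	unsigned char dma_buf[16];
};

/* One possible fix: give the DMA buffer a full, aligned line of its own. */
struct job_split {
	unsigned int cpu_state;
	alignas(CACHELINE) unsigned char dma_buf[CACHELINE];
};

/*
 * Return true if byte ranges [a_off, a_off + a_len) and [b_off, b_off + b_len)
 * touch at least one common cache line, assuming offset 0 is line aligned.
 */
static bool shares_line(size_t a_off, size_t a_len, size_t b_off, size_t b_len)
{
	assert(a_len > 0 && b_len > 0);

	size_t a_first = a_off / CACHELINE, a_last = (a_off + a_len - 1) / CACHELINE;
	size_t b_first = b_off / CACHELINE, b_last = (b_off + b_len - 1) / CACHELINE;

	return a_first <= b_last && b_first <= a_last;
}
```

Feeding the offsetof() and sizeof() values of cpu_state and dma_buf into shares_line() reports an overlap for struct job_shared and none for struct job_split. In kernel code the analogous tools would be alignment annotations such as ____cacheline_aligned or ARCH_DMA_MINALIGN; whether any caam structure actually needs them is not established in this thread.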

--- >8 ---

Subject: [PATCH] arm64: dts: ls1046a: mark crypto engine dma coherent

Crypto engine (CAAM) on LS1046A platform has support for HW coherency,
mark accordingly the DT node.

Signed-off-by: Horia Geantă <[email protected]>
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index d4c1da3d4bde..9e8147ef1748 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -244,6 +244,7 @@
 			ranges = <0x0 0x00 0x1700000 0x100000>;
 			reg = <0x00 0x1700000 0x0 0x100000>;
 			interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
+			dma-coherent;

 			sec_jr0: jr@10000 {
 				compatible = "fsl,sec-v5.4-job-ring",
--
2.17.1

2020-03-11 04:48:24

by Greg Ungerer

Subject: Re: [PATCH] crypto: caam - select DMA address size at runtime

Hi Horia,

On 10/3/20 10:00 pm, Horia Geantă wrote:
> On 3/10/2020 8:43 AM, Greg Ungerer wrote:
>> Hi Andrey,
>>
>> I am tracking down a caam driver problem, where it is dumping on startup
>> on a Layerscape 1046 based hardware platform. The dump typically looks
>> something like this:
>>
>> ------------[ cut here ]------------
>> kernel BUG at drivers/crypto/caam/jr.c:218!
>> Internal error: Oops - BUG: 0 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-ac0 #1
>> Hardware name: Digi AnywhereUSB-8 (DT)
>> pstate: 40000005 (nZcv daif -PAN -UAO)
>> pc : caam_jr_dequeue+0x3f8/0x420
>> lr : tasklet_action_common.isra.17+0x144/0x180
>> sp : ffffffc010003df0
>> x29: ffffffc010003df0 x28: 0000000000000001
>> x27: 0000000000000000 x26: 0000000000000000
>> x25: ffffff8020aeba80 x24: 0000000000000000
>> x23: 0000000000000000 x22: ffffffc010ab4e51
>> x21: 0000000000000001 x20: ffffffc010ab4000
>> x19: ffffff8020a2ec10 x18: 0000000000000004
>> x17: 0000000000000001 x16: 6800f1f100000000
>> x15: ffffffc010de5000 x14: 0000000000000000
>> x13: ffffffc010de5000 x12: ffffffc010de5000
>> x11: 0000000000000000 x10: ffffff8073018080
>> x9 : 0000000000000028 x8 : 0000000000000000
>> x7 : 0000000000000000 x6 : ffffffc010a11140
>> x5 : ffffffc06b070000 x4 : 0000000000000008
>> x3 : ffffff8073018080 x2 : 0000000000000000
>> x1 : 0000000000000001 x0 : 0000000000000000
>>
>> Call trace:
>> caam_jr_dequeue+0x3f8/0x420
>> tasklet_action_common.isra.17+0x144/0x180
>> tasklet_action+0x24/0x30
>> _stext+0x114/0x228
>> irq_exit+0x64/0x70
>> __handle_domain_irq+0x64/0xb8
>> gic_handle_irq+0x50/0xa0
>> el1_irq+0xb8/0x140
>> arch_cpu_idle+0x10/0x18
>> do_idle+0xf0/0x118
>> cpu_startup_entry+0x24/0x60
>> rest_init+0xb0/0xbc
>> arch_call_rest_init+0xc/0x14
>> start_kernel+0x3d0/0x3fc
>> Code: d3607c21 2a020002 aa010041 17ffff4d (d4210000)
>> ---[ end trace ce2c4c37d2c89a99 ]---
>>
>>
>> Git bisecting this led me to commit a1cf573ee95d ("crypto: caam -
>> select DMA address size at runtime") as the culprit.
>>
>> I came across a commit by Iuliana, 7278fa25aa0e ("crypto: caam -
>> do not reset pointer size from MCFGR register"). However that
>> doesn't fix this dumping problem for me (it does seem to occur
>> less often though). [NOTE: dump above generated with this
>> change applied].
>>
>> I initially hit this dump on a linux-5.4, and it also occurs on
>> linux-5.5 for me.
>>
>> Any thoughts?
>>
> Could you try the following patch?
> It worked on my side.
>
> Unfortunately I don't think it fixes the root cause,
> the device should work fine (though slower) without the property.
> DMA API violations (e.g. cacheline sharing) are a good candidate.

Yep, that definitely fixes it for me. Thanks!

Regards
Greg


> --- >8 ---
>
> Subject: [PATCH] arm64: dts: ls1046a: mark crypto engine dma coherent
>
> Crypto engine (CAAM) on LS1046A platform has support for HW coherency,
> mark accordingly the DT node.
>
> Signed-off-by: Horia Geantă <[email protected]>
> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index d4c1da3d4bde..9e8147ef1748 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -244,6 +244,7 @@
>  			ranges = <0x0 0x00 0x1700000 0x100000>;
>  			reg = <0x00 0x1700000 0x0 0x100000>;
>  			interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
> +			dma-coherent;
>
>  			sec_jr0: jr@10000 {
>  				compatible = "fsl,sec-v5.4-job-ring",
> --
> 2.17.1
>