2020-12-28 04:47:10

by Paul Thomas

[permalink] [raw]
Subject: dmaengine : xilinx_dma two issues

Hello,

I'm trying to get the 5.10 kernel up and running for our system, and
I'm running into a couple of issues with xilinx_dma.

First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
probe fix node order dependency' breaks our usage. Before this commit
a call to:
dma_request_chan(&indio_dev->dev, "axi_dma_0");
returns fine, but after that commit it returns -19. The reason for
this seems to be that the only channel that is setup is channel 1
(chan->id is 1 in xilinx_dma_chan_probe()). However in
of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
dma_spec->args[0];), which causes the:
!xdev->chan[chan_id]
test to fail in of_dma_xilinx_xlate()

Our device-tree entry looks like this:
axi_dma_0: dma@80002000 {
status = "okay";
#dma-cells = <1>;
compatible = "xlnx,axi-dma-1.00.a";
interrupt-parent = <&gic>;
interrupts = <0 89 4>;
reg = <0x0 0x80002000 0x0 0x1000>;
xlnx,addrwidth = <0x20>;
clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
<&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
"m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
dma-channel@80002030 {
compatible = "xlnx,axi-dma-s2mm-channel";
dma-channels = <0x1>;
interrupts = <0 89 4>;
xlnx,datawidth = <0x20>;
xlnx,device-id = <0x0>;
};
};

This is on a 5.10.1 kernel on arm64 zynqmp hardware.

The second issue goes a little further back to commit e81274cd6b526
'dmaengine: add support to dynamic register/unregister of channels'.
After this commit even just removing the module 'rmmod xilinx_dma',
without ever using it, results in a kernel oops like this:
[ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
[ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
defined so it is not safe to unbind this driver while in use
[ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver Probed!!
[ 42.100660] Unable to handle kernel paging request at virtual
address dead000000000108
[ 42.108598] Mem abort info:
[ 42.111393] ESR = 0x96000044
[ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
[ 42.119744] SET = 0, FnV = 0
[ 42.122794] EA = 0, S1PTW = 0
[ 42.125918] Data abort info:
[ 42.128789] ISV = 0, ISS = 0x00000044
[ 42.132617] CM = 0, WnR = 1
[ 42.135577] [dead000000000108] address between user and kernel address ranges
[ 42.142705] Internal error: Oops: 96000044 [#1] SMP
[ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
uio_pdrv_genirq
[ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
5.10.1-00026-g3a2e6dd7a05-dirty #192
[ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
[ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
[ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
[ 42.185636] sp : ffffffc01112bca0
[ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
[ 42.194238] x27: 0000000000000000 x26: 0000000000000000
[ 42.199542] x25: 0000000000000000 x24: 0000000000000000
[ 42.204845] x23: 0000000000000000 x22: 0000000000000000
[ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
[ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
[ 42.220756] x17: 0000000000000000 x16: 0000000000000000
[ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
[ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
[ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
[ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
[ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
[ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
[ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
[ 42.263184] x1 : dead000000000100 x0 : dead000000000122
[ 42.268488] Call trace:
[ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
[ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
[ 42.281446] platform_drv_remove+0x24/0x38
[ 42.285530] device_release_driver_internal+0xec/0x1a8
[ 42.290659] driver_detach+0x64/0xd8
[ 42.294226] bus_remove_driver+0x58/0xb8
[ 42.298133] driver_unregister+0x30/0x60
[ 42.302048] platform_driver_unregister+0x14/0x20
[ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
[ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
[ 42.317178] el0_svc_common.constprop.4+0x68/0x170
[ 42.321959] do_el0_svc+0x70/0x90
[ 42.325267] el0_svc+0x14/0x20
[ 42.328313] el0_sync_handler+0x90/0xb8
[ 42.332141] el0_sync+0x158/0x180
[ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
[ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---

So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
xilinx_dma.c and never remove the module then it is working with the
5.10.1 kernel.

Hopefully, this will be clear to someone how these issues can be
resolved. In general we've been very happy using the xilinx dma.

I'm not subscribed to the linux-kernel ML so if you need any further
info or testing just keep me in the to: list.

thanks,
Paul


2021-01-04 06:59:15

by Radhey Shyam Pandey

[permalink] [raw]
Subject: RE: dmaengine : xilinx_dma two issues

> -----Original Message-----
> From: Paul Thomas <[email protected]>
> Sent: Monday, December 28, 2020 10:14 AM
> To: Dan Williams <[email protected]>; Vinod Koul
> <[email protected]>; Michal Simek <[email protected]>; Radhey Shyam
> Pandey <[email protected]>; Matthew Murrian
> <[email protected]>; Romain Perier
> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
> Ferland <[email protected]>; Sebastian von Ohr
> <[email protected]>; [email protected]; Linux ARM <linux-
> [email protected]>; linux-kernel <linux-
> [email protected]>
> Subject: dmaengine : xilinx_dma two issues
>
> Hello,
>
> I'm trying to get the 5.10 kernel up and running for our system, and I'm
> running into a couple of issues with xilinx_dma.
+ (Xilinx mailing list)

Thanks for bringing the issues to our notice. Replies inline.

>
> First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel probe
> fix node order dependency' breaks our usage. Before this commit a call to:
> dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but after
> that commit it returns -19. The reason for this seems to be that the only
> channel that is setup is channel 1 (chan->id is 1 in xilinx_dma_chan_probe()).
> However in
> of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id = dma_spec-
> >args[0];), which causes the:
> !xdev->chan[chan_id]
> test to fail in of_dma_xilinx_xlate()

What is the channel number passed in
dmaclient DT?

dmas = <& axi_dma_0 1>
dma-names = "axi_dma_0"

>
> Our device-tree entry looks like this:
> axi_dma_0: dma@80002000 {
> status = "okay";
> #dma-cells = <1>;
> compatible = "xlnx,axi-dma-1.00.a";
> interrupt-parent = <&gic>;
> interrupts = <0 89 4>;
> reg = <0x0 0x80002000 0x0 0x1000>;
> xlnx,addrwidth = <0x20>;
> clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk", "m_axi_mm2s_aclk",
> "m_axi_s2mm_aclk";
> dma-channel@80002030 {
> compatible = "xlnx,axi-dma-s2mm-channel";
> dma-channels = <0x1>;
> interrupts = <0 89 4>;
> xlnx,datawidth = <0x20>;
> xlnx,device-id = <0x0>;
> };
> };
>
> This is on a 5.10.1 kernel on arm64 zynqmp hardware.
>
> The second issue goes a little further back to commit e81274cd6b526
> 'dmaengine: add support to dynamic register/unregister of channels'.
> After this commit even just removing the module 'rmmod xilinx_dma',
> without ever using it, results in a kernel oops like this:
> [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> defined so it is not safe to unbind this driver while in use
> [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> Probed!!
> [ 42.100660] Unable to handle kernel paging request at virtual
> address dead000000000108
> [ 42.108598] Mem abort info:
> [ 42.111393] ESR = 0x96000044
> [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 42.119744] SET = 0, FnV = 0
> [ 42.122794] EA = 0, S1PTW = 0
> [ 42.125918] Data abort info:
> [ 42.128789] ISV = 0, ISS = 0x00000044
> [ 42.132617] CM = 0, WnR = 1
> [ 42.135577] [dead000000000108] address between user and kernel
> address ranges
> [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> uio_pdrv_genirq
> [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> 5.10.1-00026-g3a2e6dd7a05-dirty #192
> [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> [ 42.185636] sp : ffffffc01112bca0
> [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> [ 42.268488] Call trace:
> [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> [ 42.281446] platform_drv_remove+0x24/0x38
> [ 42.285530] device_release_driver_internal+0xec/0x1a8
> [ 42.290659] driver_detach+0x64/0xd8
> [ 42.294226] bus_remove_driver+0x58/0xb8
> [ 42.298133] driver_unregister+0x30/0x60
> [ 42.302048] platform_driver_unregister+0x14/0x20
> [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> [ 42.321959] do_el0_svc+0x70/0x90
> [ 42.325267] el0_svc+0x14/0x20
> [ 42.328313] el0_sync_handler+0x90/0xb8
> [ 42.332141] el0_sync+0x158/0x180
> [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
>
> So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
> xilinx_dma.c and never remove the module then it is working with the
> 5.10.1 kernel.

Ok, we will analyze this issue and report back the findings.

>
> Hopefully, this will be clear to someone how these issues can be resolved. In
> general we've been very happy using the xilinx dma.
>
> I'm not subscribed to the linux-kernel ML so if you need any further info or
> testing just keep me in the to: list.
>
> thanks,
> Paul

2021-01-08 07:19:05

by Radhey Shyam Pandey

[permalink] [raw]
Subject: RE: dmaengine : xilinx_dma two issues

> -----Original Message-----
> From: Radhey Shyam Pandey
> Sent: Monday, January 4, 2021 10:50 AM
> To: Paul Thomas <[email protected]>; Dan Williams
> <[email protected]>; Vinod Koul <[email protected]>; Michal Simek
> <[email protected]>; Matthew Murrian <[email protected]>;
> Romain Perier <[email protected]>; Krzysztof Kozlowski
> <[email protected]>; Marc Ferland <[email protected]>; Sebastian von Ohr
> <[email protected]>; [email protected]; Linux ARM <linux-
> [email protected]>; linux-kernel <linux-
> [email protected]>; Shravya Kumbham <[email protected]>; git
> <[email protected]>
> Subject: RE: dmaengine : xilinx_dma two issues
>
> > -----Original Message-----
> > From: Paul Thomas <[email protected]>
> > Sent: Monday, December 28, 2020 10:14 AM
> > To: Dan Williams <[email protected]>; Vinod Koul
> > <[email protected]>; Michal Simek <[email protected]>; Radhey Shyam
> > Pandey <[email protected]>; Matthew Murrian
> > <[email protected]>; Romain Perier
> <[email protected]>;
> > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > [email protected]; Linux ARM <linux-
> > [email protected]>; linux-kernel <linux-
> > [email protected]>
> > Subject: dmaengine : xilinx_dma two issues
> >
> > Hello,
> >
> > I'm trying to get the 5.10 kernel up and running for our system, and
> > I'm running into a couple of issues with xilinx_dma.
> + (Xilinx mailing list)
>
> Thanks for bringing the issues to our notice. Replies inline.
>
> >
> > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > probe fix node order dependency' breaks our usage. Before this commit a
> call to:
> > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > after that commit it returns -19. The reason for this seems to be that
> > the only channel that is setup is channel 1 (chan->id is 1 in
> xilinx_dma_chan_probe()).
> > However in
> > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > dma_spec-
> > >args[0];), which causes the:
> > !xdev->chan[chan_id]
> > test to fail in of_dma_xilinx_xlate()
>
> What is the channel number passed in
> dmaclient DT?

Any update on this issue?

>
> dmas = <& axi_dma_0 1>
> dma-names = "axi_dma_0"
>
> >
> > Our device-tree entry looks like this:
> > axi_dma_0: dma@80002000 {
> > status = "okay";
> > #dma-cells = <1>;
> > compatible = "xlnx,axi-dma-1.00.a";
> > interrupt-parent = <&gic>;
> > interrupts = <0 89 4>;
> > reg = <0x0 0x80002000 0x0 0x1000>;
> > xlnx,addrwidth = <0x20>;
> > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > dma-channel@80002030 {
> > compatible = "xlnx,axi-dma-s2mm-channel";
> > dma-channels = <0x1>;
> > interrupts = <0 89 4>;
> > xlnx,datawidth = <0x20>;
> > xlnx,device-id = <0x0>;
> > };
> > };
> >
> > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> >
> > The second issue goes a little further back to commit e81274cd6b526
> > 'dmaengine: add support to dynamic register/unregister of channels'.
> > After this commit even just removing the module 'rmmod xilinx_dma',
> > without ever using it, results in a kernel oops like this:
> > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > defined so it is not safe to unbind this driver while in use
> > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > Probed!!
> > [ 42.100660] Unable to handle kernel paging request at virtual
> > address dead000000000108
> > [ 42.108598] Mem abort info:
> > [ 42.111393] ESR = 0x96000044
> > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 42.119744] SET = 0, FnV = 0
> > [ 42.122794] EA = 0, S1PTW = 0
> > [ 42.125918] Data abort info:
> > [ 42.128789] ISV = 0, ISS = 0x00000044
> > [ 42.132617] CM = 0, WnR = 1
> > [ 42.135577] [dead000000000108] address between user and kernel
> > address ranges
> > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > uio_pdrv_genirq
> > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > [ 42.185636] sp : ffffffc01112bca0
> > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > [ 42.268488] Call trace:
> > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > [ 42.281446] platform_drv_remove+0x24/0x38
> > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > [ 42.290659] driver_detach+0x64/0xd8
> > [ 42.294226] bus_remove_driver+0x58/0xb8
> > [ 42.298133] driver_unregister+0x30/0x60
> > [ 42.302048] platform_driver_unregister+0x14/0x20
> > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > [ 42.321959] do_el0_svc+0x70/0x90
> > [ 42.325267] el0_svc+0x14/0x20
> > [ 42.328313] el0_sync_handler+0x90/0xb8
> > [ 42.332141] el0_sync+0x158/0x180
> > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> >
> > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
> > xilinx_dma.c and never remove the module then it is working with the
> > 5.10.1 kernel.
>
> Ok, we will analyze this issue and report back the findings.

+ Dave

I was able to reproduce this crash on the unloading xilinx_dma module.
This is introduced due to the e81274cd6b526 mainline commit added
in the 5.6 kernel version. The crash is coming from -

xilinx_dma_chan_remove+0x74/0xa0:
__list_del at ./include/linux/list.h:112
(inlined by) __list_del_entry at./include/linux/list.h:135
(inlined by) list_del at ./include/linux/list.h:146
(inlined by) xilinx_dma_chan_remove at drivers/dma/xilinx/xilinx_dma.c:2546

Looking into e81274cd6b526 commit - It deletes channel device_node entry.
Same channel device_node entry is also deleted in xilinx_dma_chan_remove
as a result we see this crash.

@@ -993,12 +1007,22 @@ static
void __dma_async_device_channel_unregister(struct dma_device *device,
"%s called while %d clients hold a reference\n",
__func__, chan->client_count);
mutex_lock(&dma_list_mutex);
+ list_del(&chan->device_node);
+ device->chancnt--;

I will let Dave comment on the background of this change. In
dmaengine driver - we are adding channel device node entry
so deleting it in the exit path looks fine to me. Also, I checked
other dma drivers are also deleting channel device_node entry
in .remove so it likely a common problem.

>
> >
> > Hopefully, this will be clear to someone how these issues can be
> > resolved. In general we've been very happy using the xilinx dma.
> >
> > I'm not subscribed to the linux-kernel ML so if you need any further
> > info or testing just keep me in the to: list.
> >
> > thanks,
> > Paul

2021-01-08 16:01:28

by Paul Thomas

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

Hi All,

On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Radhey Shyam Pandey
> > Sent: Monday, January 4, 2021 10:50 AM
> > To: Paul Thomas <[email protected]>; Dan Williams
> > <[email protected]>; Vinod Koul <[email protected]>; Michal Simek
> > <[email protected]>; Matthew Murrian <[email protected]>;
> > Romain Perier <[email protected]>; Krzysztof Kozlowski
> > <[email protected]>; Marc Ferland <[email protected]>; Sebastian von Ohr
> > <[email protected]>; [email protected]; Linux ARM <linux-
> > [email protected]>; linux-kernel <linux-
> > [email protected]>; Shravya Kumbham <[email protected]>; git
> > <[email protected]>
> > Subject: RE: dmaengine : xilinx_dma two issues
> >
> > > -----Original Message-----
> > > From: Paul Thomas <[email protected]>
> > > Sent: Monday, December 28, 2020 10:14 AM
> > > To: Dan Williams <[email protected]>; Vinod Koul
> > > <[email protected]>; Michal Simek <[email protected]>; Radhey Shyam
> > > Pandey <[email protected]>; Matthew Murrian
> > > <[email protected]>; Romain Perier
> > <[email protected]>;
> > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > [email protected]; Linux ARM <linux-
> > > [email protected]>; linux-kernel <linux-
> > > [email protected]>
> > > Subject: dmaengine : xilinx_dma two issues
> > >
> > > Hello,
> > >
> > > I'm trying to get the 5.10 kernel up and running for our system, and
> > > I'm running into a couple of issues with xilinx_dma.
> > + (Xilinx mailing list)
> >
> > Thanks for bringing the issues to our notice. Replies inline.
> >
> > >
> > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > probe fix node order dependency' breaks our usage. Before this commit a
> > call to:
> > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > after that commit it returns -19. The reason for this seems to be that
> > > the only channel that is setup is channel 1 (chan->id is 1 in
> > xilinx_dma_chan_probe()).
> > > However in
> > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > dma_spec-
> > > >args[0];), which causes the:
> > > !xdev->chan[chan_id]
> > > test to fail in of_dma_xilinx_xlate()
> >
> > What is the channel number passed in
> > dmaclient DT?
Is this a question for me?

>
> Any update on this issue?

>
> >
> > dmas = <& axi_dma_0 1>
> > dma-names = "axi_dma_0"
> >
> > >
> > > Our device-tree entry looks like this:
> > > axi_dma_0: dma@80002000 {
> > > status = "okay";
> > > #dma-cells = <1>;
> > > compatible = "xlnx,axi-dma-1.00.a";
> > > interrupt-parent = <&gic>;
> > > interrupts = <0 89 4>;
> > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > xlnx,addrwidth = <0x20>;
> > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > dma-channel@80002030 {
> > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > dma-channels = <0x1>;
> > > interrupts = <0 89 4>;
> > > xlnx,datawidth = <0x20>;
> > > xlnx,device-id = <0x0>;
> > > };
> > > };
> > >
> > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > >
> > > The second issue goes a little further back to commit e81274cd6b526
> > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > After this commit even just removing the module 'rmmod xilinx_dma',
> > > without ever using it, results in a kernel oops like this:
> > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > defined so it is not safe to unbind this driver while in use
> > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > Probed!!
> > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > address dead000000000108
> > > [ 42.108598] Mem abort info:
> > > [ 42.111393] ESR = 0x96000044
> > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > [ 42.119744] SET = 0, FnV = 0
> > > [ 42.122794] EA = 0, S1PTW = 0
> > > [ 42.125918] Data abort info:
> > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > [ 42.132617] CM = 0, WnR = 1
> > > [ 42.135577] [dead000000000108] address between user and kernel
> > > address ranges
> > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > uio_pdrv_genirq
> > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > [ 42.185636] sp : ffffffc01112bca0
> > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > [ 42.268488] Call trace:
> > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > [ 42.290659] driver_detach+0x64/0xd8
> > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > [ 42.298133] driver_unregister+0x30/0x60
> > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > [ 42.321959] do_el0_svc+0x70/0x90
> > > [ 42.325267] el0_svc+0x14/0x20
> > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > [ 42.332141] el0_sync+0x158/0x180
> > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > >
> > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
> > > xilinx_dma.c and never remove the module then it is working with the
> > > 5.10.1 kernel.
> >
> > Ok, we will analyze this issue and report back the findings.
>
> + Dave
>
> I was able to reproduce this crash on the unloading xilinx_dma module.
> This is introduced due to the e81274cd6b526 mainline commit added
> in the 5.6 kernel version. The crash is coming from -
>
> xilinx_dma_chan_remove+0x74/0xa0:
> __list_del at ./include/linux/list.h:112
> (inlined by) __list_del_entry at./include/linux/list.h:135
> (inlined by) list_del at ./include/linux/list.h:146
> (inlined by) xilinx_dma_chan_remove at drivers/dma/xilinx/xilinx_dma.c:2546
>
> Looking into e81274cd6b526 commit - It deletes channel device_node entry.
> Same channel device_node entry is also deleted in xilinx_dma_chan_remove
> as a result we see this crash.
>
> @@ -993,12 +1007,22 @@ static
> void __dma_async_device_channel_unregister(struct dma_device *device,
> "%s called while %d clients hold a reference\n",
> __func__, chan->client_count);
> mutex_lock(&dma_list_mutex);
> + list_del(&chan->device_node);
> + device->chancnt--;
>
> I will let Dave comment on the background of this change. In
> dmaengine driver - we are adding channel device node entry
> so deleting it in the exit path looks fine to me. Also, I checked
> other dma drivers are also deleting channel device_node entry
> in .remove so it likely a common problem.
>
> >
> > >
> > > Hopefully, this will be clear to someone how these issues can be
> > > resolved. In general we've been very happy using the xilinx dma.
> > >
> > > I'm not subscribed to the linux-kernel ML so if you need any further
> > > info or testing just keep me in the to: list.
> > >
> > > thanks,
> > > Paul

Thanks for looking at this! Let me know if anyone needs more info.

-Paul

2021-01-08 18:40:42

by Radhey Shyam Pandey

[permalink] [raw]
Subject: RE: dmaengine : xilinx_dma two issues

> -----Original Message-----
> From: Paul Thomas <[email protected]>
> Sent: Friday, January 8, 2021 9:27 PM
> To: Radhey Shyam Pandey <[email protected]>
> Cc: Dan Williams <[email protected]>; Vinod Koul
> <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
> <[email protected]>; Romain Perier
> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
> Ferland <[email protected]>; Sebastian von Ohr
> <[email protected]>; [email protected]; Linux ARM <linux-
> [email protected]>; linux-kernel <linux-
> [email protected]>; [email protected]; Shravya Kumbham
> <[email protected]>; git <[email protected]>
> Subject: Re: dmaengine : xilinx_dma two issues
>
> Hi All,
>
> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
> wrote:
> >
> > > -----Original Message-----
> > > From: Radhey Shyam Pandey
> > > Sent: Monday, January 4, 2021 10:50 AM
> > > To: Paul Thomas <[email protected]>; Dan Williams
> > > <[email protected]>; Vinod Koul <[email protected]>; Michal
> > > Simek <[email protected]>; Matthew Murrian
> > > <[email protected]>; Romain Perier
> > > <[email protected]>; Krzysztof Kozlowski <[email protected]>;
> > > Marc Ferland <[email protected]>; Sebastian von Ohr
> > > <[email protected]>; [email protected]; Linux ARM <linux-
> > > [email protected]>; linux-kernel <linux-
> > > [email protected]>; Shravya Kumbham <[email protected]>; git
> > > <[email protected]>
> > > Subject: RE: dmaengine : xilinx_dma two issues
> > >
> > > > -----Original Message-----
> > > > From: Paul Thomas <[email protected]>
> > > > Sent: Monday, December 28, 2020 10:14 AM
> > > > To: Dan Williams <[email protected]>; Vinod Koul
> > > > <[email protected]>; Michal Simek <[email protected]>; Radhey
> > > > Shyam Pandey <[email protected]>; Matthew Murrian
> > > > <[email protected]>; Romain Perier
> > > <[email protected]>;
> > > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > > [email protected]; Linux ARM <linux-
> > > > [email protected]>; linux-kernel <linux-
> > > > [email protected]>
> > > > Subject: dmaengine : xilinx_dma two issues
> > > >
> > > > Hello,
> > > >
> > > > I'm trying to get the 5.10 kernel up and running for our system,
> > > > and I'm running into a couple of issues with xilinx_dma.
> > > + (Xilinx mailing list)
> > >
> > > Thanks for bringing the issues to our notice. Replies inline.
> > >
> > > >
> > > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > > probe fix node order dependency' breaks our usage. Before this
> > > > commit a
> > > call to:
> > > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > > after that commit it returns -19. The reason for this seems to be
> > > > that the only channel that is setup is channel 1 (chan->id is 1 in
> > > xilinx_dma_chan_probe()).
> > > > However in
> > > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > > dma_spec-
> > > > >args[0];), which causes the:
> > > > !xdev->chan[chan_id]
> > > > test to fail in of_dma_xilinx_xlate()
> > >
> > > What is the channel number passed in dmaclient DT?
> Is this a question for me?

Yes, please also share the dmaclient DT client node. Need to see
channel number passed to dmas property. Something like below-

dmas = <& axi_dma_0 1>
dma-names = "axi_dma_0"

>
> >
> > Any update on this issue?
>
> >
> > >
> > > dmas = <& axi_dma_0 1>
> > > dma-names = "axi_dma_0"
> > >
> > > >
> > > > Our device-tree entry looks like this:
> > > > axi_dma_0: dma@80002000 {
> > > > status = "okay";
> > > > #dma-cells = <1>;
> > > > compatible = "xlnx,axi-dma-1.00.a";
> > > > interrupt-parent = <&gic>;
> > > > interrupts = <0 89 4>;
> > > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > > xlnx,addrwidth = <0x20>;
> > > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > > dma-channel@80002030 {
> > > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > > dma-channels = <0x1>;
> > > > interrupts = <0 89 4>;
> > > > xlnx,datawidth = <0x20>;
> > > > xlnx,device-id = <0x0>;
> > > > };
> > > > };
> > > >
> > > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > > >
> > > > The second issue goes a little further back to commit
> > > > e81274cd6b526
> > > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > > After this commit even just removing the module 'rmmod
> > > > xilinx_dma', without ever using it, results in a kernel oops like this:
> > > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > > defined so it is not safe to unbind this driver while in use
> > > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > > Probed!!
> > > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > > address dead000000000108
> > > > [ 42.108598] Mem abort info:
> > > > [ 42.111393] ESR = 0x96000044
> > > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [ 42.119744] SET = 0, FnV = 0
> > > > [ 42.122794] EA = 0, S1PTW = 0
> > > > [ 42.125918] Data abort info:
> > > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > > [ 42.132617] CM = 0, WnR = 1
> > > > [ 42.135577] [dead000000000108] address between user and kernel
> > > > address ranges
> > > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > > uio_pdrv_genirq
> > > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > > [ 42.185636] sp : ffffffc01112bca0
> > > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > > [ 42.268488] Call trace:
> > > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > > [ 42.290659] driver_detach+0x64/0xd8
> > > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > > [ 42.298133] driver_unregister+0x30/0x60
> > > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > > [ 42.321959] do_el0_svc+0x70/0x90
> > > > [ 42.325267] el0_svc+0x14/0x20
> > > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > > [ 42.332141] el0_sync+0x158/0x180
> > > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > > >
> > > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version
> > > > of xilinx_dma.c and never remove the module then it is working
> > > > with the
> > > > 5.10.1 kernel.
> > >
> > > Ok, we will analyze this issue and report back the findings.
> >
> > + Dave
> >
> > I was able to reproduce this crash on the unloading xilinx_dma module.
> > This is introduced due to the e81274cd6b526 mainline commit added in
> > the 5.6 kernel version. The crash is coming from -
> >
> > xilinx_dma_chan_remove+0x74/0xa0:
> > __list_del at ./include/linux/list.h:112 (inlined by) __list_del_entry
> > at./include/linux/list.h:135 (inlined by) list_del at
> > ./include/linux/list.h:146 (inlined by) xilinx_dma_chan_remove at
> > drivers/dma/xilinx/xilinx_dma.c:2546
> >
> > Looking into e81274cd6b526 commit - It deletes channel device_node
> entry.
> > Same channel device_node entry is also deleted in
> > xilinx_dma_chan_remove as a result we see this crash.
> >
> > @@ -993,12 +1007,22 @@ static
> > void __dma_async_device_channel_unregister(struct dma_device *device,
> > "%s called while %d clients hold a reference\n",
> > __func__, chan->client_count);
> > mutex_lock(&dma_list_mutex);
> > + list_del(&chan->device_node);
> > + device->chancnt--;
> >
> > I will let Dave comment on the background of this change. In
> > dmaengine driver - we are adding channel device node entry so deleting
> > it in the exit path looks fine to me. Also, I checked other dma
> > drivers are also deleting channel device_node entry in .remove so it
> > likely a common problem.
> >
> > >
> > > >
> > > > Hopefully, this will be clear to someone how these issues can be
> > > > resolved. In general we've been very happy using the xilinx dma.
> > > >
> > > > I'm not subscribed to the linux-kernel ML so if you need any
> > > > further info or testing just keep me in the to: list.
> > > >
> > > > thanks,
> > > > Paul
>
> Thanks for looking at this! Let me know if anyone needs more info.
>
> -Paul

2021-01-10 15:18:20

by Paul Thomas

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Paul Thomas <[email protected]>
> > Sent: Friday, January 8, 2021 9:27 PM
> > To: Radhey Shyam Pandey <[email protected]>
> > Cc: Dan Williams <[email protected]>; Vinod Koul
> > <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
> > <[email protected]>; Romain Perier
> > <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
> > Ferland <[email protected]>; Sebastian von Ohr
> > <[email protected]>; [email protected]; Linux ARM <linux-
> > [email protected]>; linux-kernel <linux-
> > [email protected]>; [email protected]; Shravya Kumbham
> > <[email protected]>; git <[email protected]>
> > Subject: Re: dmaengine : xilinx_dma two issues
> >
> > Hi All,
> >
> > On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
> > wrote:
> > >
> > > > -----Original Message-----
> > > > From: Radhey Shyam Pandey
> > > > Sent: Monday, January 4, 2021 10:50 AM
> > > > To: Paul Thomas <[email protected]>; Dan Williams
> > > > <[email protected]>; Vinod Koul <[email protected]>; Michal
> > > > Simek <[email protected]>; Matthew Murrian
> > > > <[email protected]>; Romain Perier
> > > > <[email protected]>; Krzysztof Kozlowski <[email protected]>;
> > > > Marc Ferland <[email protected]>; Sebastian von Ohr
> > > > <[email protected]>; [email protected]; Linux ARM <linux-
> > > > [email protected]>; linux-kernel <linux-
> > > > [email protected]>; Shravya Kumbham <[email protected]>; git
> > > > <[email protected]>
> > > > Subject: RE: dmaengine : xilinx_dma two issues
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul Thomas <[email protected]>
> > > > > Sent: Monday, December 28, 2020 10:14 AM
> > > > > To: Dan Williams <[email protected]>; Vinod Koul
> > > > > <[email protected]>; Michal Simek <[email protected]>; Radhey
> > > > > Shyam Pandey <[email protected]>; Matthew Murrian
> > > > > <[email protected]>; Romain Perier
> > > > <[email protected]>;
> > > > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > > > [email protected]; Linux ARM <linux-
> > > > > [email protected]>; linux-kernel <linux-
> > > > > [email protected]>
> > > > > Subject: dmaengine : xilinx_dma two issues
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'm trying to get the 5.10 kernel up and running for our system,
> > > > > and I'm running into a couple of issues with xilinx_dma.
> > > > + (Xilinx mailing list)
> > > >
> > > > Thanks for bringing the issues to our notice. Replies inline.
> > > >
> > > > >
> > > > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > > > probe fix node order dependency' breaks our usage. Before this
> > > > > commit a
> > > > call to:
> > > > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > > > after that commit it returns -19. The reason for this seems to be
> > > > > that the only channel that is setup is channel 1 (chan->id is 1 in
> > > > xilinx_dma_chan_probe()).
> > > > > However in
> > > > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > > > dma_spec-
> > > > > >args[0];), which causes the:
> > > > > !xdev->chan[chan_id]
> > > > > test to fail in of_dma_xilinx_xlate()
> > > >
> > > > What is the channel number passed in dmaclient DT?
> > Is this a question for me?
>
> Yes, please also share the dmaclient DT client node. Need to see
> channel number passed to dmas property. Something like below-
>
> dmas = <& axi_dma_0 1>
> dma-names = "axi_dma_0"
OK, I think I need to revisit this and clean it up some. Currently In
the driver (a custom iio adc driver) it is hard coded:
dma_request_chan(&indio_dev->dev, "axi_dma_0");

However, the DT also has the entries (currently unused by the driver):
dmas = <&axi_dma_0 0>;
dma-names = "axi_dma_0";

I'll go back and clean up our driver to do something like adi-axi-adc.c does:

if (!device_property_present(dev, "dmas"))
return 0;

if (device_property_read_string(dev, "dma-names", &dma_name))
dma_name = "axi_dma_0";

Should the dmas node get used by the driver? I see the second argument
is: '0' for write/tx and '1' for read/rx channel. So I should be
setting this to 1 like this?
dmas = <&axi_dma_0 1>;
dma-names = "axi_dma_0";

But where does that field get used?

thanks,
Paul

>
> >
> > >
> > > Any update on this issue?
> >
> > >
> > > >
> > > > dmas = <& axi_dma_0 1>
> > > > dma-names = "axi_dma_0"
> > > >
> > > > >
> > > > > Our device-tree entry looks like this:
> > > > > axi_dma_0: dma@80002000 {
> > > > > status = "okay";
> > > > > #dma-cells = <1>;
> > > > > compatible = "xlnx,axi-dma-1.00.a";
> > > > > interrupt-parent = <&gic>;
> > > > > interrupts = <0 89 4>;
> > > > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > > > xlnx,addrwidth = <0x20>;
> > > > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > > > dma-channel@80002030 {
> > > > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > > > dma-channels = <0x1>;
> > > > > interrupts = <0 89 4>;
> > > > > xlnx,datawidth = <0x20>;
> > > > > xlnx,device-id = <0x0>;
> > > > > };
> > > > > };
> > > > >
> > > > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > > > >
> > > > > The second issue goes a little further back to commit
> > > > > e81274cd6b526
> > > > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > > > After this commit even just removing the module 'rmmod
> > > > > xilinx_dma', without ever using it, results in a kernel oops like this:
> > > > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > > > defined so it is not safe to unbind this driver while in use
> > > > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > > > Probed!!
> > > > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > > > address dead000000000108
> > > > > [ 42.108598] Mem abort info:
> > > > > [ 42.111393] ESR = 0x96000044
> > > > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > [ 42.119744] SET = 0, FnV = 0
> > > > > [ 42.122794] EA = 0, S1PTW = 0
> > > > > [ 42.125918] Data abort info:
> > > > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > > > [ 42.132617] CM = 0, WnR = 1
> > > > > [ 42.135577] [dead000000000108] address between user and kernel
> > > > > address ranges
> > > > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > > > uio_pdrv_genirq
> > > > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > > > [ 42.185636] sp : ffffffc01112bca0
> > > > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > > > [ 42.268488] Call trace:
> > > > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > > > [ 42.290659] driver_detach+0x64/0xd8
> > > > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > > > [ 42.298133] driver_unregister+0x30/0x60
> > > > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > > > [ 42.321959] do_el0_svc+0x70/0x90
> > > > > [ 42.325267] el0_svc+0x14/0x20
> > > > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > > > [ 42.332141] el0_sync+0x158/0x180
> > > > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > > > >
> > > > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version
> > > > > of xilinx_dma.c and never remove the module then it is working
> > > > > with the
> > > > > 5.10.1 kernel.
> > > >
> > > > Ok, we will analyze this issue and report back the findings.
> > >
> > > + Dave
> > >
> > > I was able to reproduce this crash on the unloading xilinx_dma module.
> > > This is introduced due to the e81274cd6b526 mainline commit added in
> > > the 5.6 kernel version. The crash is coming from -
> > >
> > > xilinx_dma_chan_remove+0x74/0xa0:
> > > __list_del at ./include/linux/list.h:112 (inlined by) __list_del_entry
> > > at./include/linux/list.h:135 (inlined by) list_del at
> > > ./include/linux/list.h:146 (inlined by) xilinx_dma_chan_remove at
> > > drivers/dma/xilinx/xilinx_dma.c:2546
> > >
> > > Looking into e81274cd6b526 commit - It deletes channel device_node
> > entry.
> > > Same channel device_node entry is also deleted in
> > > xilinx_dma_chan_remove as a result we see this crash.
> > >
> > > @@ -993,12 +1007,22 @@ static
> > > void __dma_async_device_channel_unregister(struct dma_device *device,
> > > "%s called while %d clients hold a reference\n",
> > > __func__, chan->client_count);
> > > mutex_lock(&dma_list_mutex);
> > > + list_del(&chan->device_node);
> > > + device->chancnt--;
> > >
> > > I will let Dave comment on the background of this change. In
> > > dmaengine driver - we are adding channel device node entry so deleting
> > > it in the exit path looks fine to me. Also, I checked other dma
> > > drivers are also deleting channel device_node entry in .remove so it
> > > likely a common problem.
> > >
> > > >
> > > > >
> > > > > Hopefully, this will be clear to someone how these issues can be
> > > > > resolved. In general we've been very happy using the xilinx dma.
> > > > >
> > > > > I'm not subscribed to the linux-kernel ML so if you need any
> > > > > further info or testing just keep me in the to: list.
> > > > >
> > > > > thanks,
> > > > > Paul
> >
> > Thanks for looking at this! Let me know if anyone needs more info.
> >
> > -Paul

2021-01-10 15:45:48

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

On 1/10/21 4:16 PM, Paul Thomas wrote:
> On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey <[email protected]> wrote:
>>> -----Original Message-----
>>> From: Paul Thomas <[email protected]>
>>> Sent: Friday, January 8, 2021 9:27 PM
>>> To: Radhey Shyam Pandey <[email protected]>
>>> Cc: Dan Williams <[email protected]>; Vinod Koul
>>> <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
>>> <[email protected]>; Romain Perier
>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
>>> Ferland <[email protected]>; Sebastian von Ohr
>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>> [email protected]>; linux-kernel <linux-
>>> [email protected]>; [email protected]; Shravya Kumbham
>>> <[email protected]>; git <[email protected]>
>>> Subject: Re: dmaengine : xilinx_dma two issues
>>>
>>> Hi All,
>>>
>>> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
>>> wrote:
>>>>> -----Original Message-----
>>>>> From: Radhey Shyam Pandey
>>>>> Sent: Monday, January 4, 2021 10:50 AM
>>>>> To: Paul Thomas <[email protected]>; Dan Williams
>>>>> <[email protected]>; Vinod Koul <[email protected]>; Michal
>>>>> Simek <[email protected]>; Matthew Murrian
>>>>> <[email protected]>; Romain Perier
>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>;
>>>>> Marc Ferland <[email protected]>; Sebastian von Ohr
>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>> [email protected]>; linux-kernel <linux-
>>>>> [email protected]>; Shravya Kumbham <[email protected]>; git
>>>>> <[email protected]>
>>>>> Subject: RE: dmaengine : xilinx_dma two issues
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Paul Thomas <[email protected]>
>>>>>> Sent: Monday, December 28, 2020 10:14 AM
>>>>>> To: Dan Williams <[email protected]>; Vinod Koul
>>>>>> <[email protected]>; Michal Simek <[email protected]>; Radhey
>>>>>> Shyam Pandey <[email protected]>; Matthew Murrian
>>>>>> <[email protected]>; Romain Perier
>>>>> <[email protected]>;
>>>>>> Krzysztof Kozlowski <[email protected]>; Marc Ferland
>>>>>> <[email protected]>; Sebastian von Ohr <[email protected]>;
>>>>>> [email protected]; Linux ARM <linux-
>>>>>> [email protected]>; linux-kernel <linux-
>>>>>> [email protected]>
>>>>>> Subject: dmaengine : xilinx_dma two issues
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to get the 5.10 kernel up and running for our system,
>>>>>> and I'm running into a couple of issues with xilinx_dma.
>>>>> + (Xilinx mailing list)
>>>>>
>>>>> Thanks for bringing the issues to our notice. Replies inline.
>>>>>
>>>>>> First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
>>>>>> probe fix node order dependency' breaks our usage. Before this
>>>>>> commit a
>>>>> call to:
>>>>>> dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
>>>>>> after that commit it returns -19. The reason for this seems to be
>>>>>> that the only channel that is setup is channel 1 (chan->id is 1 in
>>>>> xilinx_dma_chan_probe()).
>>>>>> However in
>>>>>> of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
>>>>>> dma_spec-
>>>>>>> args[0];), which causes the:
>>>>>> !xdev->chan[chan_id]
>>>>>> test to fail in of_dma_xilinx_xlate()
>>>>> What is the channel number passed in dmaclient DT?
>>> Is this a question for me?
>> Yes, please also share the dmaclient DT client node. Need to see
>> channel number passed to dmas property. Something like below-
>>
>> dmas = <& axi_dma_0 1>
>> dma-names = "axi_dma_0"
> OK, I think I need to revisit this and clean it up some. Currently In
> the driver (a custom iio adc driver) it is hard coded:
> dma_request_chan(&indio_dev->dev, "axi_dma_0");
>
> However, the DT also has the entries (currently unused by the driver):
> dmas = <&axi_dma_0 0>;
> dma-names = "axi_dma_0";
>
> I'll go back and clean up our driver to do something like adi-axi-adc.c does:
>
> if (!device_property_present(dev, "dmas"))
> return 0;
>
> if (device_property_read_string(dev, "dma-names", &dma_name))
> dma_name = "axi_dma_0";
>
> Should the dmas node get used by the driver? I see the second argument
> is: '0' for write/tx and '1' for read/rx channel. So I should be
> setting this to 1 like this?
> dmas = <&axi_dma_0 1>;
> dma-names = "axi_dma_0";
>
> But where does that field get used?

This got broken in "dmaengine: xilinx_dma: In dma channel probe fix node
order dependency"
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14ccf0aab46e1888e2f45b6e995c621c70b32651>.
Before if there was only one channel that channel was always at index 0.
Regardless of whether the channel was RX or TX. But after that change
the RX channel is always at offset 1, regardless of whether the DMA has
one or two channels. This is a breakage in ABI.

If you have the choice I'd recommend to not use the Xilinx DMA, it gets
broken pretty much every other release.

- Lars



2021-01-11 13:04:03

by Michal Simek

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

Hi Lars,

On 10. 01. 21 16:43, Lars-Peter Clausen wrote:
> On 1/10/21 4:16 PM, Paul Thomas wrote:
>> On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey
>> <[email protected]> wrote:
>>>> -----Original Message-----
>>>> From: Paul Thomas <[email protected]>
>>>> Sent: Friday, January 8, 2021 9:27 PM
>>>> To: Radhey Shyam Pandey <[email protected]>
>>>> Cc: Dan Williams <[email protected]>; Vinod Koul
>>>> <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
>>>> <[email protected]>; Romain Perier
>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
>>>> Ferland <[email protected]>; Sebastian von Ohr
>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>> [email protected]>; linux-kernel <linux-
>>>> [email protected]>; [email protected]; Shravya Kumbham
>>>> <[email protected]>; git <[email protected]>
>>>> Subject: Re: dmaengine : xilinx_dma two issues
>>>>
>>>> Hi All,
>>>>
>>>> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
>>>> wrote:
>>>>>> -----Original Message-----
>>>>>> From: Radhey Shyam Pandey
>>>>>> Sent: Monday, January 4, 2021 10:50 AM
>>>>>> To: Paul Thomas <[email protected]>; Dan Williams
>>>>>> <[email protected]>; Vinod Koul <[email protected]>; Michal
>>>>>> Simek <[email protected]>; Matthew Murrian
>>>>>> <[email protected]>; Romain Perier
>>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>;
>>>>>> Marc Ferland <[email protected]>; Sebastian von Ohr
>>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>>> [email protected]>; linux-kernel <linux-
>>>>>> [email protected]>; Shravya Kumbham <[email protected]>; git
>>>>>> <[email protected]>
>>>>>> Subject: RE: dmaengine : xilinx_dma two issues
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Paul Thomas <[email protected]>
>>>>>>> Sent: Monday, December 28, 2020 10:14 AM
>>>>>>> To: Dan Williams <[email protected]>; Vinod Koul
>>>>>>> <[email protected]>; Michal Simek <[email protected]>; Radhey
>>>>>>> Shyam Pandey <[email protected]>; Matthew Murrian
>>>>>>> <[email protected]>; Romain Perier
>>>>>> <[email protected]>;
>>>>>>> Krzysztof Kozlowski <[email protected]>; Marc Ferland
>>>>>>> <[email protected]>; Sebastian von Ohr <[email protected]>;
>>>>>>> [email protected]; Linux ARM <linux-
>>>>>>> [email protected]>; linux-kernel <linux-
>>>>>>> [email protected]>
>>>>>>> Subject: dmaengine : xilinx_dma two issues
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm trying to get the 5.10 kernel up and running for our system,
>>>>>>> and I'm running into a couple of issues with xilinx_dma.
>>>>>> + (Xilinx mailing list)
>>>>>>
>>>>>> Thanks for bringing the issues to our notice. Replies inline.
>>>>>>
>>>>>>> First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
>>>>>>> probe fix node order dependency' breaks our usage. Before this
>>>>>>> commit a
>>>>>> call to:
>>>>>>> dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
>>>>>>> after that commit it returns -19. The reason for this seems to be
>>>>>>> that the only channel that is setup is channel 1 (chan->id is 1 in
>>>>>> xilinx_dma_chan_probe()).
>>>>>>> However in
>>>>>>> of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
>>>>>>> dma_spec-
>>>>>>>> args[0];), which causes the:
>>>>>>> !xdev->chan[chan_id]
>>>>>>> test to fail in of_dma_xilinx_xlate()
>>>>>> What is the channel number passed in dmaclient DT?
>>>> Is this a question for me?
>>> Yes, please also share the dmaclient DT client node. Need to see
>>> channel number passed to dmas property. Something like below-
>>>
>>> dmas = <& axi_dma_0 1>
>>> dma-names = "axi_dma_0"
>> OK, I think I need to revisit this and clean it up some. Currently In
>> the driver (a custom iio adc driver) it is hard coded:
>> dma_request_chan(&indio_dev->dev, "axi_dma_0");
>>
>> However, the DT also has the entries (currently unused by the driver):
>>          dmas = <&axi_dma_0 0>;
>>          dma-names = "axi_dma_0";
>>
>> I'll go back and clean up our driver to do something like
>> adi-axi-adc.c does:
>>
>>          if (!device_property_present(dev, "dmas"))
>>                  return 0;
>>
>>          if (device_property_read_string(dev, "dma-names", &dma_name))
>>                  dma_name = "axi_dma_0";
>>
>> Should the dmas node get used by the driver? I see the second argument
>> is: '0' for write/tx and '1' for read/rx channel. So I should be
>> setting this to 1 like this?
>>          dmas = <&axi_dma_0 1>;
>>          dma-names = "axi_dma_0";
>>
>> But where does that field get used?
>
> This got broken in "dmaengine: xilinx_dma: In dma channel probe fix node
> order dependency"
> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14ccf0aab46e1888e2f45b6e995c621c70b32651>.
> Before if there was only one channel that channel was always at index 0.
> Regardless of whether the channel was RX or TX. But after that change
> the RX channel is always at offset 1, regardless of whether the DMA has
> one or two channels. This is a breakage in ABI.
>
> If you have the choice I'd recommend to not use the Xilinx DMA, it gets
> broken pretty much every other release.

I expect that you are talking about Xilinx releases and I hope that this
has changed over times when most of changes are upstreamed already. The
patch above you are referencing has been applied by Vinod and he is
checking patches a lot. If there is a problem and any breakage it needs
to be fixed. And bugs happen all the time and we have a way how to work
with it.
If you see there any issue please report them and let's fix them and
continue on this topic from technical point of view.
In connection to this problem what are you suggesting? Just revert this
patch or fix ordering differently? Would be good to provide your
suggestion and fix it.

Thanks,
Michal

2021-01-11 15:36:02

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

On 1/11/21 10:32 AM, Michal Simek wrote:
> Hi Lars,
>
> On 10. 01. 21 16:43, Lars-Peter Clausen wrote:
>> On 1/10/21 4:16 PM, Paul Thomas wrote:
>>> On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey
>>> <[email protected]> wrote:
>>>>> -----Original Message-----
>>>>> From: Paul Thomas <[email protected]>
>>>>> Sent: Friday, January 8, 2021 9:27 PM
>>>>> To: Radhey Shyam Pandey <[email protected]>
>>>>> Cc: Dan Williams <[email protected]>; Vinod Koul
>>>>> <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
>>>>> <[email protected]>; Romain Perier
>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
>>>>> Ferland <[email protected]>; Sebastian von Ohr
>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>> [email protected]>; linux-kernel <linux-
>>>>> [email protected]>; [email protected]; Shravya Kumbham
>>>>> <[email protected]>; git <[email protected]>
>>>>> Subject: Re: dmaengine : xilinx_dma two issues
>>>>>
>>>>> Hi All,
>>>>>
>>>>> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
>>>>> wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Radhey Shyam Pandey
>>>>>>> Sent: Monday, January 4, 2021 10:50 AM
>>>>>>> To: Paul Thomas <[email protected]>; Dan Williams
>>>>>>> <[email protected]>; Vinod Koul <[email protected]>; Michal
>>>>>>> Simek <[email protected]>; Matthew Murrian
>>>>>>> <[email protected]>; Romain Perier
>>>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>;
>>>>>>> Marc Ferland <[email protected]>; Sebastian von Ohr
>>>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>>>> [email protected]>; linux-kernel <linux-
>>>>>>> [email protected]>; Shravya Kumbham <[email protected]>; git
>>>>>>> <[email protected]>
>>>>>>> Subject: RE: dmaengine : xilinx_dma two issues
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Paul Thomas <[email protected]>
>>>>>>>> Sent: Monday, December 28, 2020 10:14 AM
>>>>>>>> To: Dan Williams <[email protected]>; Vinod Koul
>>>>>>>> <[email protected]>; Michal Simek <[email protected]>; Radhey
>>>>>>>> Shyam Pandey <[email protected]>; Matthew Murrian
>>>>>>>> <[email protected]>; Romain Perier
>>>>>>> <[email protected]>;
>>>>>>>> Krzysztof Kozlowski <[email protected]>; Marc Ferland
>>>>>>>> <[email protected]>; Sebastian von Ohr <[email protected]>;
>>>>>>>> [email protected]; Linux ARM <linux-
>>>>>>>> [email protected]>; linux-kernel <linux-
>>>>>>>> [email protected]>
>>>>>>>> Subject: dmaengine : xilinx_dma two issues
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm trying to get the 5.10 kernel up and running for our system,
>>>>>>>> and I'm running into a couple of issues with xilinx_dma.
>>>>>>> + (Xilinx mailing list)
>>>>>>>
>>>>>>> Thanks for bringing the issues to our notice. Replies inline.
>>>>>>>
>>>>>>>> First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
>>>>>>>> probe fix node order dependency' breaks our usage. Before this
>>>>>>>> commit a
>>>>>>> call to:
>>>>>>>> dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
>>>>>>>> after that commit it returns -19. The reason for this seems to be
>>>>>>>> that the only channel that is setup is channel 1 (chan->id is 1 in
>>>>>>> xilinx_dma_chan_probe()).
>>>>>>>> However in
>>>>>>>> of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
>>>>>>>> dma_spec-
>>>>>>>>> args[0];), which causes the:
>>>>>>>> !xdev->chan[chan_id]
>>>>>>>> test to fail in of_dma_xilinx_xlate()
>>>>>>> What is the channel number passed in dmaclient DT?
>>>>> Is this a question for me?
>>>> Yes, please also share the dmaclient DT client node. Need to see
>>>> channel number passed to dmas property. Something like below-
>>>>
>>>> dmas = <& axi_dma_0 1>
>>>> dma-names = "axi_dma_0"
>>> OK, I think I need to revisit this and clean it up some. Currently In
>>> the driver (a custom iio adc driver) it is hard coded:
>>> dma_request_chan(&indio_dev->dev, "axi_dma_0");
>>>
>>> However, the DT also has the entries (currently unused by the driver):
>>>          dmas = <&axi_dma_0 0>;
>>>          dma-names = "axi_dma_0";
>>>
>>> I'll go back and clean up our driver to do something like
>>> adi-axi-adc.c does:
>>>
>>>          if (!device_property_present(dev, "dmas"))
>>>                  return 0;
>>>
>>>          if (device_property_read_string(dev, "dma-names", &dma_name))
>>>                  dma_name = "axi_dma_0";
>>>
>>> Should the dmas node get used by the driver? I see the second argument
>>> is: '0' for write/tx and '1' for read/rx channel. So I should be
>>> setting this to 1 like this?
>>>          dmas = <&axi_dma_0 1>;
>>>          dma-names = "axi_dma_0";
>>>
>>> But where does that field get used?
>> This got broken in "dmaengine: xilinx_dma: In dma channel probe fix node
>> order dependency"
>> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14ccf0aab46e1888e2f45b6e995c621c70b32651>.
>> Before if there was only one channel that channel was always at index 0.
>> Regardless of whether the channel was RX or TX. But after that change
>> the RX channel is always at offset 1, regardless of whether the DMA has
>> one or two channels. This is a breakage in ABI.
>>
>> If you have the choice I'd recommend to not use the Xilinx DMA, it gets
>> broken pretty much every other release.
> I expect that you are talking about Xilinx releases and I hope that this
> has changed over times when most of changes are upstreamed already. The
> patch above you are referencing has been applied by Vinod and he is
> checking patches a lot. If there is a problem and any breakage it needs
> to be fixed. And bugs happen all the time and we have a way how to work
> with it.

I don't know if it has gotten better. When I upgrade to a new release
what takes up most of the time is figuring out why the Xilinx DMA
doesn't work anymore. Its been like this for years.

> If you see there any issue please report them and let's fix them and
> continue on this topic from technical point of view.
> In connection to this problem what are you suggesting? Just revert this
> patch or fix ordering differently? Would be good to provide your
> suggestion and fix it.

Reverting would re-introduce the issue the patch was supposed to fix.

The would have been to use index 0 for the channel if there is only one
channel. If there are two channels use 0 for TX and 1 for RX.

The problem is that the change has been around for a while and restoring
the previous behavior will break users that are expecting the new
behavior. It is a bit of a catch-22.

- Lars

2021-01-11 16:11:35

by Michal Simek

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues



On 11. 01. 21 16:33, Lars-Peter Clausen wrote:
> On 1/11/21 10:32 AM, Michal Simek wrote:
>> Hi Lars,
>>
>> On 10. 01. 21 16:43, Lars-Peter Clausen wrote:
>>> On 1/10/21 4:16 PM, Paul Thomas wrote:
>>>> On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey
>>>> <[email protected]> wrote:
>>>>>> -----Original Message-----
>>>>>> From: Paul Thomas <[email protected]>
>>>>>> Sent: Friday, January 8, 2021 9:27 PM
>>>>>> To: Radhey Shyam Pandey <[email protected]>
>>>>>> Cc: Dan Williams <[email protected]>; Vinod Koul
>>>>>> <[email protected]>; Michal Simek <[email protected]>; Matthew
>>>>>> Murrian
>>>>>> <[email protected]>; Romain Perier
>>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>;
>>>>>> Marc
>>>>>> Ferland <[email protected]>; Sebastian von Ohr
>>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>>> [email protected]>; linux-kernel <linux-
>>>>>> [email protected]>; [email protected]; Shravya Kumbham
>>>>>> <[email protected]>; git <[email protected]>
>>>>>> Subject: Re: dmaengine : xilinx_dma two issues
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey
>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Radhey Shyam Pandey
>>>>>>>> Sent: Monday, January 4, 2021 10:50 AM
>>>>>>>> To: Paul Thomas <[email protected]>; Dan Williams
>>>>>>>> <[email protected]>; Vinod Koul <[email protected]>; Michal
>>>>>>>> Simek <[email protected]>; Matthew Murrian
>>>>>>>> <[email protected]>; Romain Perier
>>>>>>>> <[email protected]>; Krzysztof Kozlowski <[email protected]>;
>>>>>>>> Marc Ferland <[email protected]>; Sebastian von Ohr
>>>>>>>> <[email protected]>; [email protected]; Linux ARM <linux-
>>>>>>>> [email protected]>; linux-kernel <linux-
>>>>>>>> [email protected]>; Shravya Kumbham <[email protected]>; git
>>>>>>>> <[email protected]>
>>>>>>>> Subject: RE: dmaengine : xilinx_dma two issues
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Paul Thomas <[email protected]>
>>>>>>>>> Sent: Monday, December 28, 2020 10:14 AM
>>>>>>>>> To: Dan Williams <[email protected]>; Vinod Koul
>>>>>>>>> <[email protected]>; Michal Simek <[email protected]>; Radhey
>>>>>>>>> Shyam Pandey <[email protected]>; Matthew Murrian
>>>>>>>>> <[email protected]>; Romain Perier
>>>>>>>> <[email protected]>;
>>>>>>>>> Krzysztof Kozlowski <[email protected]>; Marc Ferland
>>>>>>>>> <[email protected]>; Sebastian von Ohr <[email protected]>;
>>>>>>>>> [email protected]; Linux ARM <linux-
>>>>>>>>> [email protected]>; linux-kernel <linux-
>>>>>>>>> [email protected]>
>>>>>>>>> Subject: dmaengine : xilinx_dma two issues
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm trying to get the 5.10 kernel up and running for our system,
>>>>>>>>> and I'm running into a couple of issues with xilinx_dma.
>>>>>>>> + (Xilinx mailing list)
>>>>>>>>
>>>>>>>> Thanks for bringing the issues to our notice. Replies inline.
>>>>>>>>
>>>>>>>>> First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
>>>>>>>>> probe fix node order dependency' breaks our usage. Before this
>>>>>>>>> commit a
>>>>>>>> call to:
>>>>>>>>> dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
>>>>>>>>> after that commit it returns -19. The reason for this seems to be
>>>>>>>>> that the only channel that is setup is channel 1 (chan->id is 1 in
>>>>>>>> xilinx_dma_chan_probe()).
>>>>>>>>> However in
>>>>>>>>> of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
>>>>>>>>> dma_spec-
>>>>>>>>>> args[0];), which causes the:
>>>>>>>>> !xdev->chan[chan_id]
>>>>>>>>> test to fail in of_dma_xilinx_xlate()
>>>>>>>> What is the channel number passed in dmaclient DT?
>>>>>> Is this a question for me?
>>>>> Yes, please also share the dmaclient DT client node. Need to see
>>>>> channel number passed to dmas property. Something like below-
>>>>>
>>>>> dmas = <& axi_dma_0 1>
>>>>> dma-names = "axi_dma_0"
>>>> OK, I think I need to revisit this and clean it up some. Currently In
>>>> the driver (a custom iio adc driver) it is hard coded:
>>>> dma_request_chan(&indio_dev->dev, "axi_dma_0");
>>>>
>>>> However, the DT also has the entries (currently unused by the driver):
>>>>           dmas = <&axi_dma_0 0>;
>>>>           dma-names = "axi_dma_0";
>>>>
>>>> I'll go back and clean up our driver to do something like
>>>> adi-axi-adc.c does:
>>>>
>>>>           if (!device_property_present(dev, "dmas"))
>>>>                   return 0;
>>>>
>>>>           if (device_property_read_string(dev, "dma-names", &dma_name))
>>>>                   dma_name = "axi_dma_0";
>>>>
>>>> Should the dmas node get used by the driver? I see the second argument
>>>> is: '0' for write/tx and '1' for read/rx channel. So I should be
>>>> setting this to 1 like this?
>>>>           dmas = <&axi_dma_0 1>;
>>>>           dma-names = "axi_dma_0";
>>>>
>>>> But where does that field get used?
>>> This got broken in "dmaengine: xilinx_dma: In dma channel probe fix node
>>> order dependency"
>>> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14ccf0aab46e1888e2f45b6e995c621c70b32651>.
>>>
>>> Before if there was only one channel that channel was always at index 0.
>>> Regardless of whether the channel was RX or TX. But after that change
>>> the RX channel is always at offset 1, regardless of whether the DMA has
>>> one or two channels. This is a breakage in ABI.
>>>
>>> If you have the choice I'd recommend to not use the Xilinx DMA, it gets
>>> broken pretty much every other release.
>> I expect that you are talking about Xilinx releases and I hope that this
>> has changed over times when most of changes are upstreamed already. The
>> patch above you are referencing has been applied by Vinod and he is
>> checking patches a lot. If there is a problem and any breakage it needs
>> to be fixed. And bugs happen all the time and we have a way how to work
>> with it.
>
> I don't know if it has gotten better. When I upgrade to a new release
> what takes up most of the time is figuring out why the Xilinx DMA
> doesn't work anymore. Its been like this for years.

Are you saying that upstreaming this driver doesn't improve his quality?
But I would expect when you figured this out you have sent patches to
fix it.


>> If you see there any issue please report them and let's fix them and
>> continue on this topic from technical point of view.
>> In connection to this problem what are you suggesting? Just revert this
>> patch or fix ordering differently? Would be good to provide your
>> suggestion and fix it.
>
> Reverting would re-introduce the issue the patch was supposed to fix.
>
> The would have been to use index 0 for the channel if there is only one
> channel. If there are two channels use 0 for TX and 1 for RX.
>
> The problem is that the change has been around for a while and restoring
> the previous behavior will break users that are expecting the new
> behavior. It is a bit of a catch-22.

Ok. It means we need to find a way how to fix it and don't break
existing users.
I expect there shouldn't be hard to detect if there is only one channel.
If there is just use index 0.

Thanks,
Michal


2021-01-12 07:00:29

by Paul Thomas

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

On Sun, Jan 10, 2021 at 10:16 AM Paul Thomas <[email protected]> wrote:
>
> On Fri, Jan 8, 2021 at 1:36 PM Radhey Shyam Pandey <[email protected]> wrote:
> >
> > > -----Original Message-----
> > > From: Paul Thomas <[email protected]>
> > > Sent: Friday, January 8, 2021 9:27 PM
> > > To: Radhey Shyam Pandey <[email protected]>
> > > Cc: Dan Williams <[email protected]>; Vinod Koul
> > > <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
> > > <[email protected]>; Romain Perier
> > > <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
> > > Ferland <[email protected]>; Sebastian von Ohr
> > > <[email protected]>; [email protected]; Linux ARM <linux-
> > > [email protected]>; linux-kernel <linux-
> > > [email protected]>; [email protected]; Shravya Kumbham
> > > <[email protected]>; git <[email protected]>
> > > Subject: Re: dmaengine : xilinx_dma two issues
> > >
> > > Hi All,
> > >
> > > On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
> > > wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Radhey Shyam Pandey
> > > > > Sent: Monday, January 4, 2021 10:50 AM
> > > > > To: Paul Thomas <[email protected]>; Dan Williams
> > > > > <[email protected]>; Vinod Koul <[email protected]>; Michal
> > > > > Simek <[email protected]>; Matthew Murrian
> > > > > <[email protected]>; Romain Perier
> > > > > <[email protected]>; Krzysztof Kozlowski <[email protected]>;
> > > > > Marc Ferland <[email protected]>; Sebastian von Ohr
> > > > > <[email protected]>; [email protected]; Linux ARM <linux-
> > > > > [email protected]>; linux-kernel <linux-
> > > > > [email protected]>; Shravya Kumbham <[email protected]>; git
> > > > > <[email protected]>
> > > > > Subject: RE: dmaengine : xilinx_dma two issues
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Paul Thomas <[email protected]>
> > > > > > Sent: Monday, December 28, 2020 10:14 AM
> > > > > > To: Dan Williams <[email protected]>; Vinod Koul
> > > > > > <[email protected]>; Michal Simek <[email protected]>; Radhey
> > > > > > Shyam Pandey <[email protected]>; Matthew Murrian
> > > > > > <[email protected]>; Romain Perier
> > > > > <[email protected]>;
> > > > > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > > > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > > > > [email protected]; Linux ARM <linux-
> > > > > > [email protected]>; linux-kernel <linux-
> > > > > > [email protected]>
> > > > > > Subject: dmaengine : xilinx_dma two issues
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'm trying to get the 5.10 kernel up and running for our system,
> > > > > > and I'm running into a couple of issues with xilinx_dma.
> > > > > + (Xilinx mailing list)
> > > > >
> > > > > Thanks for bringing the issues to our notice. Replies inline.
> > > > >
> > > > > >
> > > > > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > > > > probe fix node order dependency' breaks our usage. Before this
> > > > > > commit a
> > > > > call to:
> > > > > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > > > > after that commit it returns -19. The reason for this seems to be
> > > > > > that the only channel that is setup is channel 1 (chan->id is 1 in
> > > > > xilinx_dma_chan_probe()).
> > > > > > However in
> > > > > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > > > > dma_spec-
> > > > > > >args[0];), which causes the:
> > > > > > !xdev->chan[chan_id]
> > > > > > test to fail in of_dma_xilinx_xlate()
> > > > >
> > > > > What is the channel number passed in dmaclient DT?
> > > Is this a question for me?
> >
> > Yes, please also share the dmaclient DT client node. Need to see
> > channel number passed to dmas property. Something like below-
> >
> > dmas = <& axi_dma_0 1>
> > dma-names = "axi_dma_0"
> OK, I think I need to revisit this and clean it up some. Currently In
> the driver (a custom iio adc driver) it is hard coded:
> dma_request_chan(&indio_dev->dev, "axi_dma_0");
>
> However, the DT also has the entries (currently unused by the driver):
> dmas = <&axi_dma_0 0>;
> dma-names = "axi_dma_0";
>
> I'll go back and clean up our driver to do something like adi-axi-adc.c does:
>
> if (!device_property_present(dev, "dmas"))
> return 0;
>
> if (device_property_read_string(dev, "dma-names", &dma_name))
> dma_name = "axi_dma_0";
>
> Should the dmas node get used by the driver? I see the second argument
> is: '0' for write/tx and '1' for read/rx channel. So I should be
> setting this to 1 like this?
> dmas = <&axi_dma_0 1>;
> dma-names = "axi_dma_0";
OK I tested this and using a DT entry like above fixes the failure
during probe. Now that I sort of understand this I'm good.
>
> But where does that field get used?
>
> thanks,
> Paul
>
> >
> > >
> > > >
> > > > Any update on this issue?
> > >
> > > >
> > > > >
> > > > > dmas = <& axi_dma_0 1>
> > > > > dma-names = "axi_dma_0"
> > > > >
> > > > > >
> > > > > > Our device-tree entry looks like this:
> > > > > > axi_dma_0: dma@80002000 {
> > > > > > status = "okay";
> > > > > > #dma-cells = <1>;
> > > > > > compatible = "xlnx,axi-dma-1.00.a";
> > > > > > interrupt-parent = <&gic>;
> > > > > > interrupts = <0 89 4>;
> > > > > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > > > > xlnx,addrwidth = <0x20>;
> > > > > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > > > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > > > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > > > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > > > > dma-channel@80002030 {
> > > > > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > > > > dma-channels = <0x1>;
> > > > > > interrupts = <0 89 4>;
> > > > > > xlnx,datawidth = <0x20>;
> > > > > > xlnx,device-id = <0x0>;
> > > > > > };
> > > > > > };
> > > > > >
> > > > > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > > > > >
> > > > > > The second issue goes a little further back to commit
> > > > > > e81274cd6b526
> > > > > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > > > > After this commit even just removing the module 'rmmod
> > > > > > xilinx_dma', without ever using it, results in a kernel oops like this:
> > > > > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > > > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > > > > defined so it is not safe to unbind this driver while in use
> > > > > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > > > > Probed!!
> > > > > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > > > > address dead000000000108
> > > > > > [ 42.108598] Mem abort info:
> > > > > > [ 42.111393] ESR = 0x96000044
> > > > > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > > [ 42.119744] SET = 0, FnV = 0
> > > > > > [ 42.122794] EA = 0, S1PTW = 0
> > > > > > [ 42.125918] Data abort info:
> > > > > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > > > > [ 42.132617] CM = 0, WnR = 1
> > > > > > [ 42.135577] [dead000000000108] address between user and kernel
> > > > > > address ranges
> > > > > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > > > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > > > > uio_pdrv_genirq
> > > > > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > > > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > > > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > > > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > > > > [ 42.185636] sp : ffffffc01112bca0
> > > > > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > > > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > > > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > > > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > > > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > > > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > > > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > > > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > > > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > > > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > > > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > > > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > > > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > > > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > > > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > > > > [ 42.268488] Call trace:
> > > > > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > > > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > > > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > > > > [ 42.290659] driver_detach+0x64/0xd8
> > > > > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > > > > [ 42.298133] driver_unregister+0x30/0x60
> > > > > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > > > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > > > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > > > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > > > > [ 42.321959] do_el0_svc+0x70/0x90
> > > > > > [ 42.325267] el0_svc+0x14/0x20
> > > > > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > > > > [ 42.332141] el0_sync+0x158/0x180
> > > > > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > > > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > > > > >
> > > > > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version
> > > > > > of xilinx_dma.c and never remove the module then it is working
> > > > > > with the
> > > > > > 5.10.1 kernel.
> > > > >
> > > > > Ok, we will analyze this issue and report back the findings.
> > > >
> > > > + Dave
> > > >
> > > > I was able to reproduce this crash on the unloading xilinx_dma module.
> > > > This is introduced due to the e81274cd6b526 mainline commit added in
> > > > the 5.6 kernel version. The crash is coming from -
> > > >
> > > > xilinx_dma_chan_remove+0x74/0xa0:
> > > > __list_del at ./include/linux/list.h:112 (inlined by) __list_del_entry
> > > > at./include/linux/list.h:135 (inlined by) list_del at
> > > > ./include/linux/list.h:146 (inlined by) xilinx_dma_chan_remove at
> > > > drivers/dma/xilinx/xilinx_dma.c:2546
> > > >
> > > > Looking into e81274cd6b526 commit - It deletes channel device_node
> > > entry.
> > > > Same channel device_node entry is also deleted in
> > > > xilinx_dma_chan_remove as a result we see this crash.
> > > >
> > > > @@ -993,12 +1007,22 @@ static
> > > > void __dma_async_device_channel_unregister(struct dma_device *device,
> > > > "%s called while %d clients hold a reference\n",
> > > > __func__, chan->client_count);
> > > > mutex_lock(&dma_list_mutex);
> > > > + list_del(&chan->device_node);
> > > > + device->chancnt--;
> > > >
> > > > I will let Dave comment on the background of this change. In
> > > > dmaengine driver - we are adding channel device node entry so deleting
> > > > it in the exit path looks fine to me. Also, I checked other dma
> > > > drivers are also deleting channel device_node entry in .remove so it
> > > > likely a common problem.
This just leaves the oops on removing the module.

thanks,
Paul

> > > >
> > > > >
> > > > > >
> > > > > > Hopefully, this will be clear to someone how these issues can be
> > > > > > resolved. In general we've been very happy using the xilinx dma.
> > > > > >
> > > > > > I'm not subscribed to the linux-kernel ML so if you need any
> > > > > > further info or testing just keep me in the to: list.
> > > > > >
> > > > > > thanks,
> > > > > > Paul
> > >
> > > Thanks for looking at this! Let me know if anyone needs more info.
> > >
> > > -Paul

2021-01-15 19:34:22

by Paul Thomas

[permalink] [raw]
Subject: Re: dmaengine : xilinx_dma two issues

On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Radhey Shyam Pandey
> > Sent: Monday, January 4, 2021 10:50 AM
> > To: Paul Thomas <[email protected]>; Dan Williams
> > <[email protected]>; Vinod Koul <[email protected]>; Michal Simek
> > <[email protected]>; Matthew Murrian <[email protected]>;
> > Romain Perier <[email protected]>; Krzysztof Kozlowski
> > <[email protected]>; Marc Ferland <[email protected]>; Sebastian von Ohr
> > <[email protected]>; [email protected]; Linux ARM <linux-
> > [email protected]>; linux-kernel <linux-
> > [email protected]>; Shravya Kumbham <[email protected]>; git
> > <[email protected]>
> > Subject: RE: dmaengine : xilinx_dma two issues
> >
> > > -----Original Message-----
> > > From: Paul Thomas <[email protected]>
> > > Sent: Monday, December 28, 2020 10:14 AM
> > > To: Dan Williams <[email protected]>; Vinod Koul
> > > <[email protected]>; Michal Simek <[email protected]>; Radhey Shyam
> > > Pandey <[email protected]>; Matthew Murrian
> > > <[email protected]>; Romain Perier
> > <[email protected]>;
> > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > [email protected]; Linux ARM <linux-
> > > [email protected]>; linux-kernel <linux-
> > > [email protected]>
> > > Subject: dmaengine : xilinx_dma two issues
> > >
> > > Hello,
> > >
> > > I'm trying to get the 5.10 kernel up and running for our system, and
> > > I'm running into a couple of issues with xilinx_dma.
> > + (Xilinx mailing list)
> >
> > Thanks for bringing the issues to our notice. Replies inline.
> >
> > >
> > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > probe fix node order dependency' breaks our usage. Before this commit a
> > call to:
> > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > after that commit it returns -19. The reason for this seems to be that
> > > the only channel that is setup is channel 1 (chan->id is 1 in
> > xilinx_dma_chan_probe()).
> > > However in
> > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > dma_spec-
> > > >args[0];), which causes the:
> > > !xdev->chan[chan_id]
> > > test to fail in of_dma_xilinx_xlate()
> >
> > What is the channel number passed in
> > dmaclient DT?
>
> Any update on this issue?
>
> >
> > dmas = <& axi_dma_0 1>
> > dma-names = "axi_dma_0"
> >
> > >
> > > Our device-tree entry looks like this:
> > > axi_dma_0: dma@80002000 {
> > > status = "okay";
> > > #dma-cells = <1>;
> > > compatible = "xlnx,axi-dma-1.00.a";
> > > interrupt-parent = <&gic>;
> > > interrupts = <0 89 4>;
> > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > xlnx,addrwidth = <0x20>;
> > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > dma-channel@80002030 {
> > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > dma-channels = <0x1>;
> > > interrupts = <0 89 4>;
> > > xlnx,datawidth = <0x20>;
> > > xlnx,device-id = <0x0>;
> > > };
> > > };
> > >
> > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > >
> > > The second issue goes a little further back to commit e81274cd6b526
> > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > After this commit even just removing the module 'rmmod xilinx_dma',
> > > without ever using it, results in a kernel oops like this:
> > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > defined so it is not safe to unbind this driver while in use
> > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > Probed!!
> > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > address dead000000000108
> > > [ 42.108598] Mem abort info:
> > > [ 42.111393] ESR = 0x96000044
> > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > [ 42.119744] SET = 0, FnV = 0
> > > [ 42.122794] EA = 0, S1PTW = 0
> > > [ 42.125918] Data abort info:
> > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > [ 42.132617] CM = 0, WnR = 1
> > > [ 42.135577] [dead000000000108] address between user and kernel
> > > address ranges
> > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > uio_pdrv_genirq
> > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > [ 42.185636] sp : ffffffc01112bca0
> > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > [ 42.268488] Call trace:
> > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > [ 42.290659] driver_detach+0x64/0xd8
> > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > [ 42.298133] driver_unregister+0x30/0x60
> > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > [ 42.321959] do_el0_svc+0x70/0x90
> > > [ 42.325267] el0_svc+0x14/0x20
> > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > [ 42.332141] el0_sync+0x158/0x180
> > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > >
> > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
> > > xilinx_dma.c and never remove the module then it is working with the
> > > 5.10.1 kernel.
> >
> > Ok, we will analyze this issue and report back the findings.
>
> + Dave
>
> I was able to reproduce this crash on the unloading xilinx_dma module.
> This is introduced due to the e81274cd6b526 mainline commit added
> in the 5.6 kernel version. The crash is coming from -
>
> xilinx_dma_chan_remove+0x74/0xa0:
> __list_del at ./include/linux/list.h:112
> (inlined by) __list_del_entry at./include/linux/list.h:135
> (inlined by) list_del at ./include/linux/list.h:146
> (inlined by) xilinx_dma_chan_remove at drivers/dma/xilinx/xilinx_dma.c:2546
>
> Looking into e81274cd6b526 commit - It deletes channel device_node entry.
> Same channel device_node entry is also deleted in xilinx_dma_chan_remove
> as a result we see this crash.
>
> @@ -993,12 +1007,22 @@ static
> void __dma_async_device_channel_unregister(struct dma_device *device,
> "%s called while %d clients hold a reference\n",
> __func__, chan->client_count);
> mutex_lock(&dma_list_mutex);
> + list_del(&chan->device_node);
> + device->chancnt--;

OK, so following the above if I remove the list_del() from
xilinx_dma_chan_remove() like this:

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 22faea653ea8..ebe8ec6a040c 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -2542,8 +2542,6 @@ static void xilinx_dma_chan_remove(struct
xilinx_dma_chan *chan)
free_irq(chan->irq, chan);

tasklet_kill(&chan->tasklet);
-
- list_del(&chan->common.device_node);
}

static int axidma_clk_init(struct platform_device *pdev, struct clk **axi_clk,

Then this fixes the oops on remove. As a side note it seems that if
the oops has already happened then the Software System Reset doesn't
work in our system.

-Paul

>
> I will let Dave comment on the background of this change. In
> dmaengine driver - we are adding channel device node entry
> so deleting it in the exit path looks fine to me. Also, I checked
> other dma drivers are also deleting channel device_node entry
> in .remove so it likely a common problem.
>
> >
> > >
> > > Hopefully, this will be clear to someone how these issues can be
> > > resolved. In general we've been very happy using the xilinx dma.
> > >
> > > I'm not subscribed to the linux-kernel ML so if you need any further
> > > info or testing just keep me in the to: list.
> > >
> > > thanks,
> > > Paul

2021-01-18 04:39:38

by Radhey Shyam Pandey

[permalink] [raw]
Subject: RE: dmaengine : xilinx_dma two issues

> -----Original Message-----
> From: Paul Thomas <[email protected]>
> Sent: Saturday, January 16, 2021 1:00 AM
> To: Radhey Shyam Pandey <[email protected]>
> Cc: Dan Williams <[email protected]>; Vinod Koul
> <[email protected]>; Michal Simek <[email protected]>; Matthew Murrian
> <[email protected]>; Romain Perier
> <[email protected]>; Krzysztof Kozlowski <[email protected]>; Marc
> Ferland <[email protected]>; Sebastian von Ohr
> <[email protected]>; [email protected]; Linux ARM <linux-
> [email protected]>; linux-kernel <linux-
> [email protected]>; [email protected]; Shravya Kumbham
> <[email protected]>; git <[email protected]>
> Subject: Re: dmaengine : xilinx_dma two issues
>
> On Fri, Jan 8, 2021 at 2:13 AM Radhey Shyam Pandey <[email protected]>
> wrote:
> >
> > > -----Original Message-----
> > > From: Radhey Shyam Pandey
> > > Sent: Monday, January 4, 2021 10:50 AM
> > > To: Paul Thomas <[email protected]>; Dan Williams
> > > <[email protected]>; Vinod Koul <[email protected]>; Michal
> Simek
> > > <[email protected]>; Matthew Murrian
> <[email protected]>;
> > > Romain Perier <[email protected]>; Krzysztof Kozlowski
> > > <[email protected]>; Marc Ferland <[email protected]>; Sebastian von
> Ohr
> > > <[email protected]>; [email protected]; Linux ARM <linux-
> > > [email protected]>; linux-kernel <linux-
> > > [email protected]>; Shravya Kumbham <[email protected]>; git
> > > <[email protected]>
> > > Subject: RE: dmaengine : xilinx_dma two issues
> > >
> > > > -----Original Message-----
> > > > From: Paul Thomas <[email protected]>
> > > > Sent: Monday, December 28, 2020 10:14 AM
> > > > To: Dan Williams <[email protected]>; Vinod Koul
> > > > <[email protected]>; Michal Simek <[email protected]>; Radhey
> Shyam
> > > > Pandey <[email protected]>; Matthew Murrian
> > > > <[email protected]>; Romain Perier
> > > <[email protected]>;
> > > > Krzysztof Kozlowski <[email protected]>; Marc Ferland
> > > > <[email protected]>; Sebastian von Ohr <[email protected]>;
> > > > [email protected]; Linux ARM <linux-
> > > > [email protected]>; linux-kernel <linux-
> > > > [email protected]>
> > > > Subject: dmaengine : xilinx_dma two issues
> > > >
> > > > Hello,
> > > >
> > > > I'm trying to get the 5.10 kernel up and running for our system, and
> > > > I'm running into a couple of issues with xilinx_dma.
> > > + (Xilinx mailing list)
> > >
> > > Thanks for bringing the issues to our notice. Replies inline.
> > >
> > > >
> > > > First, commit 14ccf0aab46e 'dmaengine: xilinx_dma: In dma channel
> > > > probe fix node order dependency' breaks our usage. Before this
> commit a
> > > call to:
> > > > dma_request_chan(&indio_dev->dev, "axi_dma_0"); returns fine, but
> > > > after that commit it returns -19. The reason for this seems to be that
> > > > the only channel that is setup is channel 1 (chan->id is 1 in
> > > xilinx_dma_chan_probe()).
> > > > However in
> > > > of_dma_xilinx_xlate() chan_id is gets set to 0 (int chan_id =
> > > > dma_spec-
> > > > >args[0];), which causes the:
> > > > !xdev->chan[chan_id]
> > > > test to fail in of_dma_xilinx_xlate()
> > >
> > > What is the channel number passed in
> > > dmaclient DT?
> >
> > Any update on this issue?
> >
> > >
> > > dmas = <& axi_dma_0 1>
> > > dma-names = "axi_dma_0"
> > >
> > > >
> > > > Our device-tree entry looks like this:
> > > > axi_dma_0: dma@80002000 {
> > > > status = "okay";
> > > > #dma-cells = <1>;
> > > > compatible = "xlnx,axi-dma-1.00.a";
> > > > interrupt-parent = <&gic>;
> > > > interrupts = <0 89 4>;
> > > > reg = <0x0 0x80002000 0x0 0x1000>;
> > > > xlnx,addrwidth = <0x20>;
> > > > clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>,
> > > > <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk LPD_LSBUS>;
> > > > clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk",
> > > > "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
> > > > dma-channel@80002030 {
> > > > compatible = "xlnx,axi-dma-s2mm-channel";
> > > > dma-channels = <0x1>;
> > > > interrupts = <0 89 4>;
> > > > xlnx,datawidth = <0x20>;
> > > > xlnx,device-id = <0x0>;
> > > > };
> > > > };
> > > >
> > > > This is on a 5.10.1 kernel on arm64 zynqmp hardware.
> > > >
> > > > The second issue goes a little further back to commit e81274cd6b526
> > > > 'dmaengine: add support to dynamic register/unregister of channels'.
> > > > After this commit even just removing the module 'rmmod xilinx_dma',
> > > > without ever using it, results in a kernel oops like this:
> > > > [ 37.214568] xilinx-vdma 80002000.dma: ch 0: SG disabled
> > > > [ 37.219807] xilinx-vdma 80002000.dma: WARN: Device release is not
> > > > defined so it is not safe to unbind this driver while in use
> > > > [ 37.231299] xilinx-vdma 80002000.dma: Xilinx AXI DMA Engine Driver
> > > > Probed!!
> > > > [ 42.100660] Unable to handle kernel paging request at virtual
> > > > address dead000000000108
> > > > [ 42.108598] Mem abort info:
> > > > [ 42.111393] ESR = 0x96000044
> > > > [ 42.114443] EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [ 42.119744] SET = 0, FnV = 0
> > > > [ 42.122794] EA = 0, S1PTW = 0
> > > > [ 42.125918] Data abort info:
> > > > [ 42.128789] ISV = 0, ISS = 0x00000044
> > > > [ 42.132617] CM = 0, WnR = 1
> > > > [ 42.135577] [dead000000000108] address between user and kernel
> > > > address ranges
> > > > [ 42.142705] Internal error: Oops: 96000044 [#1] SMP
> > > > [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard
> > > > uio_pdrv_genirq
> > > > [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted
> > > > 5.10.1-00026-g3a2e6dd7a05-dirty #192
> > > > [ 42.163302] Hardware name: Enclustra XU5 SOM (DT)
> > > > [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> > > > [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
> > > > [ 42.185636] sp : ffffffc01112bca0
> > > > [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640
> > > > [ 42.194238] x27: 0000000000000000 x26: 0000000000000000
> > > > [ 42.199542] x25: 0000000000000000 x24: 0000000000000000
> > > > [ 42.204845] x23: 0000000000000000 x22: 0000000000000000
> > > > [ 42.210149] x21: ffffffc0088a2028 x20: ffffff8040c08410
> > > > [ 42.215452] x19: ffffff80423fa480 x18: ffffffffffffffff
> > > > [ 42.220756] x17: 0000000000000000 x16: 0000000000000000
> > > > [ 42.226059] x15: ffffffc010ce88c8 x14: 0000000000000040
> > > > [ 42.231363] x13: ffffff0000000000 x12: ffffffffffffffff
> > > > [ 42.236667] x11: 0000000000000028 x10: ffffffff7fffffff
> > > > [ 42.241970] x9 : ffffffff00f0dfe0 x8 : 0000000000000000
> > > > [ 42.247273] x7 : ffffffc010da4000 x6 : 0000000000000000
> > > > [ 42.252577] x5 : 0000000000210d00 x4 : ffffffc010da4da0
> > > > [ 42.257881] x3 : ffffff80423fa578 x2 : 0000000000000000
> > > > [ 42.263184] x1 : dead000000000100 x0 : dead000000000122
> > > > [ 42.268488] Call trace:
> > > > [ 42.270923] xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
> > > > [ 42.276399] xilinx_dma_remove+0x3c/0x70 [xilinx_dma]
> > > > [ 42.281446] platform_drv_remove+0x24/0x38
> > > > [ 42.285530] device_release_driver_internal+0xec/0x1a8
> > > > [ 42.290659] driver_detach+0x64/0xd8
> > > > [ 42.294226] bus_remove_driver+0x58/0xb8
> > > > [ 42.298133] driver_unregister+0x30/0x60
> > > > [ 42.302048] platform_driver_unregister+0x14/0x20
> > > > [ 42.306744] xilinx_vdma_driver_exit+0x18/0x978 [xilinx_dma]
> > > > [ 42.312396] __arm64_sys_delete_module+0x1e4/0x270
> > > > [ 42.317178] el0_svc_common.constprop.4+0x68/0x170
> > > > [ 42.321959] do_el0_svc+0x70/0x90
> > > > [ 42.325267] el0_svc+0x14/0x20
> > > > [ 42.328313] el0_sync_handler+0x90/0xb8
> > > > [ 42.332141] el0_sync+0x158/0x180
> > > > [ 42.335442] Code: 95dfce29 9103c260 95de7ffb a9490261 (f9000420)
> > > > [ 42.341525] ---[ end trace dbd90aeb5ca71943 ]---
> > > >
> > > > So if I use the 04c2bc2bede1 (commit before 14ccf0aab46e) version of
> > > > xilinx_dma.c and never remove the module then it is working with the
> > > > 5.10.1 kernel.
> > >
> > > Ok, we will analyze this issue and report back the findings.
> >
> > + Dave
> >
> > I was able to reproduce this crash on the unloading xilinx_dma module.
> > This is introduced due to the e81274cd6b526 mainline commit added
> > in the 5.6 kernel version. The crash is coming from -
> >
> > xilinx_dma_chan_remove+0x74/0xa0:
> > __list_del at ./include/linux/list.h:112
> > (inlined by) __list_del_entry at./include/linux/list.h:135
> > (inlined by) list_del at ./include/linux/list.h:146
> > (inlined by) xilinx_dma_chan_remove at
> drivers/dma/xilinx/xilinx_dma.c:2546
> >
> > Looking into e81274cd6b526 commit - It deletes channel device_node
> entry.
> > Same channel device_node entry is also deleted in
> xilinx_dma_chan_remove
> > as a result we see this crash.
> >
> > @@ -993,12 +1007,22 @@ static
> > void __dma_async_device_channel_unregister(struct dma_device *device,
> > "%s called while %d clients hold a reference\n",
> > __func__, chan->client_count);
> > mutex_lock(&dma_list_mutex);
> > + list_del(&chan->device_node);
> > + device->chancnt--;
>
> OK, so following the above if I remove the list_del() from
> xilinx_dma_chan_remove() like this:
>
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index 22faea653ea8..ebe8ec6a040c 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -2542,8 +2542,6 @@ static void xilinx_dma_chan_remove(struct
> xilinx_dma_chan *chan)
> free_irq(chan->irq, chan);
>
> tasklet_kill(&chan->tasklet);
> -
> - list_del(&chan->common.device_node);
> }
>
> static int axidma_clk_init(struct platform_device *pdev, struct clk **axi_clk,
>
> Then this fixes the oops on remove. As a side note it seems that if

Thanks for the update. Yes as expected the issue is because device_node
is deleted twice. I will start a separate discussion thread with Dave and
get the background on why in commit # e81274cd6b526 was added to
also delete device_node in __dma_async_device_channel_unregister.

> the oops has already happened then the Software System Reset doesn't
> work in our system.
>
> -Paul
>
> >
> > I will let Dave comment on the background of this change. In
> > dmaengine driver - we are adding channel device node entry
> > so deleting it in the exit path looks fine to me. Also, I checked
> > other dma drivers are also deleting channel device_node entry
> > in .remove so it likely a common problem.
> >
> > >
> > > >
> > > > Hopefully, this will be clear to someone how these issues can be
> > > > resolved. In general we've been very happy using the xilinx dma.
> > > >
> > > > I'm not subscribed to the linux-kernel ML so if you need any further
> > > > info or testing just keep me in the to: list.
> > > >
> > > > thanks,
> > > > Paul