2018-09-19 12:36:50

by Laurentiu Tudor

Subject: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

From: Laurentiu Tudor <[email protected]>

This patch series adds SMMU support for NXP LS1043A and LS1046A chips
and consists mostly of important driver fixes and the required device
tree updates. It touches several subsystems and has three main
parts:
- changes in the drivers/soc/fsl/qbman drivers adding iommu mapping of
reserved memory areas, fixes and deferred probe support
- changes in the drivers/net/ethernet/freescale/dpaa (dpaa_eth) driver
consisting of misc dma mapping related fixes and probe ordering
- addition of the actual arm smmu device tree node together with
various adjustments to the device trees

Performance impact

Running iperf benchmarks in a back-to-back setup (both sides
having smmu enabled) on a 10Gbps port shows a significant
networking performance degradation of around 40% (9.48Gbps
line rate vs 5.45Gbps). If you need the performance and can do
without SMMU support, you can use "iommu.passthrough=1" to disable
SMMU.
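
As a quick check of the numbers quoted above, the relative drop can be
computed directly from the two iperf results. This is only a sketch; the
helper name is made up for illustration and is not part of the series:

```c
/* Hypothetical helper: relative throughput loss in percent. */
double throughput_drop_percent(double baseline_gbps, double measured_gbps)
{
	return (baseline_gbps - measured_gbps) / baseline_gbps * 100.0;
}
```

With 9.48 and 5.45 as inputs this gives roughly 42.5, in line with the
degradation of about 40 percent reported here.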

USB issue and workaround

There's a problem with the usb controllers in these chips
generating smaller, 40-bit wide dma addresses instead of the 48-bit
supported at the smmu input. So you end up in a situation where the
smmu is mapped with 48-bit address translations, but the device
generates transactions with clipped 40-bit addresses, thus smmu
context faults are triggered. I encountered a similar situation for
mmc that I managed to fix in software [1] however for USB I did not
find a proper place in the code to add a similar fix. The only
workaround I found was to add this kernel parameter which limits the
usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
This workaround is far from ideal, so any suggestions for a code
based workaround in this area would be greatly appreciated.
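
To illustrate the failure mode described above, here is a small
user-space sketch of what happens when a 48-bit iova is truncated to
40 bits on its way to the smmu. The function names and the sample
address are made up for illustration; this is not driver code.

```c
#include <stdint.h>

/* Keep only the low 'bits' bits of a DMA address, as a device with a
 * narrower address bus effectively does. Valid for 1 <= bits <= 63.
 */
uint64_t clip_dma_addr(uint64_t addr, unsigned int bits)
{
	return addr & ((UINT64_C(1) << bits) - 1);
}

/* Nonzero if the address the device emits differs from the one mapped. */
int dma_addr_is_clipped(uint64_t iova, unsigned int dev_bits)
{
	return clip_dma_addr(iova, dev_bits) != iova;
}
```

An iova such as 0x0000ffff12345000 mapped in the smmu would show up on
the bus as 0x000000ff12345000, for which no translation exists, hence
the context fault. Addresses below 4 GiB pass through unchanged, which
is why limiting the usb dma to 32 bits sidesteps the problem.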

The patch set is based on net-next so, if generally agreed, I'd suggest
taking the patches through the netdev tree after getting all the Acks.

[1] https://patchwork.kernel.org/patch/10506627/

Laurentiu Tudor (21):
soc/fsl/qman: fixup liodns only on ppc targets
soc/fsl/bman: map FBPR area in the iommu
soc/fsl/qman: map FQD and PFDR areas in the iommu
soc/fsl/qman-portal: map CENA area in the iommu
soc/fsl/qbman: add APIs to retrieve the probing status
soc/fsl/qman_portals: defer probe after qman's probe
soc/fsl/bman_portals: defer probe after bman's probe
soc/fsl/qbman_portals: add APIs to retrieve the probing status
fsl/fman: backup and restore ICID registers
fsl/fman: add API to get the device behind a fman port
dpaa_eth: defer probing after qbman
dpaa_eth: base dma mappings on the fman rx port
dpaa_eth: fix iova handling for contiguous frames
dpaa_eth: fix iova handling for sg frames
dpaa_eth: fix SG frame cleanup
arm64: dts: ls1046a: add smmu node
arm64: dts: ls1043a: add smmu node
arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
arm64: dts: ls104x: add missing dma ranges property
arm64: dts: ls104x: add iommu-map to pci controllers
arm64: dts: ls104x: make dma-coherent global to the SoC

.../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 52 ++++++-
.../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 48 +++++++
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 136 ++++++++++++------
drivers/net/ethernet/freescale/fman/fman.c | 35 ++++-
drivers/net/ethernet/freescale/fman/fman.h | 4 +
.../net/ethernet/freescale/fman/fman_port.c | 14 ++
.../net/ethernet/freescale/fman/fman_port.h | 2 +
drivers/soc/fsl/qbman/bman_ccsr.c | 23 +++
drivers/soc/fsl/qbman/bman_portal.c | 20 ++-
drivers/soc/fsl/qbman/qman_ccsr.c | 30 ++++
drivers/soc/fsl/qbman/qman_portal.c | 35 +++++
include/soc/fsl/bman.h | 16 +++
include/soc/fsl/qman.h | 17 +++
13 files changed, 379 insertions(+), 53 deletions(-)

--
2.17.1



2018-09-19 12:37:04

by Laurentiu Tudor

Subject: [PATCH 06/21] soc/fsl/qman_portals: defer probe after qman's probe

From: Laurentiu Tudor <[email protected]>

Defer probe of qman portals after qman probing. This fixes the crash
below, seen on NXP LS1043A SoCs:

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000004
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[0000000000000004] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.18.0-rc1-next-20180622-00200-g986f5c179185 #9
Hardware name: LS1043A RDB Board (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : qman_set_sdest+0x74/0xa0
lr : qman_portal_probe+0x22c/0x470
sp : ffff00000803bbc0
x29: ffff00000803bbc0 x28: 0000000000000000
x27: ffff0000090c1b88 x26: ffff00000927cb68
x25: ffff00000927c000 x24: ffff00000927cb60
x23: 0000000000000000 x22: 0000000000000000
x21: ffff0000090e9000 x20: ffff800073b5c810
x19: ffff800027401298 x18: ffffffffffffffff
x17: 0000000000000001 x16: 0000000000000000
x15: ffff0000090e96c8 x14: ffff80002740138a
x13: ffff0000090f2000 x12: 0000000000000030
x11: ffff000008f25000 x10: 0000000000000000
x9 : ffff80007bdfd2c0 x8 : 0000000000004000
x7 : ffff80007393cc18 x6 : 0040000000000001
x5 : 0000000000000000 x4 : ffffffffffffffff
x3 : 0000000000000004 x2 : ffff00000927c900
x1 : 0000000000000000 x0 : 0000000000000004
Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
Call trace:
qman_set_sdest+0x74/0xa0
platform_drv_probe+0x50/0xa8
driver_probe_device+0x214/0x2f8
__driver_attach+0xd8/0xe0
bus_for_each_dev+0x68/0xc8
driver_attach+0x20/0x28
bus_add_driver+0x108/0x228
driver_register+0x60/0x110
__platform_driver_register+0x40/0x48
qman_portal_driver_init+0x20/0x84
do_one_initcall+0x58/0x168
kernel_init_freeable+0x184/0x22c
kernel_init+0x10/0x108
ret_from_fork+0x10/0x18
Code: f9400443 11001000 927e4800 8b000063 (b9400063)
---[ end trace 4f6d50489ecfb930 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/qman_portal.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 012bb95e87e1..7fd13f8c8da2 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -229,6 +229,14 @@ static int qman_portal_probe(struct platform_device *pdev)
int irq, cpu, err;
u32 val;

+ err = qman_is_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev, "failing probe due to qman probe error\n");
+ return -ENODEV;
+ }
+
pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
if (!pcfg)
return -ENOMEM;
--
2.17.1


2018-09-19 12:37:11

by Laurentiu Tudor

Subject: [PATCH 08/21] soc/fsl/qbman_portals: add APIs to retrieve the probing status

From: Laurentiu Tudor <[email protected]>

Add a couple of new APIs to check the probing status of the required
cpu bound qman and bman portals:
'int bman_portals_probed()' and 'int qman_portals_probed()'.
They return the following values.
* 1 if qman/bman portals were all probed correctly
* 0 if qman/bman portals were not yet probed
* -1 if probing of qman/bman portals failed
Drivers that use qman/bman portal driver services are required to use
these APIs before calling any functions exported by these drivers or
otherwise they will crash the kernel.
First user will be the dpaa1 ethernet driver, coming in a subsequent
patch.
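
A minimal sketch of how a consumer might turn the tri-state return
value into a probe result, modelled on the checks this series adds
elsewhere. The helper name is hypothetical, and the errno values are
defined locally so the snippet builds outside the kernel (EPROBE_DEFER
is 517 in the kernel's include/linux/errno.h).

```c
/* Local stand-ins so this builds in user space. */
#define SKETCH_ENODEV		19
#define SKETCH_EPROBE_DEFER	517

/* Hypothetical helper: map a *_portals_probed() style status to the
 * value a dependent driver's probe function would return.
 */
int portals_status_to_probe_ret(int status)
{
	if (status == 0)
		return -SKETCH_EPROBE_DEFER;	/* not probed yet, retry later */
	if (status < 0)
		return -SKETCH_ENODEV;		/* portal probing failed */
	return 0;				/* all portals ready */
}
```

The zero case is checked first so that "not yet probed" leads to a
retry rather than a hard failure.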

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/bman_portal.c | 10 ++++++++++
drivers/soc/fsl/qbman/qman_portal.c | 10 ++++++++++
include/soc/fsl/bman.h | 8 ++++++++
include/soc/fsl/qman.h | 9 +++++++++
4 files changed, 37 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index f9edd28894fd..8048d35de8a2 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -32,6 +32,7 @@

static struct bman_portal *affine_bportals[NR_CPUS];
static struct cpumask portal_cpus;
+static int __bman_portals_probed;
/* protect bman global registers and global data shared among portals */
static DEFINE_SPINLOCK(bman_lock);

@@ -85,6 +86,12 @@ static int bman_online_cpu(unsigned int cpu)
return 0;
}

+int bman_portals_probed(void)
+{
+ return __bman_portals_probed;
+}
+EXPORT_SYMBOL_GPL(bman_portals_probed);
+
static int bman_portal_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -148,6 +155,7 @@ static int bman_portal_probe(struct platform_device *pdev)
spin_lock(&bman_lock);
cpu = cpumask_next_zero(-1, &portal_cpus);
if (cpu >= nr_cpu_ids) {
+ __bman_portals_probed = 1;
/* unassigned portal, skip init */
spin_unlock(&bman_lock);
return 0;
@@ -173,6 +181,8 @@ static int bman_portal_probe(struct platform_device *pdev)
err_ioremap2:
memunmap(pcfg->addr_virt_ce);
err_ioremap1:
+ __bman_portals_probed = -1;
+
return -ENXIO;
}

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 7fd13f8c8da2..1a987aa2ec8c 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -39,6 +39,7 @@ EXPORT_SYMBOL(qman_dma_portal);
#define CONFIG_FSL_DPA_PIRQ_FAST 1

static struct cpumask portal_cpus;
+static int __qman_portals_probed;
/* protect qman global registers and global data shared among portals */
static DEFINE_SPINLOCK(qman_lock);

@@ -219,6 +220,12 @@ static int qman_online_cpu(unsigned int cpu)
return 0;
}

+int qman_portals_probed(void)
+{
+ return __qman_portals_probed;
+}
+EXPORT_SYMBOL_GPL(qman_portals_probed);
+
static int qman_portal_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -306,6 +313,7 @@ static int qman_portal_probe(struct platform_device *pdev)
spin_lock(&qman_lock);
cpu = cpumask_next_zero(-1, &portal_cpus);
if (cpu >= nr_cpu_ids) {
+ __qman_portals_probed = 1;
/* unassigned portal, skip init */
spin_unlock(&qman_lock);
return 0;
@@ -336,6 +344,8 @@ static int qman_portal_probe(struct platform_device *pdev)
err_ioremap2:
memunmap(pcfg->addr_virt_ce);
err_ioremap1:
+ __qman_portals_probed = -1;
+
return -ENXIO;
}

diff --git a/include/soc/fsl/bman.h b/include/soc/fsl/bman.h
index 5b99cb2ea5ef..173e4049d963 100644
--- a/include/soc/fsl/bman.h
+++ b/include/soc/fsl/bman.h
@@ -133,5 +133,13 @@ int bman_acquire(struct bman_pool *pool, struct bm_buffer *bufs, u8 num);
* failed to probe or 0 if the bman driver did not probed yet.
*/
int bman_is_probed(void);
+/**
+ * bman_portals_probed - Check if all cpu bound bman portals are probed
+ *
+ * Returns 1 if all the required cpu bound bman portals successfully probed,
+ * -1 if probe errors appeared or 0 if the bman portals have not yet finished
+ * probing.
+ */
+int bman_portals_probed(void);

#endif /* __FSL_BMAN_H */
diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index 597783b8a3a0..7732e48081eb 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -1194,4 +1194,13 @@ int qman_release_cgrid(u32 id);
*/
int qman_is_probed(void);

+/**
+ * qman_portals_probed - Check if all cpu bound qman portals are probed
+ *
+ * Returns 1 if all the required cpu bound qman portals successfully probed,
+ * -1 if probe errors appeared or 0 if the qman portals have not yet finished
+ * probing.
+ */
+int qman_portals_probed(void);
+
#endif /* __FSL_QMAN_H */
--
2.17.1


2018-09-19 12:37:18

by Laurentiu Tudor

Subject: [PATCH 12/21] dpaa_eth: base dma mappings on the fman rx port

From: Laurentiu Tudor <[email protected]>

The dma transactions initiator is the rx fman port, so that's the device
against which the dma mappings should be done. Previously the mappings
were done through the MAC device, which makes no sense because it's
neither dma-able nor connected in any way to the smmu.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6ca3fdbef580..ac9e50c8a556 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2796,8 +2796,15 @@ static int dpaa_eth_probe(struct platform_device *pdev)
return -ENODEV;
}

+ mac_dev = dpaa_mac_dev_get(pdev);
+ if (IS_ERR(mac_dev)) {
+ dev_err(&pdev->dev, "dpaa_mac_dev_get() failed\n");
+ err = PTR_ERR(mac_dev);
+ goto probe_err;
+ }
+
/* device used for DMA mapping */
- dev = pdev->dev.parent;
+ dev = fman_port_get_device(mac_dev->port[RX]);
err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
if (err) {
dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
@@ -2822,13 +2829,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)

priv->msg_enable = netif_msg_init(debug, DPAA_MSG_DEFAULT);

- mac_dev = dpaa_mac_dev_get(pdev);
- if (IS_ERR(mac_dev)) {
- dev_err(dev, "dpaa_mac_dev_get() failed\n");
- err = PTR_ERR(mac_dev);
- goto free_netdev;
- }
-
/* If fsl_fm_max_frm is set to a higher value than the all-common 1500,
* we choose conservatively and let the user explicitly set a higher
* MTU via ifconfig. Otherwise, the user may end up with different MTUs
@@ -2964,9 +2964,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
qman_release_cgrid(priv->cgr_data.cgr.cgrid);
free_dpaa_bps:
dpaa_bps_free(priv);
-free_netdev:
dev_set_drvdata(dev, NULL);
free_netdev(net_dev);
+probe_err:

return err;
}
--
2.17.1


2018-09-19 12:37:19

by Laurentiu Tudor

Subject: [PATCH 15/21] dpaa_eth: fix SG frame cleanup

From: Laurentiu Tudor <[email protected]>

Fix issue with the entry indexing in the sg frame cleanup code being
off-by-1. This problem showed up when doing some basic iperf tests and
manifested in traffic coming to a halt.
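
To make the indexing explicit: as the comments in the cleanup code
indicate, sgt[0] describes the linear part of the skb and the page
fragments follow it, so with nr_frags fragments the fragment entries
are sgt[1] through sgt[nr_frags]. The stand-alone sketch below uses a
made-up helper, not the driver code, to count how many fragment entries
each loop bound visits.

```c
/* Count fragment entries visited by the cleanup loop.
 * 'inclusive' selects the fixed bound (i <= nr_frags) over the old
 * one (i < nr_frags).
 */
int frag_entries_visited(int nr_frags, int inclusive)
{
	int i, visited = 0;

	for (i = 1; inclusive ? i <= nr_frags : i < nr_frags; i++)
		visited++;

	return visited;
}
```

For nr_frags = 3 the old bound visits only two entries and leaves the
last fragment mapped, while the fixed bound visits all three.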

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 8db861f281a0..605f06f0def8 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1663,7 +1663,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
qm_sg_entry_get_len(&sgt[0]), dma_dir);

/* remaining pages were mapped with skb_frag_dma_map() */
- for (i = 1; i < nr_frags; i++) {
+ for (i = 1; i <= nr_frags; i++) {
WARN_ON(qm_sg_entry_is_ext(&sgt[i]));

dma_unmap_page(dev, qm_sg_addr(&sgt[i]),
--
2.17.1


2018-09-19 12:37:30

by Laurentiu Tudor

Subject: [PATCH 21/21] arm64: dts: ls104x: make dma-coherent global to the SoC

From: Laurentiu Tudor <[email protected]>

These SoCs are really completely dma coherent in their entirety so add
the dma-coherent property at the soc level in the device tree and drop
the instances where it's specifically added to a few select devices.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 5 +----
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 3b7b2e60bd9a..d02106cb2116 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -215,6 +215,7 @@
#size-cells = <2>;
ranges;
dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
+ dma-coherent;

clockgen: clocking@1ee1000 {
compatible = "fsl,ls1043a-clockgen";
@@ -680,7 +681,6 @@
reg-names = "ahci", "sata-ecc";
interrupts = <0 69 0x4>;
clocks = <&clockgen 4 0>;
- dma-coherent;
};

msi1: msi-controller1@1571000 {
@@ -715,7 +715,6 @@
#address-cells = <3>;
#size-cells = <2>;
device_type = "pci";
- dma-coherent;
iommu-map = <0 &mmu 0 1>;
num-lanes = <4>;
bus-range = <0x0 0xff>;
@@ -741,7 +740,6 @@
#address-cells = <3>;
#size-cells = <2>;
device_type = "pci";
- dma-coherent;
iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
@@ -767,7 +765,6 @@
#address-cells = <3>;
#size-cells = <2>;
device_type = "pci";
- dma-coherent;
iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 890d1565791f..3bdea0470f69 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -188,6 +188,7 @@
#size-cells = <2>;
ranges;
dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
+ dma-coherent;

ddr: memory-controller@1080000 {
compatible = "fsl,qoriq-memory-controller";
--
2.17.1


2018-09-19 12:37:38

by Laurentiu Tudor

Subject: [PATCH 19/21] arm64: dts: ls104x: add missing dma ranges property

From: Laurentiu Tudor <[email protected]>

These chips have a 48-bit address size so make sure that the dma-ranges
reflects this. Otherwise the linux kernel's dma sub-system will set
the default dma masks to full 64-bit, badly breaking dmas.
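
For reference, the dma-ranges value used in this patch can be decoded
by hand. Assuming two cells each for the child address, the parent
address and the size, the six cells split into three pairs. The sketch
below, with an illustrative helper name, combines one such pair into a
64-bit value.

```c
#include <stdint.h>

/* Combine two 32-bit device tree cells (high, low) into one value. */
uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

The size pair <0x10000 0x00000000> decodes to 0x1000000000000, i.e.
2^48, matching the 48-bit address size, while both address pairs decode
to 0.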

Signed-off-by: Laurentiu Tudor <[email protected]>
---
arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 90296b9fb171..48091409c472 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -214,6 +214,7 @@
#address-cells = <2>;
#size-cells = <2>;
ranges;
+ dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;

clockgen: clocking@1ee1000 {
compatible = "fsl,ls1043a-clockgen";
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 15094dd8400e..40484f6f6d42 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -187,6 +187,7 @@
#address-cells = <2>;
#size-cells = <2>;
ranges;
+ dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;

ddr: memory-controller@1080000 {
compatible = "fsl,qoriq-memory-controller";
--
2.17.1


2018-09-19 12:37:38

by Laurentiu Tudor

Subject: [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets

From: Laurentiu Tudor <[email protected]>

ARM SoCs use SMMU, so the liodn fixup done in the qman driver no
longer makes sense and it also breaks the ICID settings inherited
from u-boot. Do the fixups only for PPC targets.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/qman_ccsr.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 79cba58387a5..619e22030460 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -597,6 +597,7 @@ static int qman_init_ccsr(struct device *dev)
#define LIO_CFG_LIODN_MASK 0x0fff0000
void qman_liodn_fixup(u16 channel)
{
+#ifdef CONFIG_PPC
static int done;
static u32 liodn_offset;
u32 before, after;
@@ -616,6 +617,7 @@ void qman_liodn_fixup(u16 channel)
qm_ccsr_out(REG_REV3_QCSP_LIO_CFG(idx), after);
else
qm_ccsr_out(REG_QCSP_LIO_CFG(idx), after);
+#endif
}

#define IO_CFG_SDEST_MASK 0x00ff0000
--
2.17.1


2018-09-19 12:37:48

by Laurentiu Tudor

Subject: [PATCH 20/21] arm64: dts: ls104x: add iommu-map to pci controllers

From: Laurentiu Tudor <[email protected]>

The pci controllers are also behind the smmu so add the iommu-map
property to reflect this. The bootloader needs to patch the stream id
ranges to some sane values.
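
Per the generic pci-iommu binding, each iommu-map entry is (rid-base,
iommu phandle, iommu-base, length), and a requester ID inside
[rid-base, rid-base + length) maps to rid - rid-base + iommu-base. The
user-space sketch below, with hypothetical names, applies that rule to
a single entry such as <0 &mmu 0 1>.

```c
#include <stdint.h>

struct iommu_map_entry {
	uint32_t rid_base;	/* first requester ID covered */
	uint32_t iommu_base;	/* stream ID that rid_base maps to */
	uint32_t length;	/* number of consecutive IDs covered */
};

/* Returns 0 and stores the stream ID on a match, -1 otherwise. */
int iommu_map_lookup(const struct iommu_map_entry *e, uint32_t rid,
		     uint32_t *sid)
{
	if (rid < e->rid_base || rid - e->rid_base >= e->length)
		return -1;

	*sid = rid - e->rid_base + e->iommu_base;
	return 0;
}
```

With the placeholder entry from this patch only requester ID 0 is
covered, and it maps to stream ID 0, which is why the bootloader has to
fill in a usable range.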

Signed-off-by: Laurentiu Tudor <[email protected]>
---
arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 3 +++
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 3 +++
2 files changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 48091409c472..3b7b2e60bd9a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -716,6 +716,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <4>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x40 0x00010000 0x0 0x00010000 /* downstream I/O */
@@ -741,6 +742,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x48 0x00010000 0x0 0x00010000 /* downstream I/O */
@@ -766,6 +768,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x50 0x00010000 0x0 0x00010000 /* downstream I/O */
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 40484f6f6d42..890d1565791f 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -685,6 +685,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <4>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x40 0x00010000 0x0 0x00010000 /* downstream I/O */
@@ -710,6 +711,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x48 0x00010000 0x0 0x00010000 /* downstream I/O */
@@ -735,6 +737,7 @@
#size-cells = <2>;
device_type = "pci";
dma-coherent;
+ iommu-map = <0 &mmu 0 1>;
num-lanes = <2>;
bus-range = <0x0 0xff>;
ranges = <0x81000000 0x0 0x00000000 0x50 0x00010000 0x0 0x00010000 /* downstream I/O */
--
2.17.1


2018-09-19 12:37:56

by Laurentiu Tudor

Subject: [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID

From: Laurentiu Tudor <[email protected]>

The StreamID entering the SMMU is actually a concatenation of the
SMMU TBU ID and the ICID configured in software.
Since the TBU ID is internal to the SoC and since we want the
actual ICID configured in software to enter the SMMU without any
additional set bits, mask out the TBU ID bits and leave only the
relevant ICID bits to enter the SMMU.
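
A small sketch of the effect of the mask. It assumes, from the 0x7f00
value used in this patch, that the TBU ID sits in bits 8-14 and the
ICID in the low 8 bits, and that bits set in stream-match-mask are
ignored when a StreamID is compared against a stream match entry.
The names are illustrative only.

```c
#include <stdint.h>

#define SKETCH_STREAM_MATCH_MASK	0x7f00u

/* Nonzero if 'incoming' matches 'icid' once the masked bits are ignored.
 * Only the low 15 bits take part in the comparison.
 */
int stream_id_matches(uint16_t incoming, uint16_t icid, uint16_t mask)
{
	return ((incoming ^ icid) & ~mask & 0x7fff) == 0;
}
```

With the mask applied, an incoming StreamID of 0x0a23 matches an entry
for ICID 0x23 regardless of the upper bits; without it the two would
not match.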

Signed-off-by: Laurentiu Tudor <[email protected]>
---
arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 8b3eba167508..90296b9fb171 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -226,6 +226,7 @@
compatible = "arm,mmu-500";
reg = <0 0x9000000 0 0x400000>;
dma-coherent;
+ stream-match-mask = <0x7f00>;
#global-interrupts = <2>;
#iommu-cells = <1>;
interrupts = <0 142 4>, /* global secure fault */
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 06863d3e4a7d..15094dd8400e 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -232,6 +232,7 @@
compatible = "arm,mmu-500";
reg = <0 0x9000000 0 0x400000>;
dma-coherent;
+ stream-match-mask = <0x7f00>;
#global-interrupts = <2>;
#iommu-cells = <1>;
interrupts = <0 142 4>, /* global secure fault */
--
2.17.1


2018-09-19 12:38:03

by Laurentiu Tudor

Subject: [PATCH 17/21] arm64: dts: ls1043a: add smmu node

From: Laurentiu Tudor <[email protected]>

This allows for the SMMU device to be probed by the SMMU kernel driver.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
.../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 42 +++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 7881e3d81a9a..8b3eba167508 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -222,6 +222,48 @@
clocks = <&sysclk>;
};

+ mmu: iommu@9000000 {
+ compatible = "arm,mmu-500";
+ reg = <0 0x9000000 0 0x400000>;
+ dma-coherent;
+ #global-interrupts = <2>;
+ #iommu-cells = <1>;
+ interrupts = <0 142 4>, /* global secure fault */
+ <0 143 4>, /* combined secure interrupt */
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>;
+ };
+
scfg: scfg@1570000 {
compatible = "fsl,ls1043a-scfg", "syscon";
reg = <0x0 0x1570000 0x0 0x10000>;
--
2.17.1


2018-09-19 12:38:50

by Laurentiu Tudor

Subject: [PATCH 14/21] dpaa_eth: fix iova handling for sg frames

From: Laurentiu Tudor <[email protected]>

The driver relies on the no longer valid assumption that dma addresses
(iovas) are identical to physical addresses and uses phys_to_virt() to
make iova -> vaddr conversions. Fix this also for scatter-gather frames
using the iova -> phys conversion function added in the previous patch.
While at it, clean up a redundant dpaa_bpid2pool() and pass the bp
as parameter.
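
The distinction can be illustrated with a toy user-space model of a
single IOMMU mapping. Everything here is made up for illustration (the
struct, the helper and the addresses) and it is not the
dpaa_iova_to_phys() implementation from the previous patch; it only
shows why treating an iova as a physical address goes wrong once
translation is active.

```c
#include <stdint.h>

/* One contiguous iova -> phys mapping. */
struct toy_mapping {
	uint64_t iova;
	uint64_t phys;
	uint64_t size;
};

/* Translate an iova; returns 0 when the address is not mapped. */
uint64_t toy_iova_to_phys(const struct toy_mapping *m, uint64_t iova)
{
	if (iova < m->iova || iova - m->iova >= m->size)
		return 0;

	return m->phys + (iova - m->iova);
}
```

If a buffer at physical 0x80001000 is mapped at iova 0xffff0000, then
iova 0xffff0010 translates to 0x80001010, whereas feeding the iova
straight to phys_to_virt() would point at unrelated memory.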

Signed-off-by: Laurentiu Tudor <[email protected]>
---
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 41 +++++++++++--------
1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index e9e081c3f8cc..8db861f281a0 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1646,14 +1646,17 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,

if (unlikely(qm_fd_get_format(fd) == qm_fd_sg)) {
nr_frags = skb_shinfo(skb)->nr_frags;
- dma_unmap_single(dev, addr,
- qm_fd_get_offset(fd) + DPAA_SGT_SIZE,
- dma_dir);

/* The sgt buffer has been allocated with netdev_alloc_frag(),
* it's from lowmem.
*/
- sgt = phys_to_virt(addr + qm_fd_get_offset(fd));
+ sgt = phys_to_virt(dpaa_iova_to_phys(dev,
+ addr +
+ qm_fd_get_offset(fd)));
+
+ dma_unmap_single(dev, addr,
+ qm_fd_get_offset(fd) + DPAA_SGT_SIZE,
+ dma_dir);

/* sgt[0] is from lowmem, was dma_map_single()-ed */
dma_unmap_single(dev, qm_sg_addr(&sgt[0]),
@@ -1668,7 +1671,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
}

/* Free the page frag that we allocated on Tx */
- skb_free_frag(phys_to_virt(addr));
+ skb_free_frag(skbh);
} else {
dma_unmap_single(dev, addr,
skb_tail_pointer(skb) - (u8 *)skbh, dma_dir);
@@ -1729,14 +1732,14 @@ static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
* The page fragment holding the S/G Table is recycled here.
*/
static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
- const struct qm_fd *fd)
+ const struct qm_fd *fd,
+ struct dpaa_bp *dpaa_bp,
+ void *vaddr)
{
ssize_t fd_off = qm_fd_get_offset(fd);
- dma_addr_t addr = qm_fd_addr(fd);
const struct qm_sg_entry *sgt;
struct page *page, *head_page;
- struct dpaa_bp *dpaa_bp;
- void *vaddr, *sg_vaddr;
+ void *sg_vaddr;
int frag_off, frag_len;
struct sk_buff *skb;
dma_addr_t sg_addr;
@@ -1745,7 +1748,6 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
int *count_ptr;
int i;

- vaddr = phys_to_virt(addr);
WARN_ON(!IS_ALIGNED((unsigned long)vaddr, SMP_CACHE_BYTES));

/* Iterate through the SGT entries and add data buffers to the skb */
@@ -1756,14 +1758,18 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
WARN_ON(qm_sg_entry_is_ext(&sgt[i]));

sg_addr = qm_sg_addr(&sgt[i]);
- sg_vaddr = phys_to_virt(sg_addr);
- WARN_ON(!IS_ALIGNED((unsigned long)sg_vaddr,
- SMP_CACHE_BYTES));

/* We may use multiple Rx pools */
dpaa_bp = dpaa_bpid2pool(sgt[i].bpid);
- if (!dpaa_bp)
+ if (!dpaa_bp) {
+ pr_info("%s: fail to get dpaa_bp for sg bpid %d\n",
+ __func__, sgt[i].bpid);
goto free_buffers;
+ }
+ sg_vaddr = phys_to_virt(dpaa_iova_to_phys(dpaa_bp->dev,
+ sg_addr));
+ WARN_ON(!IS_ALIGNED((unsigned long)sg_vaddr,
+ SMP_CACHE_BYTES));

count_ptr = this_cpu_ptr(dpaa_bp->percpu_count);
dma_unmap_single(dpaa_bp->dev, sg_addr, dpaa_bp->size,
@@ -1835,10 +1841,11 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
/* free all the SG entries */
for (i = 0; i < DPAA_SGT_MAX_ENTRIES ; i++) {
sg_addr = qm_sg_addr(&sgt[i]);
- sg_vaddr = phys_to_virt(sg_addr);
- skb_free_frag(sg_vaddr);
dpaa_bp = dpaa_bpid2pool(sgt[i].bpid);
if (dpaa_bp) {
+ sg_addr = dpaa_iova_to_phys(dpaa_bp->dev, sg_addr);
+ sg_vaddr = phys_to_virt(sg_addr);
+ skb_free_frag(sg_vaddr);
count_ptr = this_cpu_ptr(dpaa_bp->percpu_count);
(*count_ptr)--;
}
@@ -2324,7 +2331,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
if (likely(fd_format == qm_fd_contig))
skb = contig_fd_to_skb(priv, fd, dpaa_bp, vaddr);
else
- skb = sg_fd_to_skb(priv, fd);
+ skb = sg_fd_to_skb(priv, fd, dpaa_bp, vaddr);
if (!skb)
return qman_cb_dqrr_consume;

--
2.17.1


2018-09-19 12:38:58

by Laurentiu Tudor

Subject: [PATCH 11/21] dpaa_eth: defer probing after qbman

From: Laurentiu Tudor <[email protected]>

Enabling SMMU altered the order of device probing causing the dpaa1
ethernet driver to get probed before qbman and causing a boot crash.
Add predictability in the probing order by deferring the ethernet
driver probe after qbman and portals by using the recently introduced
qbman APIs.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index a5131a510e8b..6ca3fdbef580 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2765,6 +2765,37 @@ static int dpaa_eth_probe(struct platform_device *pdev)
int err = 0, i, channel;
struct device *dev;

+ err = bman_is_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev, "failing probe due to bman probe error\n");
+ return -ENODEV;
+ }
+ err = qman_is_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev, "failing probe due to qman probe error\n");
+ return -ENODEV;
+ }
+ err = bman_portals_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev,
+ "failing probe due to bman portals probe error\n");
+ return -ENODEV;
+ }
+ err = qman_portals_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev,
+ "failing probe due to qman portals probe error\n");
+ return -ENODEV;
+ }
+
/* device used for DMA mapping */
dev = pdev->dev.parent;
err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
--
2.17.1


2018-09-19 12:39:03

by Laurentiu Tudor

Subject: [PATCH 16/21] arm64: dts: ls1046a: add smmu node

From: Laurentiu Tudor <[email protected]>

This allows for the SMMU device to be probed by the SMMU kernel driver.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
.../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index ef83786b8b90..06863d3e4a7d 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -228,6 +228,48 @@
bus-width = <4>;
};

+ mmu: iommu@9000000 {
+ compatible = "arm,mmu-500";
+ reg = <0 0x9000000 0 0x400000>;
+ dma-coherent;
+ #global-interrupts = <2>;
+ #iommu-cells = <1>;
+ interrupts = <0 142 4>, /* global secure fault */
+ <0 143 4>, /* combined secure interrupt */
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>,
+ <0 142 4>;
+ };
+
scfg: scfg@1570000 {
compatible = "fsl,ls1046a-scfg", "syscon";
reg = <0x0 0x1570000 0x0 0x10000>;
--
2.17.1


2018-09-19 12:39:11

by Laurentiu Tudor

Subject: [PATCH 10/21] fsl/fman: add API to get the device behind a fman port

From: Laurentiu Tudor <[email protected]>

Add an API that retrieves the 'struct device' that the specified fman
port probed against. The new API will be used in a subsequent iommu
enablement related patch.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/net/ethernet/freescale/fman/fman_port.c | 14 ++++++++++++++
drivers/net/ethernet/freescale/fman/fman_port.h | 2 ++
2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index ee82ee1384eb..bd76c9730692 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1728,6 +1728,20 @@ u32 fman_port_get_qman_channel_id(struct fman_port *port)
}
EXPORT_SYMBOL(fman_port_get_qman_channel_id);

+/**
+ * fman_port_get_device
+ * @port: Pointer to the FMan port device
+ *
+ * Get the 'struct device' associated to the specified FMan port device
+ *
+ * Return: pointer to associated 'struct device'
+ */
+struct device *fman_port_get_device(struct fman_port *port)
+{
+ return port->dev;
+}
+EXPORT_SYMBOL(fman_port_get_device);
+
int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset)
{
if (port->buffer_offsets.hash_result_offset == ILLEGAL_BASE)
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 9dbb69f40121..82f12661a46d 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -157,4 +157,6 @@ int fman_port_get_tstamp(struct fman_port *port, const void *data, u64 *tstamp);

struct fman_port *fman_port_bind(struct device *dev);

+struct device *fman_port_get_device(struct fman_port *port);
+
#endif /* __FMAN_PORT_H */
--
2.17.1


2018-09-19 12:39:44

by Laurentiu Tudor

Subject: [PATCH 05/21] soc/fsl/qbman: add APIs to retrieve the probing status

From: Laurentiu Tudor <[email protected]>

Add a couple of new APIs to check the probing status of qman and bman:
'int bman_is_probed()' and 'int qman_is_probed()'.
They return the following values.
* 1 if qman/bman were probed correctly
* 0 if qman/bman were not yet probed
* -1 if probing of qman/bman failed
Drivers that use qman/bman driver services are required to use these
APIs before calling any functions exported by qman or bman drivers
or otherwise they will crash the kernel.
The APIs will be used in the following couple of qbman portal patches
and later in the series in the dpaa1 ethernet driver.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/bman_ccsr.c | 11 +++++++++++
drivers/soc/fsl/qbman/qman_ccsr.c | 11 +++++++++++
include/soc/fsl/bman.h | 8 ++++++++
include/soc/fsl/qman.h | 8 ++++++++
4 files changed, 38 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index 680f67f04fb4..2c11883d42a5 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -121,6 +121,7 @@ static void bm_set_memory(u64 ba, u32 size)
*/
static dma_addr_t fbpr_a;
static size_t fbpr_sz;
+static int __bman_probed;

static int bman_fbpr(struct reserved_mem *rmem)
{
@@ -167,6 +168,12 @@ static irqreturn_t bman_isr(int irq, void *ptr)
return IRQ_HANDLED;
}

+int bman_is_probed(void)
+{
+ return __bman_probed;
+}
+EXPORT_SYMBOL_GPL(bman_is_probed);
+
static int fsl_bman_probe(struct platform_device *pdev)
{
int ret, err_irq;
@@ -177,6 +184,8 @@ static int fsl_bman_probe(struct platform_device *pdev)
u16 id, bm_pool_cnt;
u8 major, minor;

+ __bman_probed = -1;
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n",
@@ -267,6 +276,8 @@ static int fsl_bman_probe(struct platform_device *pdev)
return ret;
}

+ __bman_probed = 1;
+
return 0;
};

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 7163f7511ce1..0bfbe24b479a 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -274,6 +274,7 @@ static const struct qman_error_info_mdata error_mdata[] = {
static u32 __iomem *qm_ccsr_start;
/* A SDQCR mask comprising all the available/visible pool channels */
static u32 qm_pools_sdqcr;
+static int __qman_probed;

static inline u32 qm_ccsr_in(u32 offset)
{
@@ -689,6 +690,12 @@ static int qman_resource_init(struct device *dev)
return 0;
}

+int qman_is_probed(void)
+{
+ return __qman_probed;
+}
+EXPORT_SYMBOL_GPL(qman_is_probed);
+
static int fsl_qman_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -699,6 +706,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
u16 id;
u8 major, minor;

+ __qman_probed = -1;
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n",
@@ -847,6 +856,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
if (ret)
return ret;

+ __qman_probed = 1;
+
return 0;
}

diff --git a/include/soc/fsl/bman.h b/include/soc/fsl/bman.h
index eaaf56df4086..5b99cb2ea5ef 100644
--- a/include/soc/fsl/bman.h
+++ b/include/soc/fsl/bman.h
@@ -126,4 +126,12 @@ int bman_release(struct bman_pool *pool, const struct bm_buffer *bufs, u8 num);
*/
int bman_acquire(struct bman_pool *pool, struct bm_buffer *bufs, u8 num);

+/**
+ * bman_is_probed - Check if bman is probed
+ *
+ * Returns 1 if the bman driver successfully probed, -1 if the bman driver
+ * failed to probe or 0 if the bman driver has not probed yet.
+ */
+int bman_is_probed(void);
+
#endif /* __FSL_BMAN_H */
diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index d4dfefdee6c1..597783b8a3a0 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -1186,4 +1186,12 @@ int qman_alloc_cgrid_range(u32 *result, u32 count);
*/
int qman_release_cgrid(u32 id);

+/**
+ * qman_is_probed - Check if qman is probed
+ *
+ * Returns 1 if the qman driver successfully probed, -1 if the qman driver
+ * failed to probe or 0 if the qman driver has not probed yet.
+ */
+int qman_is_probed(void);
+
#endif /* __FSL_QMAN_H */
--
2.17.1
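
The provider side of the probing-status API above can be modelled in a few lines. This is a sketch under assumptions, not the driver code: struct probe_flag, model_probe() and model_is_probed() are hypothetical names standing in for the static __bman_probed / __qman_probed variables and their accessors. It shows why the flag is set to -1 at the start of probe: every early error return then leaves it in the "failed" state without extra bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { PROBE_FAILED = -1, PROBE_PENDING = 0, PROBE_OK = 1 };

/* Stands in for the static __bman_probed / __qman_probed variables. */
struct probe_flag {
	int state;	/* zero-initialised, i.e. PROBE_PENDING */
};

/* Model of a probe function with two possible failure points. */
static int model_probe(struct probe_flag *f, bool have_resource, bool init_ok)
{
	f->state = PROBE_FAILED;	/* pessimistic until the very end */

	if (!have_resource)
		return -ENODEV;		/* early exit, flag stays at -1 */
	if (!init_ok)
		return -EIO;		/* another early exit, same effect */

	f->state = PROBE_OK;		/* only reached on full success */
	return 0;
}

/* Model of bman_is_probed() / qman_is_probed(). */
static int model_is_probed(const struct probe_flag *f)
{
	return f->state;
}
```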


2018-09-19 12:39:47

by Laurentiu Tudor

Subject: [PATCH 09/21] fsl/fman: backup and restore ICID registers

From: Laurentiu Tudor <[email protected]>

During probing, FMAN is reset thus losing all its register
settings. Backup port ICID registers before reset and restore
them after, similarly to how it's done on powerpc / PAMU based
platforms.
This also has the side effect of disabling the old code path
(liodn backup/restore handling) that obviously makes no sense
in the context of SMMU on ARMs.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/net/ethernet/freescale/fman/fman.c | 35 +++++++++++++++++++++-
drivers/net/ethernet/freescale/fman/fman.h | 4 +++
2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index c415ac67cb7b..8f9136892d98 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -629,6 +629,7 @@ static void set_port_order_restoration(struct fman_fpm_regs __iomem *fpm_rg,
iowrite32be(tmp, &fpm_rg->fmfp_prc);
}

+#ifdef CONFIG_PPC
static void set_port_liodn(struct fman *fman, u8 port_id,
u32 liodn_base, u32 liodn_ofst)
{
@@ -646,6 +647,27 @@ static void set_port_liodn(struct fman *fman, u8 port_id,
iowrite32be(tmp, &fman->dma_regs->fmdmplr[port_id / 2]);
iowrite32be(liodn_ofst, &fman->bmi_regs->fmbm_spliodn[port_id - 1]);
}
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+static void save_restore_port_icids(struct fman *fman, bool save)
+{
+ int port_idxes[] = {
+ 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc,
+ 0xd, 0xe, 0xf, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
+ 0x10, 0x11, 0x30, 0x31
+ };
+ int idx, i;
+
+ for (i = 0; i < ARRAY_SIZE(port_idxes); i++) {
+ idx = port_idxes[i];
+ if (save)
+ fman->sp_icids[idx] =
+ ioread32be(&fman->bmi_regs->fmbm_spliodn[idx]);
+ else
+ iowrite32be(fman->sp_icids[idx],
+ &fman->bmi_regs->fmbm_spliodn[idx]);
+ }
+}
+#endif

static void enable_rams_ecc(struct fman_fpm_regs __iomem *fpm_rg)
{
@@ -1914,7 +1936,10 @@ static int fman_reset(struct fman *fman)
static int fman_init(struct fman *fman)
{
struct fman_cfg *cfg = NULL;
- int err = 0, i, count;
+ int err = 0, count;
+#ifdef CONFIG_PPC
+ int i;
+#endif

if (is_init_done(fman->cfg))
return -EINVAL;
@@ -1934,6 +1959,7 @@ static int fman_init(struct fman *fman)
memset_io((void __iomem *)(fman->base_addr + CGP_OFFSET), 0,
fman->state->fm_port_num_of_cg);

+#ifdef CONFIG_PPC
/* Save LIODN info before FMan reset
* Skipping non-existent port 0 (i = 1)
*/
@@ -1953,6 +1979,9 @@ static int fman_init(struct fman *fman)
}
fman->liodn_base[i] = liodn_base;
}
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+ save_restore_port_icids(fman, true);
+#endif

err = fman_reset(fman);
if (err)
@@ -2181,8 +2210,12 @@ int fman_set_port_params(struct fman *fman,
if (err)
goto return_err;

+#ifdef CONFIG_PPC
set_port_liodn(fman, port_id, fman->liodn_base[port_id],
fman->liodn_offset[port_id]);
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+ save_restore_port_icids(fman, false);
+#endif

if (fman->state->rev_info.major < 6)
set_port_order_restoration(fman->fpm_regs, port_id);
diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h
index 935c317fa696..19f20fa58053 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -346,8 +346,12 @@ struct fman {
unsigned long fifo_offset;
size_t fifo_size;

+#ifdef CONFIG_PPC
u32 liodn_base[64];
u32 liodn_offset[64];
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+ u32 sp_icids[64];
+#endif

struct fman_dts_params dts_params;
};
--
2.17.1
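
The save/reset/restore sequence from the patch above can be tried out in userspace with a plain array in place of the memory-mapped registers. This is only a sketch: struct fake_fman and fake_reset() are hypothetical, the big-endian I/O accessors are replaced by array accesses, and the port index list is copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORT_REGS 64

struct fake_fman {
	uint32_t regs[NUM_PORT_REGS];		/* stands in for bmi_regs->fmbm_spliodn[] */
	uint32_t sp_icids[NUM_PORT_REGS];	/* software backup, as in struct fman */
};

/* Port indexes as listed in the patch. */
static const int port_idxes[] = {
	0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc,
	0xd, 0xe, 0xf, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
	0x10, 0x11, 0x30, 0x31
};

/* Copy the listed registers to the backup (save) or back again (restore). */
static void save_restore_port_icids(struct fake_fman *f, bool save)
{
	size_t i;

	for (i = 0; i < sizeof(port_idxes) / sizeof(port_idxes[0]); i++) {
		int idx = port_idxes[i];

		if (save)
			f->sp_icids[idx] = f->regs[idx];
		else
			f->regs[idx] = f->sp_icids[idx];
	}
}

/* Model of the FMan reset: every register loses its contents. */
static void fake_reset(struct fake_fman *f)
{
	memset(f->regs, 0, sizeof(f->regs));
}
```

Registers whose index is not in the list are not preserved across the reset in this model.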


2018-09-19 12:39:57

by Laurentiu Tudor

Subject: [PATCH 04/21] soc/fsl/qman-portal: map CENA area in the iommu

From: Laurentiu Tudor <[email protected]>

Add a one-to-one iommu mapping for qman portal CENA register area.
This is required for QMAN stashing to work without faults behind
an iommu.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/qman_portal.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index a120002b630e..012bb95e87e1 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -29,6 +29,7 @@
*/

#include "qman_priv.h"
+#include <linux/iommu.h>

struct qman_portal *qman_dma_portal;
EXPORT_SYMBOL(qman_dma_portal);
@@ -222,6 +223,7 @@ static int qman_portal_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
struct device_node *node = dev->of_node;
+ struct iommu_domain *domain;
struct qm_portal_config *pcfg;
struct resource *addr_phys[2];
int irq, cpu, err;
@@ -276,6 +278,21 @@ static int qman_portal_probe(struct platform_device *pdev)
goto err_ioremap2;
}

+ /* Create a 1-to-1 iommu mapping for cena portal area */
+ domain = iommu_get_domain_for_dev(dev);
+ if (domain) {
+ /*
+ * Note: not mapping this as cacheable triggers the infamous
+ * QMan CIDE error.
+ */
+ err = iommu_map(iommu_get_domain_for_dev(dev),
+ addr_phys[0]->start, addr_phys[0]->start,
+ resource_size(addr_phys[0]),
+ IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+ if (err)
+ dev_warn(dev, "failed to iommu_map() %d\n", err);
+ }
+
pcfg->pools = qm_get_pools_sdqcr();

spin_lock(&qman_lock);
--
2.17.1


2018-09-19 12:40:19

by Laurentiu Tudor

Subject: [PATCH 07/21] soc/fsl/bman_portals: defer probe after bman's probe

From: Laurentiu Tudor <[email protected]>

Unlike with qman portals, a crash in bman portal probing could not be
triggered, but the probe does make calls [1] into the bman driver, so
let's make sure the bman portal probing happens after bman's.

[1] bman_p_irqsource_add() (in bman) called by:
init_pcfg() called by:
bman_portal_probe()

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/bman_portal.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 2f71f7df3465..f9edd28894fd 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -91,7 +91,15 @@ static int bman_portal_probe(struct platform_device *pdev)
struct device_node *node = dev->of_node;
struct bm_portal_config *pcfg;
struct resource *addr_phys[2];
- int irq, cpu;
+ int irq, cpu, err;
+
+ err = bman_is_probed();
+ if (!err)
+ return -EPROBE_DEFER;
+ if (err < 0) {
+ dev_err(&pdev->dev, "failing probe due to bman probe error\n");
+ return -ENODEV;
+ }

pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
if (!pcfg)
--
2.17.1


2018-09-19 12:40:41

by Laurentiu Tudor

Subject: [PATCH 13/21] dpaa_eth: fix iova handling for contiguous frames

From: Laurentiu Tudor <[email protected]>

The driver relies on the no longer valid assumption that dma addresses
(iovas) are identical to physical addresses and uses phys_to_virt() to
make iova -> vaddr conversions. Fix this by adding a function that does
proper iova -> phys conversions using the iommu api and update the code
to use it.
Also, a dma_unmap_single() call had to be moved further down the code
because iova -> vaddr conversions were required before the unmap.
For now only the contiguous frame case is handled; the SG case is
split out into a following patch.
While at it, clean up a redundant dpaa_bpid2pool() call and pass the bp
as a parameter.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 44 ++++++++++---------
1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ac9e50c8a556..e9e081c3f8cc 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -50,6 +50,7 @@
#include <linux/highmem.h>
#include <linux/percpu.h>
#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
#include <linux/sort.h>
#include <soc/fsl/bman.h>
#include <soc/fsl/qman.h>
@@ -1595,6 +1596,17 @@ static int dpaa_eth_refill_bpools(struct dpaa_priv *priv)
return 0;
}

+static phys_addr_t dpaa_iova_to_phys(struct device *dev, dma_addr_t addr)
+{
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev(dev);
+ if (domain)
+ return iommu_iova_to_phys(domain, addr);
+ else
+ return addr;
+}
+
/* Cleanup function for outgoing frame descriptors that were built on Tx path,
* either contiguous frames or scatter/gather ones.
* Skb freeing is not handled here.
@@ -1617,7 +1629,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
int nr_frags, i;
u64 ns;

- skbh = (struct sk_buff **)phys_to_virt(addr);
+ skbh = (struct sk_buff **)phys_to_virt(dpaa_iova_to_phys(dev, addr));
skb = *skbh;

if (priv->tx_tstamp && skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
@@ -1687,25 +1699,21 @@ static u8 rx_csum_offload(const struct dpaa_priv *priv, const struct qm_fd *fd)
* accommodate the shared info area of the skb.
*/
static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
- const struct qm_fd *fd)
+ const struct qm_fd *fd,
+ struct dpaa_bp *dpaa_bp,
+ void *vaddr)
{
ssize_t fd_off = qm_fd_get_offset(fd);
- dma_addr_t addr = qm_fd_addr(fd);
- struct dpaa_bp *dpaa_bp;
struct sk_buff *skb;
- void *vaddr;

- vaddr = phys_to_virt(addr);
WARN_ON(!IS_ALIGNED((unsigned long)vaddr, SMP_CACHE_BYTES));

- dpaa_bp = dpaa_bpid2pool(fd->bpid);
- if (!dpaa_bp)
- goto free_buffer;
-
skb = build_skb(vaddr, dpaa_bp->size +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
- if (WARN_ONCE(!skb, "Build skb failure on Rx\n"))
- goto free_buffer;
+ if (WARN_ONCE(!skb, "Build skb failure on Rx\n")) {
+ skb_free_frag(vaddr);
+ return NULL;
+ }
WARN_ON(fd_off != priv->rx_headroom);
skb_reserve(skb, fd_off);
skb_put(skb, qm_fd_get_length(fd));
@@ -1713,10 +1721,6 @@ static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
skb->ip_summed = rx_csum_offload(priv, fd);

return skb;
-
-free_buffer:
- skb_free_frag(vaddr);
- return NULL;
}

/* Build an skb with the data of the first S/G entry in the linear portion and
@@ -2302,12 +2306,12 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
if (!dpaa_bp)
return qman_cb_dqrr_consume;

- dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
-
/* prefetch the first 64 bytes of the frame or the SGT start */
- vaddr = phys_to_virt(addr);
+ vaddr = phys_to_virt(dpaa_iova_to_phys(dpaa_bp->dev, addr));
prefetch(vaddr + qm_fd_get_offset(fd));

+ dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
+
/* The only FD types that we may receive are contig and S/G */
WARN_ON((fd_format != qm_fd_contig) && (fd_format != qm_fd_sg));

@@ -2318,7 +2322,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
(*count_ptr)--;

if (likely(fd_format == qm_fd_contig))
- skb = contig_fd_to_skb(priv, fd);
+ skb = contig_fd_to_skb(priv, fd, dpaa_bp, vaddr);
else
skb = sg_fd_to_skb(priv, fd);
if (!skb)
--
2.17.1
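
The translate-or-pass-through logic of dpaa_iova_to_phys() above can be modelled in userspace as follows. This is a sketch under assumptions: struct toy_domain is a hypothetical single-window stand-in for struct iommu_domain, the addresses used with it are made up, and returning 0 for an unmapped IOVA is a choice of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct iommu_domain: one linear IOVA window. */
struct toy_domain {
	uint64_t iova_base;
	uint64_t phys_base;
	uint64_t size;
};

/* Translate inside the window; 0 means "not mapped" in this model. */
static uint64_t toy_iova_to_phys(const struct toy_domain *d, uint64_t iova)
{
	if (iova < d->iova_base || iova - d->iova_base >= d->size)
		return 0;
	return d->phys_base + (iova - d->iova_base);
}

/*
 * Same shape as dpaa_iova_to_phys(): translate when the device sits
 * behind an IOMMU, otherwise treat the DMA address as a physical one.
 */
static uint64_t model_dpaa_iova_to_phys(const struct toy_domain *d,
					uint64_t addr)
{
	if (d)
		return toy_iova_to_phys(d, addr);
	return addr;
}
```

Passing a NULL domain models a boot with the SMMU disabled or bypassed, in which case the old "DMA address equals physical address" behaviour is preserved.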


2018-09-19 12:41:04

by Laurentiu Tudor

Subject: [PATCH 03/21] soc/fsl/qman: map FQD and PFDR areas in the iommu

From: Laurentiu Tudor <[email protected]>

Add a one-to-one iommu mapping for qman private data memory areas
(FQD and PFDR). This is required for QMAN to work without faults
behind an iommu.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/qman_ccsr.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 619e22030460..7163f7511ce1 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -29,6 +29,7 @@
*/

#include "qman_priv.h"
+#include <linux/iommu.h>

u16 qman_ip_rev;
EXPORT_SYMBOL(qman_ip_rev);
@@ -692,6 +693,7 @@ static int fsl_qman_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
struct device_node *node = dev->of_node;
+ struct iommu_domain *domain;
struct resource *res;
int ret, err_irq;
u16 id;
@@ -769,6 +771,21 @@ static int fsl_qman_probe(struct platform_device *pdev)
}
dev_dbg(dev, "Allocated PFDR 0x%llx 0x%zx\n", pfdr_a, pfdr_sz);

+ /* Create a 1-to-1 iommu mapping for fqd and pfdr areas */
+ domain = iommu_get_domain_for_dev(dev);
+ if (domain) {
+ ret = iommu_map(domain,
+ fqd_a, fqd_a, fqd_sz,
+ IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+ if (ret)
+ dev_warn(dev, "iommu_map(fqd) failed %d\n", ret);
+ ret = iommu_map(domain,
+ pfdr_a, pfdr_a, pfdr_sz,
+ IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+ if (ret)
+ dev_warn(dev, "iommu_map(pfdr) failed %d\n", ret);
+ }
+
ret = qman_init_ccsr(dev);
if (ret) {
dev_err(dev, "CCSR setup failed\n");
--
2.17.1
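
To illustrate what a one-to-one mapping means for the device, here is a toy userspace model, not the kernel IOMMU API. toy_map(), toy_map_identity() and toy_translate() are hypothetical, and the addresses used with them are invented. A device access translates successfully only inside a mapped region, and with an identity mapping the output address equals the input address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 4

struct toy_region {
	uint64_t iova, phys, size;
};

struct toy_domain {
	struct toy_region r[TOY_MAX_REGIONS];
	size_t n;
};

/* Record one iova -> phys region; returns 0 on success, -1 otherwise. */
static int toy_map(struct toy_domain *d, uint64_t iova, uint64_t phys,
		   uint64_t size)
{
	if (d->n == TOY_MAX_REGIONS || size == 0)
		return -1;
	d->r[d->n].iova = iova;
	d->r[d->n].phys = phys;
	d->r[d->n].size = size;
	d->n++;
	return 0;
}

/* One-to-one mapping: the IOVA is the physical address itself. */
static int toy_map_identity(struct toy_domain *d, uint64_t addr, uint64_t size)
{
	return toy_map(d, addr, addr, size);
}

/* Returns false (a "fault") when the IOVA is outside every region. */
static bool toy_translate(const struct toy_domain *d, uint64_t iova,
			  uint64_t *phys)
{
	size_t i;

	for (i = 0; i < d->n; i++) {
		const struct toy_region *r = &d->r[i];

		if (iova >= r->iova && iova - r->iova < r->size) {
			*phys = r->phys + (iova - r->iova);
			return true;
		}
	}
	return false;
}
```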


2018-09-19 12:41:20

by Laurentiu Tudor

Subject: [PATCH 02/21] soc/fsl/bman: map FBPR area in the iommu

From: Laurentiu Tudor <[email protected]>

Add a one-to-one iommu mapping for bman private data memory (FBPR).
This is required for BMAN to work without faults behind an iommu.

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/soc/fsl/qbman/bman_ccsr.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index 05c42235dd41..680f67f04fb4 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -29,6 +29,7 @@
*/

#include "bman_priv.h"
+#include <linux/iommu.h>

u16 bman_ip_rev;
EXPORT_SYMBOL(bman_ip_rev);
@@ -171,6 +172,7 @@ static int fsl_bman_probe(struct platform_device *pdev)
int ret, err_irq;
struct device *dev = &pdev->dev;
struct device_node *node = dev->of_node;
+ struct iommu_domain *domain;
struct resource *res;
u16 id, bm_pool_cnt;
u8 major, minor;
@@ -216,6 +218,16 @@ static int fsl_bman_probe(struct platform_device *pdev)

dev_dbg(dev, "Allocated FBPR 0x%llx 0x%zx\n", fbpr_a, fbpr_sz);

+ /* Create a 1-to-1 iommu mapping for FBPR area */
+ domain = iommu_get_domain_for_dev(dev);
+ if (domain) {
+ ret = iommu_map(iommu_get_domain_for_dev(dev),
+ fbpr_a, fbpr_a, fbpr_sz,
+ IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+ if (ret)
+ dev_warn(dev, "failed to iommu_map() %d\n", ret);
+ }
+
bm_set_memory(fbpr_a, fbpr_sz);

err_irq = platform_get_irq(pdev, 0);
--
2.17.1


2018-09-19 13:28:57

by Robin Murphy

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

Hi Laurentiu,

On 19/09/18 13:35, [email protected] wrote:
> From: Laurentiu Tudor <[email protected]>
>
> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
> and consists mostly in important driver fixes and the required device
> tree updates. It touches several subsystems and consists of three main
> parts:
> - changes in drivers/soc/fsl/qbman drivers adding iommu mapping of
> reserved memory areas, fixes and deferred probe support
> - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
> consisting in misc dma mapping related fixes and probe ordering
> - addition of the actual arm smmu device tree node together with
> various adjustments to the device trees
>
> Performance impact
>
> Running iperf benchmarks in a back-to-back setup (both sides
> having smmu enabled) on a 10Gbps port shows a significant
> networking performance degradation of around 40% (9.48Gbps
> linerate vs 5.45Gbps). If you need the performance and can do
> without SMMU support, you can use "iommu.passthrough=1" to
> disable SMMU.
>
> USB issue and workaround
>
> There's a problem with the usb controllers in these chips
> generating smaller, 40-bit wide dma addresses instead of the 48-bit
> supported at the smmu input. So you end up in a situation where the
> smmu is mapped with 48-bit address translations, but the device
> generates transactions with clipped 40-bit addresses, thus smmu
> context faults are triggered. I encountered a similar situation for
> mmc that I managed to fix in software [1] however for USB I did not
> find a proper place in the code to add a similar fix. The only
> workaround I found was to add this kernel parameter which limits the
> usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
> This workaround is far from ideal, so any suggestions for a code
> based workaround in this area would be greatly appreciated.

If you have a nominally-64-bit device with a
narrower-than-the-main-interconnect link in front of it, that should
already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
provided the interconnect hierarchy can be described appropriately (or
at least massaged sufficiently to satisfy the binding), e.g.:

/ {
	...

	soc {
		ranges;
		dma-ranges = <0 0 10000 0>;

		dev_48bit { ... };

		periph_bus {
			ranges;
			dma-ranges = <0 0 100 0>;

			dev_40bit { ... };
		};
	};
};

and if that fails to work as expected (except for PCI hosts where
handling dma-ranges properly still needs sorting out), please do let us
know ;)

Robin.
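
As a rough illustration of the dma-ranges suggestion above, the following userspace sketch derives a bus DMA mask from a range size and applies it to a device mask. It assumes the cells in the example are read as hexadecimal, so that the sizes are 0x100 << 32 = 2^40 and 0x10000 << 32 = 2^48. The helper names are hypothetical and this is a simplification, not the kernel's actual dma-ranges handling.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'bits' bits set, like the kernel's DMA_BIT_MASK(). */
static uint64_t dma_bit_mask(unsigned int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Index of the highest set bit; v must be non-zero. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int n = 0;

	while ((v >>= 1) != 0)
		n++;
	return n;
}

/* Smallest all-ones mask that covers the last address of the range. */
static uint64_t bus_mask_from_range(uint64_t dma_base, uint64_t size)
{
	return dma_bit_mask(ilog2_u64(dma_base + size - 1) + 1);
}

/* A device can never address more than the bus in front of it allows. */
static uint64_t effective_mask(uint64_t dev_mask, uint64_t bus_mask)
{
	return dev_mask & bus_mask;
}
```

With these assumptions, a nominally 64-bit device under periph_bus ends up with a 40-bit effective mask, while a device that is already limited to 32 bits is unaffected.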

> The patch set is based on net-next so, if generally agreed, I'd suggest
> to get the patches through the netdev tree after getting all the Acks.
>
> [1] https://patchwork.kernel.org/patch/10506627/
>
> Laurentiu Tudor (21):
> soc/fsl/qman: fixup liodns only on ppc targets
> soc/fsl/bman: map FBPR area in the iommu
> soc/fsl/qman: map FQD and PFDR areas in the iommu
> soc/fsl/qman-portal: map CENA area in the iommu
> soc/fsl/qbman: add APIs to retrieve the probing status
> soc/fsl/qman_portals: defer probe after qman's probe
> soc/fsl/bman_portals: defer probe after bman's probe
> soc/fsl/qbman_portals: add APIs to retrieve the probing status
> fsl/fman: backup and restore ICID registers
> fsl/fman: add API to get the device behind a fman port
> dpaa_eth: defer probing after qbman
> dpaa_eth: base dma mappings on the fman rx port
> dpaa_eth: fix iova handling for contiguous frames
> dpaa_eth: fix iova handling for sg frames
> dpaa_eth: fix SG frame cleanup
> arm64: dts: ls1046a: add smmu node
> arm64: dts: ls1043a: add smmu node
> arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
> arm64: dts: ls104x: add missing dma ranges property
> arm64: dts: ls104x: add iommu-map to pci controllers
> arm64: dts: ls104x: make dma-coherent global to the SoC
>
> .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 52 ++++++-
> .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 48 +++++++
> .../net/ethernet/freescale/dpaa/dpaa_eth.c | 136 ++++++++++++------
> drivers/net/ethernet/freescale/fman/fman.c | 35 ++++-
> drivers/net/ethernet/freescale/fman/fman.h | 4 +
> .../net/ethernet/freescale/fman/fman_port.c | 14 ++
> .../net/ethernet/freescale/fman/fman_port.h | 2 +
> drivers/soc/fsl/qbman/bman_ccsr.c | 23 +++
> drivers/soc/fsl/qbman/bman_portal.c | 20 ++-
> drivers/soc/fsl/qbman/qman_ccsr.c | 30 ++++
> drivers/soc/fsl/qbman/qman_portal.c | 35 +++++
> include/soc/fsl/bman.h | 16 +++
> include/soc/fsl/qman.h | 17 +++
> 13 files changed, 379 insertions(+), 53 deletions(-)
>

2018-09-19 13:31:23

by Robin Murphy

Subject: Re: [PATCH 16/21] arm64: dts: ls1046a: add smmu node

On 19/09/18 13:36, [email protected] wrote:
> From: Laurentiu Tudor <[email protected]>
>
> This allows for the SMMU device to be probed by the SMMU kernel driver.
>
> Signed-off-by: Laurentiu Tudor <[email protected]>
> ---
> .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index ef83786b8b90..06863d3e4a7d 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -228,6 +228,48 @@
> bus-width = <4>;
> };
>
> + mmu: iommu@9000000 {
> + compatible = "arm,mmu-500";
> + reg = <0 0x9000000 0 0x400000>;
> + dma-coherent;
> + #global-interrupts = <2>;
> + #iommu-cells = <1>;
> + interrupts = <0 142 4>, /* global secure fault */

Either that's not really the secure global interrupt, or those context
interrupts are wrong.

Robin.

> + <0 143 4>, /* combined secure interrupt */
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>,
> + <0 142 4>;
> + };
> +
> scfg: scfg@1570000 {
> compatible = "fsl,ls1046a-scfg", "syscon";
> reg = <0x0 0x1570000 0x0 0x10000>;
>

2018-09-19 14:10:08

by Robin Murphy

Subject: Re: [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID

On 19/09/18 13:36, [email protected] wrote:
> From: Laurentiu Tudor <[email protected]>
>
> The StreamID entering the SMMU is actually a concatenation of the
> SMMU TBU ID and the ICID configured in software.
> Since the TBU ID is internal to the SoC, and since we want the
> actual ICID configured in software to enter the SMMU without any
> additional set bits, mask out the TBU ID bits and leave only the
> relevant ICID bits to enter SMMU.
>
> Signed-off-by: Laurentiu Tudor <[email protected]>
> ---
> arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
> arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> index 8b3eba167508..90296b9fb171 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> @@ -226,6 +226,7 @@
> compatible = "arm,mmu-500";
> reg = <0 0x9000000 0 0x400000>;
> dma-coherent;
> + stream-match-mask = <0x7f00>;

The TBU ID only forms the top 5 bits, so also ignoring bits 9:8 raises
an eyebrow - if the LS104x SMMU really is configured for 8-bit SID input
then it's harmless, but if it's actually a 9 or 10-bit configuration
then you probably want to avoid masking them (or at least document why)
- IIRC there *was* stuff wired there on LS2085 at least.

Robin.
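
The difference between masking bits 14:8 and masking only the top 5 bits can be checked with a small userspace model of stream matching. The layout is an assumption made for illustration: a 15-bit StreamID with the TBU ID in bits 14:10, two further bits in 9:8 and the ICID in bits 7:0. smr_match() and make_sid() are hypothetical helpers, not SMMU driver functions. A set bit in the mask means "ignore this bit when comparing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SID_BITS_MASK 0x7fffu	/* assumption: 15-bit StreamID */

/* Two StreamIDs match when they agree on every bit the mask does not hide. */
static bool smr_match(uint32_t incoming, uint32_t id, uint32_t mask)
{
	return ((incoming ^ id) & ~mask & SID_BITS_MASK) == 0;
}

/* Assemble a StreamID from the assumed fields. */
static uint32_t make_sid(uint32_t tbu_id, uint32_t mid_bits, uint32_t icid)
{
	return ((tbu_id & 0x1fu) << 10) | ((mid_bits & 0x3u) << 8) |
	       (icid & 0xffu);
}
```

In this model, 0x7f00 hides both the TBU ID and bits 9:8, while 0x7c00 hides the TBU ID only, so the two masks behave differently only if something actually drives bits 9:8.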

> #global-interrupts = <2>;
> #iommu-cells = <1>;
> interrupts = <0 142 4>, /* global secure fault */
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 06863d3e4a7d..15094dd8400e 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -232,6 +232,7 @@
> compatible = "arm,mmu-500";
> reg = <0 0x9000000 0 0x400000>;
> dma-coherent;
> + stream-match-mask = <0x7f00>;
> #global-interrupts = <2>;
> #iommu-cells = <1>;
> interrupts = <0 142 4>, /* global secure fault */
>

2018-09-19 14:20:39

by Laurentiu Tudor

Subject: Re: [PATCH 16/21] arm64: dts: ls1046a: add smmu node

Hi Robin,

On 19.09.2018 16:30, Robin Murphy wrote:
> On 19/09/18 13:36, [email protected] wrote:
>> From: Laurentiu Tudor <[email protected]>
>>
>> This allows for the SMMU device to be probed by the SMMU kernel driver.
>>
>> Signed-off-by: Laurentiu Tudor <[email protected]>
>> ---
>>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> index ef83786b8b90..06863d3e4a7d 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> @@ -228,6 +228,48 @@
>>               bus-width = <4>;
>>           };
>> +        mmu: iommu@9000000 {
>> +            compatible = "arm,mmu-500";
>> +            reg = <0 0x9000000 0 0x400000>;
>> +            dma-coherent;
>> +            #global-interrupts = <2>;
>> +            #iommu-cells = <1>;
>> +            interrupts = <0 142 4>, /* global secure fault */
>
> Either that's not really the secure global interrupt, or those context
> interrupts are wrong.

Now that you point it out, I realize that the comments don't make much
sense. Actually, 142 is the non-secure interrupt (all ints are ORed on
this IRQ) while 143 is the secure version. I'll update the comments in
the next re-spin.

---
Thanks & Best Regards, Laurentiu


>
>> +                     <0 143 4>, /* combined secure interrupt */
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>;
>> +        };
>> +
>>           scfg: scfg@1570000 {
>>               compatible = "fsl,ls1046a-scfg", "syscon";
>>               reg = <0x0 0x1570000 0x0 0x10000>;
>>

2018-09-19 14:21:26

by Laurentiu Tudor

Subject: Re: [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID

Hi Robin,

On 19.09.2018 16:41, Robin Murphy wrote:
> On 19/09/18 13:36, [email protected] wrote:
>> From: Laurentiu Tudor <[email protected]>
>>
>> The StreamID entering the SMMU is actually a concatenation of the
>> SMMU TBU ID and the ICID configured in software.
>> Since the TBU ID is internal to the SoC and since we want the actual
>> ICID configured in software to enter the SMMU without any
>> additional set bits, mask out the TBU ID bits and leave only the
>> relevant ICID bits to enter the SMMU.
>>
>> Signed-off-by: Laurentiu Tudor <[email protected]>
>> ---
>>   arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
>>   arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> index 8b3eba167508..90296b9fb171 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> @@ -226,6 +226,7 @@
>>               compatible = "arm,mmu-500";
>>               reg = <0 0x9000000 0 0x400000>;
>>               dma-coherent;
>> +            stream-match-mask = <0x7f00>;
>
> The TBU ID only forms the top 5 bits, so also ignoring bits 9:8 raises
> an eyebrow - if the LS104x SMMU really is configured for 8-bit SID input
> then it's harmless,

On these lower-end platforms the SID input is configured and documented
as 8-bit.

> but if it's actually a 9 or 10-bit configuration
> then you probably want to avoid masking them (or at least document why)
> - IIRC there *was* stuff wired there on LS2085 at least.

Yes, on LS2s there are 2 extra-bits in there carrying some signaling.
However, on LS1s they are not present.
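
To illustrate the effect of the mask discussed above, here is a minimal
sketch of stream matching in which bits set in the mask are ignored when
the incoming StreamID is compared with the programmed ID. The bit layout
(5-bit TBU ID in bits 14:10, 8-bit ICID in bits 7:0) follows the
description in this thread; the helper names and the ICID value are
hypothetical, and this is a model, not kernel code.

```python
# Illustrative model of SMMU stream matching; not taken from the driver.
STREAM_MATCH_MASK = 0x7F00  # value from the patch: ignore StreamID bits 14:8


def make_sid(tbu_id: int, icid: int) -> int:
    """Build a StreamID assuming TBU ID in bits 14:10 and ICID in bits 7:0."""
    return ((tbu_id & 0x1F) << 10) | (icid & 0xFF)


def stream_matches(incoming_sid: int, programmed_id: int,
                   mask: int = STREAM_MATCH_MASK) -> bool:
    """Return True if the IDs agree on every bit that is not masked out."""
    return ((incoming_sid ^ programmed_id) & ~mask) == 0


if __name__ == "__main__":
    icid = 0x2A  # hypothetical ICID configured in software
    for tbu in (0, 3, 31):
        sid = make_sid(tbu, icid)
        print(f"tbu={tbu:2d} sid={sid:#06x} matches={stream_matches(sid, icid)}")
```

With the mask applied, the same ICID matches regardless of which TBU the
transaction came through, while a different ICID still fails to match.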

---
Thanks & Best Regards, Laurentiu

>
>>               #global-interrupts = <2>;
>>               #iommu-cells = <1>;
>>               interrupts = <0 142 4>, /* global secure fault */
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> index 06863d3e4a7d..15094dd8400e 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> @@ -232,6 +232,7 @@
>>               compatible = "arm,mmu-500";
>>               reg = <0 0x9000000 0 0x400000>;
>>               dma-coherent;
>> +            stream-match-mask = <0x7f00>;
>>               #global-interrupts = <2>;
>>               #iommu-cells = <1>;
>>               interrupts = <0 142 4>, /* global secure fault */
>>

2018-09-19 14:24:12

by Laurentiu Tudor

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:
> Hi Laurentiu,
>
> On 19/09/18 13:35, [email protected] wrote:
>> From: Laurentiu Tudor <[email protected]>
>>
>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>> and consists mostly in important driver fixes and the required device
>> tree updates. It touches several subsystems and consists of three main
>> parts:
>>   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>     reserved memory areas, fixes and defered probe support
>>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>     consisting in misc dma mapping related fixes and probe ordering
>>   - addition of the actual arm smmu device tree node together with
>>     various adjustments to the device trees
>>
>> Performance impact
>>
>>      Running iperf benchmarks in a back-to-back setup (both sides
>>      having smmu enabled) on a 10GBps port show an important
>>      networking performance degradation of around %40 (9.48Gbps
>>      linerate vs 5.45Gbps). If you need performance but without
>>      SMMU support you can use "iommu.passthrough=1" to disable
>>      SMMU.
>>
>> USB issue and workaround
>>
>>      There's a problem with the usb controllers in these chips
>>      generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>      supported at the smmu input. So you end up in a situation where the
>>      smmu is mapped with 48-bit address translations, but the device
>>      generates transactions with clipped 40-bit addresses, thus smmu
>>      context faults are triggered. I encountered a similar situation for
>>      mmc that I  managed to fix in software [1] however for USB I did not
>>      find a proper place in the code to add a similar fix. The only
>>      workaround I found was to add this kernel parameter which limits the
>>      usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>      This workaround if far from ideal, so any suggestions for a code
>>      based workaround in this area would be greatly appreciated.
>
> If you have a nominally-64-bit device with a
> narrower-than-the-main-interconnect link in front of it, that should
> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
> provided the interconnect hierarchy can be described appropriately (or
> at least massaged sufficiently to satisfy the binding), e.g.:
>
> / {
>     ...
>
>     soc {
>         ranges;
>         dma-ranges = <0 0 10000 0>;
>
>         dev_48bit { ... };
>
>         periph_bus {
>             ranges;
>             dma-ranges = <0 0 100 0>;
>
>             dev_40bit { ... };
>         };
>     };
> };
>
> and if that fails to work as expected (except for PCI hosts where
> handling dma-ranges properly still needs sorting out), please do let us
> know ;)
>

Just to confirm, is this [1] the change I was supposed to test?
Because if so, I'm still seeing context faults [2] with what look like
addresses clipped to 40 bits. :-(
IIRC, the usb subsystem explicitly sets 64-bit dma masks, which in turn
will be limited to the SMMU input size of 48 bits. Won't that overwrite
the default dma mask derived from dma-ranges?
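
The faulting iova in the log, 0xffffffb000, is exactly 40 bits wide, which
is consistent with a 48-bit address losing its top 8 bits. A small sketch
of that arithmetic follows; the 48-bit value used as the mapped address is
an assumption for illustration, since the log only shows the clipped
address.

```python
def clip_address(addr: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a device with a narrower bus would."""
    return addr & ((1 << bits) - 1)


if __name__ == "__main__":
    mapped_iova = 0xFFFF_FFFF_B000  # hypothetical 48-bit IOVA handed to the device
    seen_by_smmu = clip_address(mapped_iova, 40)
    # The clipped value has no translation installed, hence a context fault.
    print(f"mapped={mapped_iova:#x} seen={seen_by_smmu:#x}")
```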

---
Best Regards, Laurentiu

[1] -----------------------------------------------------------------

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
compatible = "snps,dwc3";
reg = <0x0 0x2f00000 0x0 0x10000>;
interrupts = <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>;
+ dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
dr_mode = "host";
snps,quirk-frame-length-adjustment = <0x20>;
snps,dis_rxdet_inp3_quirk;
@@ -621,6 +622,7 @@
compatible = "snps,dwc3";
reg = <0x0 0x3000000 0x0 0x10000>;
interrupts = <GIC_SPI 61 IRQ_TYPE_LEVEL_HIGH>;
+ dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
dr_mode = "host";
snps,quirk-frame-length-adjustment = <0x20>;
snps,dis_rxdet_inp3_quirk;
@@ -630,6 +632,7 @@
compatible = "snps,dwc3";
reg = <0x0 0x3100000 0x0 0x10000>;
interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_HIGH>;
+ dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
dr_mode = "host";
snps,quirk-frame-length-adjustment = <0x20>;
snps,dis_rxdet_inp3_quirk;

[2] -----------------------------------------------------------------
[ 2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
assigned bus number 2
[ 2.103720] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[ 2.110346] arm-smmu 9000000.iommu: Unhandled context fault:
fsr=0x402, iova=0xffffffb000, fsynr=0x1b0000, cb=3
[ 2.120449] usb usb2: We don't know the algorithms for LPM for this
host, disabling LPM.
[ 2.128717] hub 2-0:1.0: USB hub found
[ 2.132473] hub 2-0:1.0: 1 port detected
[ 2.136527] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[ 2.142014] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
assigned bus number 3
[ 2.149747] xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f66d hci
version 0x100 quirks 0x0000000002010010
[ 2.159149] xhci-hcd xhci-hcd.1.auto: irq 50, io mem 0x03000000
[ 2.165284] hub 3-0:1.0: USB hub found
[ 2.169039] hub 3-0:1.0: 1 port detected
[ 2.173051] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[ 2.178536] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
assigned bus number 4
[ 2.186193] xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0 SuperSpeed
[ 2.192809] arm-smmu 9000000.iommu: Unhandled context fault:
fsr=0x402, iova=0xffffffb000, fsynr=0x1f0000, cb=4
[ 2.192822] usb usb4: We don't know the algorithms for LPM for this
host, disabling LPM.
[ 2.211141] hub 4-0:1.0: USB hub found
[ 2.214896] hub 4-0:1.0: 1 port detected
[ 2.218935] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
[ 2.224425] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
assigned bus number 5
[ 2.232153] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci
version 0x100 quirks 0x0000000002010010
[ 2.241562] xhci-hcd xhci-hcd.2.auto: irq 51, io mem 0x03100000
[ 2.247694] hub 5-0:1.0: USB hub found
[ 2.251449] hub 5-0:1.0: 1 port detected
[ 2.255458] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
[ 2.260945] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
assigned bus number 6
[ 2.268601] xhci-hcd xhci-hcd.2.auto: Host supports USB 3.0 SuperSpeed
[ 2.275218] arm-smmu 9000000.iommu: Unhandled context fault:
fsr=0x402, iova=0xffffffb000, fsynr=0x110000, cb=5
[ 2.275230] usb usb6: We don't know the algorithms for LPM for this
host, disabling LPM.


>> The patch set is based on net-next so, if generally agreed, I'd suggest
>> to get the patches through the netdev tree after getting all the Acks.
>>
>> [1]
>> https://patchwork.kernel.org/patch/10506627/
>>
>>
>> Laurentiu Tudor (21):
>>    soc/fsl/qman: fixup liodns only on ppc targets
>>    soc/fsl/bman: map FBPR area in the iommu
>>    soc/fsl/qman: map FQD and PFDR areas in the iommu
>>    soc/fsl/qman-portal: map CENA area in the iommu
>>    soc/fsl/qbman: add APIs to retrieve the probing status
>>    soc/fsl/qman_portals: defer probe after qman's probe
>>    soc/fsl/bman_portals: defer probe after bman's probe
>>    soc/fsl/qbman_portals: add APIs to retrieve the probing status
>>    fsl/fman: backup and restore ICID registers
>>    fsl/fman: add API to get the device behind a fman port
>>    dpaa_eth: defer probing after qbman
>>    dpaa_eth: base dma mappings on the fman rx port
>>    dpaa_eth: fix iova handling for contiguous frames
>>    dpaa_eth: fix iova handling for sg frames
>>    dpaa_eth: fix SG frame cleanup
>>    arm64: dts: ls1046a: add smmu node
>>    arm64: dts: ls1043a: add smmu node
>>    arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
>>    arm64: dts: ls104x: add missing dma ranges property
>>    arm64: dts: ls104x: add iommu-map to pci controllers
>>    arm64: dts: ls104x: make dma-coherent global to the SoC
>>
>>   .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
>>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
>>   .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
>>   drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
>>   drivers/net/ethernet/freescale/fman/fman.h    |   4 +
>>   .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
>>   .../net/ethernet/freescale/fman/fman_port.h   |   2 +
>>   drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
>>   drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
>>   drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
>>   drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
>>   include/soc/fsl/bman.h                        |  16 +++
>>   include/soc/fsl/qman.h                        |  17 +++
>>   13 files changed, 379 insertions(+), 53 deletions(-)
>>

2018-09-19 14:38:57

by Robin Murphy

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

On 19/09/18 15:18, Laurentiu Tudor wrote:
> Hi Robin,
>
> On 19.09.2018 16:25, Robin Murphy wrote:
>> Hi Laurentiu,
>>
>> On 19/09/18 13:35, [email protected] wrote:
>>> From: Laurentiu Tudor <[email protected]>
>>>
>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>> and consists mostly in important driver fixes and the required device
>>> tree updates. It touches several subsystems and consists of three main
>>> parts:
>>>   - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>     reserved memory areas, fixes and defered probe support
>>>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>     consisting in misc dma mapping related fixes and probe ordering
>>>   - addition of the actual arm smmu device tree node together with
>>>     various adjustments to the device trees
>>>
>>> Performance impact
>>>
>>>      Running iperf benchmarks in a back-to-back setup (both sides
>>>      having smmu enabled) on a 10GBps port show an important
>>>      networking performance degradation of around %40 (9.48Gbps
>>>      linerate vs 5.45Gbps). If you need performance but without
>>>      SMMU support you can use "iommu.passthrough=1" to disable
>>>      SMMU.
>>>
>>> USB issue and workaround
>>>
>>>      There's a problem with the usb controllers in these chips
>>>      generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>>      supported at the smmu input. So you end up in a situation where the
>>>      smmu is mapped with 48-bit address translations, but the device
>>>      generates transactions with clipped 40-bit addresses, thus smmu
>>>      context faults are triggered. I encountered a similar situation for
>>>      mmc that I  managed to fix in software [1] however for USB I did not
>>>      find a proper place in the code to add a similar fix. The only
>>>      workaround I found was to add this kernel parameter which limits the
>>>      usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>      This workaround if far from ideal, so any suggestions for a code
>>>      based workaround in this area would be greatly appreciated.
>>
>> If you have a nominally-64-bit device with a
>> narrower-than-the-main-interconnect link in front of it, that should
>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>> provided the interconnect hierarchy can be described appropriately (or
>> at least massaged sufficiently to satisfy the binding), e.g.:
>>
>> / {
>>     ...
>>
>>     soc {
>>         ranges;
>>         dma-ranges = <0 0 10000 0>;
>>
>>         dev_48bit { ... };
>>
>>         periph_bus {
>>             ranges;
>>             dma-ranges = <0 0 100 0>;
>>
>>             dev_40bit { ... };
>>         };
>>     };
>> };
>>
>> and if that fails to work as expected (except for PCI hosts where
>> handling dma-ranges properly still needs sorting out), please do let us
>> know ;)
>>
>
> Just to confirm, is this [1] the change I was supposed to test?

Not quite - dma-ranges is only valid for nodes representing a bus, so
putting it directly in the USB device nodes doesn't work (FWIW that's
why PCI is broken, because the parser doesn't expect the
bus-as-leaf-node case). That's the point of that intermediate simple-bus
node represented by "periph_bus" in my example (sorry, I should have put
compatibles in to make it clearer) - often that's actually true to life
(i.e. "soc" is something like a CCI and "periph_bus" is something like
an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
CCI ports) but at worst it's just a necessary evil to make the binding
happy (if it literally only represents the point-to-point link between
the device master port and interconnect slave port).
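
As a rough aid for reading the dma-ranges values in this thread, the
sketch below converts the two size cells of an entry into an address
width. It assumes the size is given as a high and a low 32-bit cell, that
the values are hexadecimal, and that the range starts at 0 and is a power
of two; the function names are hypothetical.

```python
def dma_ranges_size(size_hi: int, size_lo: int) -> int:
    """Combine the two 32-bit size cells of a dma-ranges entry."""
    return (size_hi << 32) | size_lo


def addressable_bits(size: int) -> int:
    """Address width covered by a power-of-two range that starts at 0."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return size.bit_length() - 1


if __name__ == "__main__":
    # Size cells 0x100 0x0, as in the USB node diff: a 40-bit window.
    print(addressable_bits(dma_ranges_size(0x100, 0x0)))
    # Size cells 0x10000 0x0, read from the "soc" example: a 48-bit window.
    print(addressable_bits(dma_ranges_size(0x10000, 0x0)))
```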

> Because if so, I'm still seeing context faults [2] with what look like
> addresses clipped to 40 bits. :-(
> IIRC, the usb subsystem explicitly sets 64-bit dma masks, which in turn
> will be limited to the SMMU input size of 48 bits. Won't that overwrite
> the default dma mask derived from dma-ranges?

Indeed it will, but those default masks were effectively only ever a
best-effort thing anyway - it's an ease-of-implementation detail that
bus_dma_mask is not currently reflected in the device masks, although we
may eventually change that; the crucial part is that the DMA ops
implementations know about it and should now enforce it properly
regardless of whether drivers set something wider.
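
A simplified model of that enforcement, as I understand the explanation
above: the limit used for allocations is the driver's mask further
restricted by the bus mask, so a wider driver mask cannot exceed what the
bus allows. This is a sketch under that assumption, with hypothetical
function names, and not the kernel's actual code; treating a zero bus mask
as "no restriction" is also an assumption.

```python
def dma_bit_mask(bits: int) -> int:
    """Mask with the low `bits` bits set."""
    return (1 << bits) - 1


def effective_dma_limit(device_mask: int, bus_dma_mask: int) -> int:
    """Restrict the driver's mask by the bus mask; 0 means no bus limit."""
    if bus_dma_mask:
        return device_mask & bus_dma_mask
    return device_mask


if __name__ == "__main__":
    driver_mask = dma_bit_mask(64)  # what the USB stack asks for
    bus_mask = dma_bit_mask(40)     # derived from a 40-bit dma-ranges window
    print(hex(effective_dma_limit(driver_mask, bus_mask)))
```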

Robin.

>
> ---
> Best Regards, Laurentiu
>
> [1] -----------------------------------------------------------------
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 3bdea0470f69..a214c3df37fd 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -612,6 +612,7 @@
> compatible = "snps,dwc3";
> reg = <0x0 0x2f00000 0x0 0x10000>;
> interrupts = <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>;
> + dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
> dr_mode = "host";
> snps,quirk-frame-length-adjustment = <0x20>;
> snps,dis_rxdet_inp3_quirk;
> @@ -621,6 +622,7 @@
> compatible = "snps,dwc3";
> reg = <0x0 0x3000000 0x0 0x10000>;
> interrupts = <GIC_SPI 61 IRQ_TYPE_LEVEL_HIGH>;
> + dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
> dr_mode = "host";
> snps,quirk-frame-length-adjustment = <0x20>;
> snps,dis_rxdet_inp3_quirk;
> @@ -630,6 +632,7 @@
> compatible = "snps,dwc3";
> reg = <0x0 0x3100000 0x0 0x10000>;
> interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_HIGH>;
> + dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
> dr_mode = "host";
> snps,quirk-frame-length-adjustment = <0x20>;
> snps,dis_rxdet_inp3_quirk;
>
> [2] -----------------------------------------------------------------
> [ 2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
> [ 2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned bus number 2
> [ 2.103720] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
> [ 2.110346] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x1b0000, cb=3
> [ 2.120449] usb usb2: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [ 2.128717] hub 2-0:1.0: USB hub found
> [ 2.132473] hub 2-0:1.0: 1 port detected
> [ 2.136527] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> [ 2.142014] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
> assigned bus number 3
> [ 2.149747] xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f66d hci
> version 0x100 quirks 0x0000000002010010
> [ 2.159149] xhci-hcd xhci-hcd.1.auto: irq 50, io mem 0x03000000
> [ 2.165284] hub 3-0:1.0: USB hub found
> [ 2.169039] hub 3-0:1.0: 1 port detected
> [ 2.173051] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> [ 2.178536] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
> assigned bus number 4
> [ 2.186193] xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0 SuperSpeed
> [ 2.192809] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x1f0000, cb=4
> [ 2.192822] usb usb4: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [ 2.211141] hub 4-0:1.0: USB hub found
> [ 2.214896] hub 4-0:1.0: 1 port detected
> [ 2.218935] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
> [ 2.224425] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
> assigned bus number 5
> [ 2.232153] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci
> version 0x100 quirks 0x0000000002010010
> [ 2.241562] xhci-hcd xhci-hcd.2.auto: irq 51, io mem 0x03100000
> [ 2.247694] hub 5-0:1.0: USB hub found
> [ 2.251449] hub 5-0:1.0: 1 port detected
> [ 2.255458] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
> [ 2.260945] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
> assigned bus number 6
> [ 2.268601] xhci-hcd xhci-hcd.2.auto: Host supports USB 3.0 SuperSpeed
> [ 2.275218] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x110000, cb=5
> [ 2.275230] usb usb6: We don't know the algorithms for LPM for this
> host, disabling LPM.
>
>
>>> The patch set is based on net-next so, if generally agreed, I'd suggest
>>> to get the patches through the netdev tree after getting all the Acks.
>>>
>>> [1]
>>> https://patchwork.kernel.org/patch/10506627/
>>>
>>>
>>> Laurentiu Tudor (21):
>>>    soc/fsl/qman: fixup liodns only on ppc targets
>>>    soc/fsl/bman: map FBPR area in the iommu
>>>    soc/fsl/qman: map FQD and PFDR areas in the iommu
>>>    soc/fsl/qman-portal: map CENA area in the iommu
>>>    soc/fsl/qbman: add APIs to retrieve the probing status
>>>    soc/fsl/qman_portals: defer probe after qman's probe
>>>    soc/fsl/bman_portals: defer probe after bman's probe
>>>    soc/fsl/qbman_portals: add APIs to retrieve the probing status
>>>    fsl/fman: backup and restore ICID registers
>>>    fsl/fman: add API to get the device behind a fman port
>>>    dpaa_eth: defer probing after qbman
>>>    dpaa_eth: base dma mappings on the fman rx port
>>>    dpaa_eth: fix iova handling for contiguous frames
>>>    dpaa_eth: fix iova handling for sg frames
>>>    dpaa_eth: fix SG frame cleanup
>>>    arm64: dts: ls1046a: add smmu node
>>>    arm64: dts: ls1043a: add smmu node
>>>    arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
>>>    arm64: dts: ls104x: add missing dma ranges property
>>>    arm64: dts: ls104x: add iommu-map to pci controllers
>>>    arm64: dts: ls104x: make dma-coherent global to the SoC
>>>
>>>   .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
>>>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
>>>   .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
>>>   drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
>>>   drivers/net/ethernet/freescale/fman/fman.h    |   4 +
>>>   .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
>>>   .../net/ethernet/freescale/fman/fman_port.h   |   2 +
>>>   drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
>>>   drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
>>>   drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
>>>   drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
>>>   include/soc/fsl/bman.h                        |  16 +++
>>>   include/soc/fsl/qman.h                        |  17 +++
>>>   13 files changed, 379 insertions(+), 53 deletions(-)
>> >

2018-09-20 10:40:30

by Laurentiu Tudor

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A



On 19.09.2018 17:37, Robin Murphy wrote:
> On 19/09/18 15:18, Laurentiu Tudor wrote:
>> Hi Robin,
>>
>> On 19.09.2018 16:25, Robin Murphy wrote:
>>> Hi Laurentiu,
>>>
>>> On 19/09/18 13:35, [email protected] wrote:
>>>> From: Laurentiu Tudor <[email protected]>
>>>>
>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>> and consists mostly in important driver fixes and the required device
>>>> tree updates. It touches several subsystems and consists of three main
>>>> parts:
>>>>    - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>>      reserved memory areas, fixes and defered probe support
>>>>    - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>      consisting in misc dma mapping related fixes and probe ordering
>>>>    - addition of the actual arm smmu device tree node together with
>>>>      various adjustments to the device trees
>>>>
>>>> Performance impact
>>>>
>>>>       Running iperf benchmarks in a back-to-back setup (both sides
>>>>       having smmu enabled) on a 10GBps port show an important
>>>>       networking performance degradation of around %40 (9.48Gbps
>>>>       linerate vs 5.45Gbps). If you need performance but without
>>>>       SMMU support you can use "iommu.passthrough=1" to disable
>>>>       SMMU.
>>>>
>>>> USB issue and workaround
>>>>
>>>>       There's a problem with the usb controllers in these chips
>>>>       generating smaller, 40-bit wide dma addresses instead of the
>>>> 48-bit
>>>>       supported at the smmu input. So you end up in a situation
>>>> where the
>>>>       smmu is mapped with 48-bit address translations, but the device
>>>>       generates transactions with clipped 40-bit addresses, thus smmu
>>>>       context faults are triggered. I encountered a similar
>>>> situation for
>>>>       mmc that I  managed to fix in software [1] however for USB I
>>>> did not
>>>>       find a proper place in the code to add a similar fix. The only
>>>>       workaround I found was to add this kernel parameter which
>>>> limits the
>>>>       usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>       This workaround if far from ideal, so any suggestions for a code
>>>>       based workaround in this area would be greatly appreciated.
>>>
>>> If you have a nominally-64-bit device with a
>>> narrower-than-the-main-interconnect link in front of it, that should
>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>> provided the interconnect hierarchy can be described appropriately (or
>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>
>>> / {
>>>       ...
>>>
>>>       soc {
>>>           ranges;
>>>           dma-ranges = <0 0 10000 0>;
>>>
>>>           dev_48bit { ... };
>>>
>>>           periph_bus {
>>>               ranges;
>>>               dma-ranges = <0 0 100 0>;
>>>
>>>               dev_40bit { ... };
>>>           };
>>>       };
>>> };
>>>
>>> and if that fails to work as expected (except for PCI hosts where
>>> handling dma-ranges properly still needs sorting out), please do let us
>>> know ;)
>>>
>>
>> Just to confirm, is this [1] the change I was supposed to test?
>
> Not quite - dma-ranges is only valid for nodes representing a bus, so
> putting it directly in the USB device nodes doesn't work (FWIW that's
> why PCI is broken, because the parser doesn't expect the
> bus-as-leaf-node case). That's the point of that intermediate simple-bus
> node represented by "periph_bus" in my example (sorry, I should have put
> compatibles in to make it clearer) - often that's actually true to life
> (i.e. "soc" is something like a CCI and "periph_bus" is something like
> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
> CCI ports) but at worst it's just a necessary evil to make the binding
> happy (if it literally only represents the point-to-point link between
> the device master port and interconnect slave port).
>

Quick update: I adjusted the device tree according to your example and
it works, so now I can get rid of that nasty kernel-arg-based workaround,
yey! :-)
Thanks a lot, that was really helpful.

---
Best Regards, Laurentiu

2018-09-20 11:50:08

by Robin Murphy

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

On 20/09/18 11:38, Laurentiu Tudor wrote:
>
>
> On 19.09.2018 17:37, Robin Murphy wrote:
>> On 19/09/18 15:18, Laurentiu Tudor wrote:
>>> Hi Robin,
>>>
>>> On 19.09.2018 16:25, Robin Murphy wrote:
>>>> Hi Laurentiu,
>>>>
>>>> On 19/09/18 13:35, [email protected] wrote:
>>>>> From: Laurentiu Tudor <[email protected]>
>>>>>
>>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>>> and consists mostly in important driver fixes and the required device
>>>>> tree updates. It touches several subsystems and consists of three main
>>>>> parts:
>>>>>    - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>>>      reserved memory areas, fixes and defered probe support
>>>>>    - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>>      consisting in misc dma mapping related fixes and probe ordering
>>>>>    - addition of the actual arm smmu device tree node together with
>>>>>      various adjustments to the device trees
>>>>>
>>>>> Performance impact
>>>>>
>>>>>       Running iperf benchmarks in a back-to-back setup (both sides
>>>>>       having smmu enabled) on a 10GBps port show an important
>>>>>       networking performance degradation of around %40 (9.48Gbps
>>>>>       linerate vs 5.45Gbps). If you need performance but without
>>>>>       SMMU support you can use "iommu.passthrough=1" to disable
>>>>>       SMMU.

I should have said before - thanks for the numbers there as well. Always
good to add another datapoint to my collection. If you're interested
I've added SMMUv2 support to the "non-strict mode" series (of which I
should be posting v8 soon), so it might be fun to see how well that
works on MMU-500 in the real world.

>>>>>
>>>>> USB issue and workaround
>>>>>
>>>>>       There's a problem with the usb controllers in these chips
>>>>>       generating smaller, 40-bit wide dma addresses instead of the
>>>>> 48-bit
>>>>>       supported at the smmu input. So you end up in a situation
>>>>> where the
>>>>>       smmu is mapped with 48-bit address translations, but the device
>>>>>       generates transactions with clipped 40-bit addresses, thus smmu
>>>>>       context faults are triggered. I encountered a similar
>>>>> situation for
>>>>>       mmc that I  managed to fix in software [1] however for USB I
>>>>> did not
>>>>>       find a proper place in the code to add a similar fix. The only
>>>>>       workaround I found was to add this kernel parameter which
>>>>> limits the
>>>>>       usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>>       This workaround if far from ideal, so any suggestions for a code
>>>>>       based workaround in this area would be greatly appreciated.
>>>>
>>>> If you have a nominally-64-bit device with a
>>>> narrower-than-the-main-interconnect link in front of it, that should
>>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>>> provided the interconnect hierarchy can be described appropriately (or
>>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>>
>>>> / {
>>>>       ...
>>>>
>>>>       soc {
>>>>           ranges;
>>>>           dma-ranges = <0 0 10000 0>;
>>>>
>>>>           dev_48bit { ... };
>>>>
>>>>           periph_bus {
>>>>               ranges;
>>>>               dma-ranges = <0 0 100 0>;
>>>>
>>>>               dev_40bit { ... };
>>>>           };
>>>>       };
>>>> };
>>>>
>>>> and if that fails to work as expected (except for PCI hosts where
>>>> handling dma-ranges properly still needs sorting out), please do let us
>>>> know ;)
>>>>
>>>
>>> Just to confirm, is this [1] the change I was supposed to test?
>>
>> Not quite - dma-ranges is only valid for nodes representing a bus, so
>> putting it directly in the USB device nodes doesn't work (FWIW that's
>> why PCI is broken, because the parser doesn't expect the
>> bus-as-leaf-node case). That's the point of that intermediate simple-bus
>> node represented by "periph_bus" in my example (sorry, I should have put
>> compatibles in to make it clearer) - often that's actually true to life
>> (i.e. "soc" is something like a CCI and "periph_bus" is something like
>> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
>> CCI ports) but at worst it's just a necessary evil to make the binding
>> happy (if it literally only represents the point-to-point link between
>> the device master port and interconnect slave port).
>>
>
> Quick update: so I adjusted the device tree according to your example and
> it works so now I can get rid of that nasty kernel arg based workaround,
> yey! :-)

Cool! In fact, judging by the block diagrams on the website, the "basic
peripherals and interconnect" section hanging off the side of the CCI
implies that probably is true to the real topology as I imagined, so it
doesn't even count as a horrible hack :)
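
For anyone following along, a minimal sketch of the layout described
above, with the compatibles filled in. It assumes two address cells and
two size cells throughout, and reads the 10000 and 100 in the quoted
example as hexadecimal upper size cells. The node names are placeholders
carried over from that example, not real LS1043A/LS1046A nodes.

```dts
/dts-v1/;

/ {
	#address-cells = <2>;
	#size-cells = <2>;

	soc {
		compatible = "simple-bus";
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;
		/* child addr 0x0, parent addr 0x0, size 0x10000_00000000 = 2^48 bytes */
		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;

		dev_48bit {
			/* hypothetical master able to drive all 48 address bits */
		};

		periph_bus {
			/* intermediate bus node: dma-ranges is only valid on a bus */
			compatible = "simple-bus";
			#address-cells = <2>;
			#size-cells = <2>;
			ranges;
			/* child addr 0x0, parent addr 0x0, size 0x100_00000000 = 2^40 bytes */
			dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;

			dev_40bit {
				/* hypothetical master limited to 40 address bits */
			};
		};
	};
};
```

The narrower dma-ranges sits on the intermediate bus node rather than on
the device node itself, so it applies to every master placed under that
bus.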

> Thanks a lot, that was really helpful.

No problem. FWIW if you ever come to doing ACPI support for these SoCs,
the equivalent is merely a case of setting the device memory address
size limit field appropriately for all the named components.

Robin.

2018-09-20 14:35:23

by Laurentiu Tudor

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A



On 20.09.2018 14:49, Robin Murphy wrote:
> On 20/09/18 11:38, Laurentiu Tudor wrote:
>>
>>
>> On 19.09.2018 17:37, Robin Murphy wrote:
>>> On 19/09/18 15:18, Laurentiu Tudor wrote:
>>>> Hi Robin,
>>>>
>>>> On 19.09.2018 16:25, Robin Murphy wrote:
>>>>> Hi Laurentiu,
>>>>>
>>>>> On 19/09/18 13:35, [email protected] wrote:
>>>>>> From: Laurentiu Tudor <[email protected]>
>>>>>>
>>>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>>>> and consists mostly in important driver fixes and the required device
>>>>>> tree updates. It touches several subsystems and consists of three
>>>>>> main
>>>>>> parts:
>>>>>>     - changes in soc/drivers/fsl/qbman drivers adding iommu
>>>>>> mapping of
>>>>>>       reserved memory areas, fixes and deferred probe support
>>>>>>     - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>>>       consisting in misc dma mapping related fixes and probe ordering
>>>>>>     - addition of the actual arm smmu device tree node together with
>>>>>>       various adjustments to the device trees
>>>>>>
>>>>>> Performance impact
>>>>>>
>>>>>>        Running iperf benchmarks in a back-to-back setup (both sides
>>>>>>        having smmu enabled) on a 10Gbps port shows a significant
>>>>>>        networking performance degradation of around 40% (9.48Gbps
>>>>>>        linerate vs 5.45Gbps). If you need performance but without
>>>>>>        SMMU support you can use "iommu.passthrough=1" to disable
>>>>>>        SMMU.
>
> I should have said before - thanks for the numbers there as well. Always
> good to add another datapoint to my collection. If you're interested
> I've added SMMUv2 support to the "non-strict mode" series (of which I
> should be posting v8 soon), so it might be fun to see how well that
> works on MMU-500 in the real world.

Hmm, I think I gave those a try some weeks ago and vaguely remember that
I did see improvements. Can't remember the numbers off the top of my
head but I'll re-test with the latest spin and update the numbers.

>>>>>>
>>>>>> USB issue and workaround
>>>>>>
>>>>>>        There's a problem with the usb controllers in these chips
>>>>>>        generating smaller, 40-bit wide dma addresses instead of the
>>>>>> 48-bit
>>>>>>        supported at the smmu input. So you end up in a situation
>>>>>> where the
>>>>>>        smmu is mapped with 48-bit address translations, but the
>>>>>> device
>>>>>>        generates transactions with clipped 40-bit addresses, thus
>>>>>> smmu
>>>>>>        context faults are triggered. I encountered a similar
>>>>>> situation for
>>>>>>        mmc that I  managed to fix in software [1] however for USB I
>>>>>> did not
>>>>>>        find a proper place in the code to add a similar fix. The only
>>>>>>        workaround I found was to add this kernel parameter which
>>>>>> limits the
>>>>>>        usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>>>        This workaround is far from ideal, so any suggestions for a
>>>>>> code
>>>>>>        based workaround in this area would be greatly appreciated.
>>>>>
>>>>> If you have a nominally-64-bit device with a
>>>>> narrower-than-the-main-interconnect link in front of it, that should
>>>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>>>> provided the interconnect hierarchy can be described appropriately (or
>>>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>>>
>>>>> / {
>>>>>        ...
>>>>>
>>>>>        soc {
>>>>>            ranges;
>>>>>            dma-ranges = <0 0 10000 0>;
>>>>>
>>>>>            dev_48bit { ... };
>>>>>
>>>>>            periph_bus {
>>>>>                ranges;
>>>>>                dma-ranges = <0 0 100 0>;
>>>>>
>>>>>                dev_40bit { ... };
>>>>>            };
>>>>>        };
>>>>> };
>>>>>
>>>>> and if that fails to work as expected (except for PCI hosts where
>>>>> handling dma-ranges properly still needs sorting out), please do
>>>>> let us
>>>>> know ;)
>>>>>
>>>>
>>>> Just to confirm, is this [1] the change I was supposed to test?
>>>
>>> Not quite - dma-ranges is only valid for nodes representing a bus, so
>>> putting it directly in the USB device nodes doesn't work (FWIW that's
>>> why PCI is broken, because the parser doesn't expect the
>>> bus-as-leaf-node case). That's the point of that intermediate simple-bus
>>> node represented by "periph_bus" in my example (sorry, I should have put
>>> compatibles in to make it clearer) - often that's actually true to life
>>> (i.e. "soc" is something like a CCI and "periph_bus" is something like
>>> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
>>> CCI ports) but at worst it's just a necessary evil to make the binding
>>> happy (if it literally only represents the point-to-point link between
>>> the device master port and interconnect slave port).
>>>
>>
>> Quick update: so I adjusted the device tree according to your example and
>> it works so now I can get rid of that nasty kernel arg based workaround,
>> yey! :-)
>
> Cool! In fact, judging by the block diagrams on the website, the "basic
> peripherals and interconnect" section hanging off the side of the CCI
> implies that probably is true to the real topology as I imagined, so it
> doesn't even count as a horrible hack :)

Indeed, on this chip there's a NoC with several low-speed devices lumped
behind it, such as USB, SATA and eSDHC.

>> Thanks a lot, that was really helpful.
>
> No problem. FWIW if you ever come to doing ACPI support for these SoCs,
> the equivalent is merely a case of setting the device memory address
> size limit field appropriately for all the named components.
>

Thanks, I'll keep this in mind. If I remember correctly, there are
people over here working on UEFI + ACPI support for some LS chips but
progress appears to be slow.

---
Best Regards, Laurentiu

2018-09-20 19:07:46

by Leo Li

Subject: Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A

On Thu, Sep 20, 2018 at 5:39 AM Laurentiu Tudor <[email protected]> wrote:
>
>
>
> On 19.09.2018 17:37, Robin Murphy wrote:
> > On 19/09/18 15:18, Laurentiu Tudor wrote:
> >> Hi Robin,
> >>
> >> On 19.09.2018 16:25, Robin Murphy wrote:
> >>> Hi Laurentiu,
> >>>
> >>> On 19/09/18 13:35, [email protected] wrote:
> >>>> From: Laurentiu Tudor <[email protected]>
> >>>>
> >>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
> >>>> and consists mostly in important driver fixes and the required device
> >>>> tree updates. It touches several subsystems and consists of three main
> >>>> parts:
> >>>> - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
> >>>> reserved memory areas, fixes and deferred probe support
> >>>> - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
> >>>> consisting in misc dma mapping related fixes and probe ordering
> >>>> - addition of the actual arm smmu device tree node together with
> >>>> various adjustments to the device trees
> >>>>
> >>>> Performance impact
> >>>>
> >>>> Running iperf benchmarks in a back-to-back setup (both sides
> >>>> having smmu enabled) on a 10Gbps port shows a significant
> >>>> networking performance degradation of around 40% (9.48Gbps
> >>>> linerate vs 5.45Gbps). If you need performance but without
> >>>> SMMU support you can use "iommu.passthrough=1" to disable
> >>>> SMMU.
> >>>>
> >>>> USB issue and workaround
> >>>>
> >>>> There's a problem with the usb controllers in these chips
> >>>> generating smaller, 40-bit wide dma addresses instead of the
> >>>> 48-bit
> >>>> supported at the smmu input. So you end up in a situation
> >>>> where the
> >>>> smmu is mapped with 48-bit address translations, but the device
> >>>> generates transactions with clipped 40-bit addresses, thus smmu
> >>>> context faults are triggered. I encountered a similar
> >>>> situation for
> >>>> mmc that I managed to fix in software [1] however for USB I
> >>>> did not
> >>>> find a proper place in the code to add a similar fix. The only
> >>>> workaround I found was to add this kernel parameter which
> >>>> limits the
> >>>> usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
> >>>> This workaround is far from ideal, so any suggestions for a code
> >>>> based workaround in this area would be greatly appreciated.
> >>>
> >>> If you have a nominally-64-bit device with a
> >>> narrower-than-the-main-interconnect link in front of it, that should
> >>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
> >>> provided the interconnect hierarchy can be described appropriately (or
> >>> at least massaged sufficiently to satisfy the binding), e.g.:
> >>>
> >>> / {
> >>> ...
> >>>
> >>> soc {
> >>> ranges;
> >>> dma-ranges = <0 0 10000 0>;
> >>>
> >>> dev_48bit { ... };
> >>>
> >>> periph_bus {
> >>> ranges;
> >>> dma-ranges = <0 0 100 0>;
> >>>
> >>> dev_40bit { ... };
> >>> };
> >>> };
> >>> };
> >>>
> >>> and if that fails to work as expected (except for PCI hosts where
> >>> handling dma-ranges properly still needs sorting out), please do let us
> >>> know ;)
> >>>
> >>
> >> Just to confirm, is this [1] the change I was supposed to test?
> >
> > Not quite - dma-ranges is only valid for nodes representing a bus, so
> > putting it directly in the USB device nodes doesn't work (FWIW that's
> > why PCI is broken, because the parser doesn't expect the
> > bus-as-leaf-node case). That's the point of that intermediate simple-bus
> > node represented by "periph_bus" in my example (sorry, I should have put
> > compatibles in to make it clearer) - often that's actually true to life
> > (i.e. "soc" is something like a CCI and "periph_bus" is something like
> > an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
> > CCI ports) but at worst it's just a necessary evil to make the binding
> > happy (if it literally only represents the point-to-point link between
> > the device master port and interconnect slave port).
> >
>
> Quick update: so I adjusted the device tree according to your example and
> it works so now I can get rid of that nasty kernel arg based workaround,
> yey! :-)

Great that we have a generic solution like I hoped for! So you will
submit a new revision of the series to include these dts updates,
right?

Regards,
Leo

2018-09-21 07:34:02

by Laurentiu Tudor

Subject: RE: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A



> -----Original Message-----
> From: Li Yang [mailto:[email protected]]
> Sent: Thursday, September 20, 2018 10:07 PM
>
> On Thu, Sep 20, 2018 at 5:39 AM Laurentiu Tudor <[email protected]>
> wrote:
> >
> >
> >
> > On 19.09.2018 17:37, Robin Murphy wrote:
> > > On 19/09/18 15:18, Laurentiu Tudor wrote:
> > >> Hi Robin,
> > >>
> > >> On 19.09.2018 16:25, Robin Murphy wrote:
> > >>> Hi Laurentiu,
> > >>>
> > >>> On 19/09/18 13:35, [email protected] wrote:
> > >>>> From: Laurentiu Tudor <[email protected]>
> > >>>>
> > >>>> This patch series adds SMMU support for NXP LS1043A and LS1046A
> chips
> > >>>> and consists mostly in important driver fixes and the required
> device
> > >>>> tree updates. It touches several subsystems and consists of three
> main
> > >>>> parts:
> > >>>> - changes in soc/drivers/fsl/qbman drivers adding iommu mapping
> of
> > >>>> reserved memory areas, fixes and deferred probe support
> > >>>> - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
> > >>>> consisting in misc dma mapping related fixes and probe
> ordering
> > >>>> - addition of the actual arm smmu device tree node together with
> > >>>> various adjustments to the device trees
> > >>>>
> > >>>> Performance impact
> > >>>>
> > >>>> Running iperf benchmarks in a back-to-back setup (both sides
> > >>>> having smmu enabled) on a 10Gbps port shows a significant
> > >>>> networking performance degradation of around 40% (9.48Gbps
> > >>>> linerate vs 5.45Gbps). If you need performance but without
> > >>>> SMMU support you can use "iommu.passthrough=1" to disable
> > >>>> SMMU.
> > >>>>
> > >>>> USB issue and workaround
> > >>>>
> > >>>> There's a problem with the usb controllers in these chips
> > >>>> generating smaller, 40-bit wide dma addresses instead of the
> > >>>> 48-bit
> > >>>> supported at the smmu input. So you end up in a situation
> > >>>> where the
> > >>>> smmu is mapped with 48-bit address translations, but the
> device
> > >>>> generates transactions with clipped 40-bit addresses, thus
> smmu
> > >>>> context faults are triggered. I encountered a similar
> > >>>> situation for
> > >>>> mmc that I managed to fix in software [1] however for USB I
> > >>>> did not
> > >>>> find a proper place in the code to add a similar fix. The
> only
> > >>>> workaround I found was to add this kernel parameter which
> > >>>> limits the
> > >>>> usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
> > >>>> This workaround is far from ideal, so any suggestions for a
> code
> > >>>> based workaround in this area would be greatly appreciated.
> > >>>
> > >>> If you have a nominally-64-bit device with a
> > >>> narrower-than-the-main-interconnect link in front of it, that should
> > >>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
> > >>> provided the interconnect hierarchy can be described appropriately
> (or
> > >>> at least massaged sufficiently to satisfy the binding), e.g.:
> > >>>
> > >>> / {
> > >>> ...
> > >>>
> > >>> soc {
> > >>> ranges;
> > >>> dma-ranges = <0 0 10000 0>;
> > >>>
> > >>> dev_48bit { ... };
> > >>>
> > >>> periph_bus {
> > >>> ranges;
> > >>> dma-ranges = <0 0 100 0>;
> > >>>
> > >>> dev_40bit { ... };
> > >>> };
> > >>> };
> > >>> };
> > >>>
> > >>> and if that fails to work as expected (except for PCI hosts where
> > >>> handling dma-ranges properly still needs sorting out), please do let
> us
> > >>> know ;)
> > >>>
> > >>
> > >> Just to confirm, is this [1] the change I was supposed to test?
> > >
> > > Not quite - dma-ranges is only valid for nodes representing a bus, so
> > > putting it directly in the USB device nodes doesn't work (FWIW that's
> > > why PCI is broken, because the parser doesn't expect the
> > > bus-as-leaf-node case). That's the point of that intermediate simple-bus
> > > node represented by "periph_bus" in my example (sorry, I should have
> put
> > > compatibles in to make it clearer) - often that's actually true to
> life
> > > (i.e. "soc" is something like a CCI and "periph_bus" is something like
> > > an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
> > > CCI ports) but at worst it's just a necessary evil to make the binding
> > > happy (if it literally only represents the point-to-point link between
> > > the device master port and interconnect slave port).
> > >
> >
> > Quick update: so I adjusted the device tree according to your example and
> > it works so now I can get rid of that nasty kernel arg based workaround,
> > yey! :-)
>
> Great that we have a generic solution like I hoped for! So you will
> submit a new revision of the series to include these dts updates,
> right?
>

Yes, I already have it prepared. I'm just delaying v2 for a few days in
case there is some more feedback.

---
Best Regards, Laurentiu