The ARC HS processor provides an IOC port (I/O coherency bus
interface) that allows external devices such as DMA devices
to access memory through the cache hierarchy, providing
coherency between I/O transactions and the complete memory
hierarchy.
Some recent SoCs with ARC HS (like HSDK) allow selecting the bus
port (IOC or non-IOC) for connecting DMA devices at runtime.
With this patch series we can use both HW-coherent and regular DMA
peripherals simultaneously.
NOTE:
This patch series was stress tested on HSDK with iperf3 (Ethernet)
and bonnie++ (USB and SDIO).
NOTE:
If you want to test some device without IOC it is not enough
to remove the "dma-coherent" property from the dts. You also have to
remap the device to the regular DDR AXI port instead of the IOC AXI port.
You also need to take into account that all DMA devices on ARC will
not work without IOC after commit:
a8eb92d02dd7 - arc: fix arc_dma_{map,unmap}_page <Christoph Hellwig>
(I will create fix soon, see STAR 9001374807)
NOTE:
We don't touch any aperture configuration in this patch series, so we
don't switch any devices between IOC and non-IOC AXI ports on any board.
That can be done later if required.
Eugeniy Paltsev (4):
ARC: DTS: mark DMA devices connected through IOC port as dma-coherent
ARC: allow to use IOC and non-IOC DMA devices simultaneously
ARC: refactor arch/arc/mm/dma.c
ARC: IOC: panic if both IOC and ZONE_HIGHMEM enabled
arch/arc/Kconfig | 1 +
arch/arc/boot/dts/axc003.dtsi | 26 +++++++++++++++++
arch/arc/boot/dts/axc003_idu.dtsi | 26 +++++++++++++++++
arch/arc/boot/dts/hsdk.dts | 4 +++
arch/arc/include/asm/dma-mapping.h | 13 +++++++++
arch/arc/mm/cache.c | 30 ++++++++++----------
arch/arc/mm/dma.c | 58 ++++++++++++++++++--------------------
7 files changed, 112 insertions(+), 46 deletions(-)
create mode 100644 arch/arc/include/asm/dma-mapping.h
--
2.14.4
Mark DMA devices on AXS103 and HSDK boards connected through IOC
port as dma-coherent.
Signed-off-by: Eugeniy Paltsev <[email protected]>
---
arch/arc/boot/dts/axc003.dtsi | 26 ++++++++++++++++++++++++++
arch/arc/boot/dts/axc003_idu.dtsi | 26 ++++++++++++++++++++++++++
arch/arc/boot/dts/hsdk.dts | 4 ++++
3 files changed, 56 insertions(+)
diff --git a/arch/arc/boot/dts/axc003.dtsi b/arch/arc/boot/dts/axc003.dtsi
index dc91c663bcc0..d75d65ddf8e3 100644
--- a/arch/arc/boot/dts/axc003.dtsi
+++ b/arch/arc/boot/dts/axc003.dtsi
@@ -93,6 +93,32 @@
};
};
+ /*
+ * Mark DMA peripherals connected via IOC port as dma-coherent. We do
+ * it via overlay because peripherals defined in axs10x_mb.dtsi are
+ * used for both AXS101 and AXS103 boards and only AXS103 has IOC (so
+ * only AXS103 board has HW-coherent DMA peripherals)
+ * We don't need to mark pgu@17000 as dma-coherent because it uses
+ * external DMA buffer located outside of IOC aperture.
+ */
+ axs10x_mb {
+ ethernet@0x18000 {
+ dma-coherent;
+ };
+
+ ehci@0x40000 {
+ dma-coherent;
+ };
+
+ ohci@0x60000 {
+ dma-coherent;
+ };
+
+ mmc@0x15000 {
+ dma-coherent;
+ };
+ };
+
/*
* The DW APB ICTL intc on MB is connected to CPU intc via a
* DT "invisible" DW APB GPIO block, configured to simply pass thru
diff --git a/arch/arc/boot/dts/axc003_idu.dtsi b/arch/arc/boot/dts/axc003_idu.dtsi
index 69ff4895f2ba..a05bb737ea63 100644
--- a/arch/arc/boot/dts/axc003_idu.dtsi
+++ b/arch/arc/boot/dts/axc003_idu.dtsi
@@ -100,6 +100,32 @@
};
};
+ /*
+ * Mark DMA peripherals connected via IOC port as dma-coherent. We do
+ * it via overlay because peripherals defined in axs10x_mb.dtsi are
+ * used for both AXS101 and AXS103 boards and only AXS103 has IOC (so
+ * only AXS103 board has HW-coherent DMA peripherals)
+ * We don't need to mark pgu@17000 as dma-coherent because it uses
+ * external DMA buffer located outside of IOC aperture.
+ */
+ axs10x_mb {
+ ethernet@0x18000 {
+ dma-coherent;
+ };
+
+ ehci@0x40000 {
+ dma-coherent;
+ };
+
+ ohci@0x60000 {
+ dma-coherent;
+ };
+
+ mmc@0x15000 {
+ dma-coherent;
+ };
+ };
+
/*
* This INTC is actually connected to DW APB GPIO
* which acts as a wire between MB INTC and CPU INTC.
diff --git a/arch/arc/boot/dts/hsdk.dts b/arch/arc/boot/dts/hsdk.dts
index 006aa3de5348..ebb686c21393 100644
--- a/arch/arc/boot/dts/hsdk.dts
+++ b/arch/arc/boot/dts/hsdk.dts
@@ -176,6 +176,7 @@
phy-handle = <&phy0>;
resets = <&cgu_rst HSDK_ETH_RESET>;
reset-names = "stmmaceth";
+ dma-coherent;
mdio {
#address-cells = <1>;
@@ -194,12 +195,14 @@
compatible = "snps,hsdk-v1.0-ohci", "generic-ohci";
reg = <0x60000 0x100>;
interrupts = <15>;
+ dma-coherent;
};
ehci@40000 {
compatible = "snps,hsdk-v1.0-ehci", "generic-ehci";
reg = <0x40000 0x100>;
interrupts = <15>;
+ dma-coherent;
};
mmc@a000 {
@@ -212,6 +215,7 @@
clock-names = "biu", "ciu";
interrupts = <12>;
bus-width = <4>;
+ dma-coherent;
};
};
--
2.14.4
Refactoring, no functional change intended.
Signed-off-by: Eugeniy Paltsev <[email protected]>
---
arch/arc/mm/dma.c | 28 ++++++++++------------------
1 file changed, 10 insertions(+), 18 deletions(-)
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index b693818cd8e5..46584c7c2858 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -27,30 +27,24 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
struct page *page;
phys_addr_t paddr;
void *kvaddr;
- int need_coh = 1, need_kvaddr = 0;
+ bool need_coh = !(attrs & DMA_ATTR_NON_CONSISTENT);
page = alloc_pages(gfp, order);
if (!page)
return NULL;
- if (attrs & DMA_ATTR_NON_CONSISTENT)
- need_coh = 0;
-
- /*
- * - A coherent buffer needs MMU mapping to enforce non-cachability
- * - A highmem page needs a virtual handle (hence MMU mapping)
- * independent of cachability
- */
- if (PageHighMem(page) || need_coh)
- need_kvaddr = 1;
-
/* This is linear addr (0x8000_0000 based) */
paddr = page_to_phys(page);
*dma_handle = paddr;
- /* This is kernel Virtual address (0x7000_0000 based) */
- if (need_kvaddr) {
+ /*
+ * - A coherent buffer needs MMU mapping to enforce non-cachability
+ * - A highmem page needs a virtual handle (hence MMU mapping)
+ * independent of cachability.
+ * kvaddr is kernel Virtual address (0x7000_0000 based)
+ */
+ if (PageHighMem(page) || need_coh) {
kvaddr = ioremap_nocache(paddr, size);
if (kvaddr == NULL) {
__free_pages(page, order);
@@ -81,11 +75,9 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
{
phys_addr_t paddr = dma_handle;
struct page *page = virt_to_page(paddr);
- int is_non_coh = 1;
-
- is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT);
+ bool is_coh = !(attrs & DMA_ATTR_NON_CONSISTENT);
- if (PageHighMem(page) || !is_non_coh)
+ if (PageHighMem(page) || is_coh)
iounmap((void __force __iomem *)vaddr);
__free_pages(page, get_order(size));
--
2.14.4
Signed-off-by: Eugeniy Paltsev <[email protected]>
---
arch/arc/mm/cache.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 34935ce38631..e0b3484641c3 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1143,6 +1143,19 @@ noinline void __init arc_ioc_setup(void)
{
unsigned int ioc_base, mem_sz;
+ /*
+ * As for today we don't support both IOC and ZONE_HIGHMEM enabled
+ * simultaneously. This happens because as of today IOC aperture covers
+ * only ZONE_NORMAL (low mem) and any dma transactions outside this
+ * region won't be HW coherent.
+ * If we want to use both IOC and ZONE_HIGHMEM we can use
+ * bounce_buffer to handle dma transactions to HIGHMEM.
+ * Also it is possible to modify dma_direct cache ops or increase IOC
+ * aperture size if we are planning to use HIGHMEM without PAE.
+ */
+ if (IS_ENABLED(CONFIG_HIGHMEM))
+ panic("IOC and HIGHMEM can't be used simultaneously");
+
/* Flush + invalidate + disable L1 dcache */
__dc_disable();
--
2.14.4
The ARC HS processor provides an IOC port (I/O coherency bus
interface) that allows external devices such as DMA devices
to access memory through the cache hierarchy, providing
coherency between I/O transactions and the complete memory
hierarchy.
Some recent SoCs with ARC HS (like HSDK) allow selecting the bus
port (IOC or non-IOC) for connecting DMA devices at runtime.
With this patch we can use both HW-coherent and regular DMA
peripherals simultaneously.
For example we can connect Ethernet and SDIO controllers through the
IOC port (so we don't need to maintain cache coherency for these
devices manually; all cache sync ops will be nops), and we can connect
USB directly to the RAM port (so we have to maintain cache coherency
manually; cache sync ops will be real flush/invalidate operations).
Cache ops are set per-device and depend on the "dma-coherent" device
tree property:
 * "dma_noncoherent_ops" are used if no "dma-coherent" property is
   present (or IOC is disabled).
 * "dma_direct_ops" are used if the "dma-coherent" property is present.
Signed-off-by: Eugeniy Paltsev <[email protected]>
---
Changes RFC v2 -> PATCH v1:
* Move device tree changes to separate patch. Add AXS103 device tree
changes.
* Don't check IOC status in arch_dma_free function (Thanks to Christoph)
* Add panic for IOC + HIGHMEM in separate patch.
Changes RFC v1 -> RFC v2 (Thanks to Vineet and Christoph):
* Panic if both IOC and HIGHMEM_ZONE are enabled.
* Move arch_setup_dma_ops to arch/arc/mm/dma.c
* Tweak the boot printing about IOC
* Print info about cache ops used for each device.
* Refactor arch_setup_dma_ops
arch/arc/Kconfig | 1 +
arch/arc/include/asm/dma-mapping.h | 13 +++++++++++++
arch/arc/mm/cache.c | 17 ++---------------
arch/arc/mm/dma.c | 34 +++++++++++++++++++---------------
4 files changed, 35 insertions(+), 30 deletions(-)
create mode 100644 arch/arc/include/asm/dma-mapping.h
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index e81bcd271be7..0a2fcd2a8c32 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -17,6 +17,7 @@ config ARC
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_NONCOHERENT_OPS
+ select DMA_DIRECT_OPS
select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
new file mode 100644
index 000000000000..c946c0a83e76
--- /dev/null
+++ b/arch/arc/include/asm/dma-mapping.h
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) 2018 Synopsys, Inc. (http://www.synopsys.com)
+
+#ifndef ASM_ARC_DMA_MAPPING_H
+#define ASM_ARC_DMA_MAPPING_H
+
+#include <asm-generic/dma-mapping.h>
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+ const struct iommu_ops *iommu, bool coherent);
+#define arch_setup_dma_ops arch_setup_dma_ops
+
+#endif
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 9dbe645ee127..34935ce38631 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -65,7 +65,7 @@ char *arc_cache_mumbojumbo(int c, char *buf, int len)
n += scnprintf(buf + n, len - n, "Peripherals\t: %#lx%s%s\n",
perip_base,
- IS_AVAIL3(ioc_exists, ioc_enable, ", IO-Coherency "));
+ IS_AVAIL3(ioc_exists, ioc_enable, ", per device IO-Coherency "));
return buf;
}
@@ -896,15 +896,6 @@ static void __dma_cache_wback_slc(phys_addr_t start, unsigned long sz)
slc_op(start, sz, OP_FLUSH);
}
-/*
- * DMA ops for systems with IOC
- * IOC hardware snoops all DMA traffic keeping the caches consistent with
- * memory - eliding need for any explicit cache maintenance of DMA buffers
- */
-static void __dma_cache_wback_inv_ioc(phys_addr_t start, unsigned long sz) {}
-static void __dma_cache_inv_ioc(phys_addr_t start, unsigned long sz) {}
-static void __dma_cache_wback_ioc(phys_addr_t start, unsigned long sz) {}
-
/*
* Exported DMA API
*/
@@ -1253,11 +1244,7 @@ void __init arc_cache_init_master(void)
if (is_isa_arcv2() && ioc_enable)
arc_ioc_setup();
- if (is_isa_arcv2() && ioc_enable) {
- __dma_cache_wback_inv = __dma_cache_wback_inv_ioc;
- __dma_cache_inv = __dma_cache_inv_ioc;
- __dma_cache_wback = __dma_cache_wback_ioc;
- } else if (is_isa_arcv2() && l2_line_sz && slc_enable) {
+ if (is_isa_arcv2() && l2_line_sz && slc_enable) {
__dma_cache_wback_inv = __dma_cache_wback_inv_slc;
__dma_cache_inv = __dma_cache_inv_slc;
__dma_cache_wback = __dma_cache_wback_slc;
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 8c1071840979..b693818cd8e5 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -33,19 +33,7 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
if (!page)
return NULL;
- /*
- * IOC relies on all data (even coherent DMA data) being in cache
- * Thus allocate normal cached memory
- *
- * The gains with IOC are two pronged:
- * -For streaming data, elides need for cache maintenance, saving
- * cycles in flush code, and bus bandwidth as all the lines of a
- * buffer need to be flushed out to memory
- * -For coherent data, Read/Write to buffers terminate early in cache
- * (vs. always going to memory - thus are faster)
- */
- if ((is_isa_arcv2() && ioc_enable) ||
- (attrs & DMA_ATTR_NON_CONSISTENT))
+ if (attrs & DMA_ATTR_NON_CONSISTENT)
need_coh = 0;
/*
@@ -95,8 +83,7 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
struct page *page = virt_to_page(paddr);
int is_non_coh = 1;
- is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT) ||
- (is_isa_arcv2() && ioc_enable);
+ is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT);
if (PageHighMem(page) || !is_non_coh)
iounmap((void __force __iomem *)vaddr);
@@ -140,3 +127,20 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
{
dma_cache_inv(paddr, size);
}
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+ const struct iommu_ops *iommu, bool coherent)
+{
+ /*
+ * IOC hardware snoops all DMA traffic keeping the caches consistent
+ * with memory - eliding need for any explicit cache maintenance of
+ * DMA buffers - so we can use dma_direct cache ops.
+ */
+ if (is_isa_arcv2() && ioc_enable && coherent) {
+ set_dma_ops(dev, &dma_direct_ops);
+ dev_info(dev, "use dma_direct_ops cache ops\n");
+ } else {
+ set_dma_ops(dev, &dma_noncoherent_ops);
+ dev_info(dev, "use dma_noncoherent_ops cache ops\n");
+ }
+}
--
2.14.4
> select DMA_NONCOHERENT_OPS
> + select DMA_DIRECT_OPS
DMA_NONCOHERENT_OPS already selects DMA_DIRECT_OPS.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Tue, Jul 24, 2018 at 01:10:00PM +0300, Eugeniy Paltsev wrote:
> Refactoring, no functional change intended.
Might be worth explaining a bit why you are refactoring it (i.e. what is the
benefit).
>
> Signed-off-by: Eugeniy Paltsev <[email protected]>
> ---
> arch/arc/mm/dma.c | 28 ++++++++++------------------
> 1 file changed, 10 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index b693818cd8e5..46584c7c2858 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -27,30 +27,24 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> struct page *page;
> phys_addr_t paddr;
> void *kvaddr;
> - int need_coh = 1, need_kvaddr = 0;
> + bool need_coh = !(attrs & DMA_ATTR_NON_CONSISTENT);
>
> page = alloc_pages(gfp, order);
> if (!page)
> return NULL;
>
> /* This is linear addr (0x8000_0000 based) */
> paddr = page_to_phys(page);
>
> *dma_handle = paddr;
>
> + /*
> + * - A coherent buffer needs MMU mapping to enforce non-cachability
> + * - A highmem page needs a virtual handle (hence MMU mapping)
> + * independent of cachability.
> + * kvaddr is kernel Virtual address (0x7000_0000 based)
> + */
> + if (PageHighMem(page) || need_coh) {
dma_alloc_attrs clears __GFP_HIGHMEM from the passed in gfp mask, so
you'll never get a highmem page here.
That also means you can merge this conditional with the one for the cache
writeback and invalidation and kill the need_coh flag entirely.
> kvaddr = ioremap_nocache(paddr, size);
> if (kvaddr == NULL) {
> __free_pages(page, order);
> @@ -81,11 +75,9 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
> {
> phys_addr_t paddr = dma_handle;
> struct page *page = virt_to_page(paddr);
> - int is_non_coh = 1;
> -
> - is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT);
> + bool is_coh = !(attrs & DMA_ATTR_NON_CONSISTENT);
>
> - if (PageHighMem(page) || !is_non_coh)
> + if (PageHighMem(page) || is_coh)
> iounmap((void __force __iomem *)vaddr);
>
Same here.
Also if you clean this up it would be great to take the per-device pfn offset
into account, even if that isn't used anywhere on arc yet, that is call
phys_to_dma and dma_to_phys to convert to and from the dma address.
On Thu, 2018-07-26 at 11:17 +0200, Christoph Hellwig wrote:
> On Tue, Jul 24, 2018 at 01:10:00PM +0300, Eugeniy Paltsev wrote:
> > Refactoring, no functional change intended.
> >
[snip]
> >
> > *dma_handle = paddr;
> >
> > + /*
> > + * - A coherent buffer needs MMU mapping to enforce non-cachability
> > + * - A highmem page needs a virtual handle (hence MMU mapping)
> > + * independent of cachability.
> > + * kvaddr is kernel Virtual address (0x7000_0000 based)
> > + */
> > + if (PageHighMem(page) || need_coh) {
>
> dma_alloc_attrs clears __GFP_HIGHMEM from the passed in gfp mask, so
> you'll never get a highmem page here.
>
Nice catch, thanks.
Will remove check for highmem page in next patch version.
> That also means you can merge this conditional with the one for the cache
> writeback and invalidation and kill the need_coh flag entirely.
>
> > kvaddr = ioremap_nocache(paddr, size);
> > if (kvaddr == NULL) {
> > __free_pages(page, order);
> > @@ -81,11 +75,9 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
> > {
> > phys_addr_t paddr = dma_handle;
> > struct page *page = virt_to_page(paddr);
> > - int is_non_coh = 1;
> > -
> > - is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT);
> > + bool is_coh = !(attrs & DMA_ATTR_NON_CONSISTENT);
> >
> > - if (PageHighMem(page) || !is_non_coh)
> > + if (PageHighMem(page) || is_coh)
> > iounmap((void __force __iomem *)vaddr);
> >
>
> Same here.
>
> Also if you clean this up it would be great to take the per-device pfn offset
> into account, even if that isn't used anywhere on arc yet, that is call
> phys_to_dma and dma_to_phys to convert to an from the dma address.
Ok, I'll look at it.
Probably I'll implement it as a separate patch, as it is unrelated to
the topic of this patch series.
--
Eugeniy Paltsev