2018-07-24 12:04:19

by Christoph Hellwig

[permalink] [raw]
Subject: use the generic dma-noncoherent code for sh V2

Hi all,

can you review these patches to switch sh to use the generic
dma-noncoherent code? All the requirements are in mainline already
and we've switched various architectures over to it already.

Changes since V1:
- fixed two stupid compile errors and verified them using a local
cross toolchain instead of the 0day buildbot


2018-07-24 12:03:20

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 1/5] sh: simplify get_arch_dma_ops

Remove the indirection through the dma_ops variable, and just return
nommu_dma_ops directly from get_arch_dma_ops.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/include/asm/dma-mapping.h | 5 ++---
arch/sh/kernel/dma-nommu.c | 8 +-------
arch/sh/mm/consistent.c | 3 ---
arch/sh/mm/init.c | 10 ----------
4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index 41167931e5d9..149e71f95be7 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -2,12 +2,11 @@
#ifndef __ASM_SH_DMA_MAPPING_H
#define __ASM_SH_DMA_MAPPING_H

-extern const struct dma_map_ops *dma_ops;
-extern void no_iommu_init(void);
+extern const struct dma_map_ops nommu_dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
- return dma_ops;
+ return &nommu_dma_ops;
}

extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 3e3a32fc676e..79a9edafa5b0 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -79,10 +79,4 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
#endif
};
-
-void __init no_iommu_init(void)
-{
- if (dma_ops)
- return;
- dma_ops = &nommu_dma_ops;
-}
+EXPORT_SYMBOL(nommu_dma_ops);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index fceb2adfcac7..e9d422bd42a5 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -20,9 +20,6 @@
#include <asm/cacheflush.h>
#include <asm/addrspace.h>

-const struct dma_map_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
void *dma_generic_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs)
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 4034035fbede..7713c084d040 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -339,22 +339,12 @@ void __init paging_init(void)
free_area_init_nodes(max_zone_pfns);
}

-/*
- * Early initialization for any I/O MMUs we might have.
- */
-static void __init iommu_init(void)
-{
- no_iommu_init();
-}
-
unsigned int mem_init_done = 0;

void __init mem_init(void)
{
pg_data_t *pgdat;

- iommu_init();
-
high_memory = NULL;
for_each_online_pgdat(pgdat)
high_memory = max_t(void *, high_memory,
--
2.18.0


2018-07-24 12:03:46

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 2/5] sh: introduce a sh_cacheop_vaddr helper

And use it in the maple bus code to avoid a dma API dependency.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/include/asm/cacheflush.h | 7 +++++++
arch/sh/mm/consistent.c | 6 +-----
drivers/sh/maple/maple.c | 7 ++++---
3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index d103ab5a4e4b..b932e42ef028 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -101,5 +101,12 @@ void kunmap_coherent(void *kvaddr);

void cpu_cache_init(void);

+static inline void *sh_cacheop_vaddr(void *vaddr)
+{
+ if (__in_29bit_mode())
+ vaddr = (void *)CAC_ADDR((unsigned long)vaddr);
+ return vaddr;
+}
+
#endif /* __KERNEL__ */
#endif /* __ASM_SH_CACHEFLUSH_H */
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index e9d422bd42a5..1622ae6b9dbd 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -74,10 +74,7 @@ void dma_generic_free_coherent(struct device *dev, size_t size,
void sh_sync_dma_for_device(void *vaddr, size_t size,
enum dma_data_direction direction)
{
- void *addr;
-
- addr = __in_29bit_mode() ?
- (void *)CAC_ADDR((unsigned long)vaddr) : vaddr;
+ void *addr = sh_cacheop_vaddr(vaddr);

switch (direction) {
case DMA_FROM_DEVICE: /* invalidate only */
@@ -93,7 +90,6 @@ void sh_sync_dma_for_device(void *vaddr, size_t size,
BUG();
}
}
-EXPORT_SYMBOL(sh_sync_dma_for_device);

static int __init memchunk_setup(char *str)
{
diff --git a/drivers/sh/maple/maple.c b/drivers/sh/maple/maple.c
index 2e45988d1259..e5d7fb81ad66 100644
--- a/drivers/sh/maple/maple.c
+++ b/drivers/sh/maple/maple.c
@@ -300,8 +300,8 @@ static void maple_send(void)
mutex_unlock(&maple_wlist_lock);
if (maple_packets > 0) {
for (i = 0; i < (1 << MAPLE_DMA_PAGES); i++)
- sh_sync_dma_for_device(maple_sendbuf + i * PAGE_SIZE,
- PAGE_SIZE, DMA_BIDIRECTIONAL);
+ __flush_purge_region(maple_sendbuf + i * PAGE_SIZE,
+ PAGE_SIZE);
}

finish:
@@ -642,7 +642,8 @@ static void maple_dma_handler(struct work_struct *work)
list_for_each_entry_safe(mq, nmq, &maple_sentq, list) {
mdev = mq->dev;
recvbuf = mq->recvbuf->buf;
- sh_sync_dma_for_device(recvbuf, 0x400, DMA_FROM_DEVICE);
+ __flush_invalidate_region(sh_cacheop_vaddr(recvbuf),
+ 0x400);
code = recvbuf[0];
kfree(mq->sendbuf);
list_del_init(&mq->list);
--
2.18.0


2018-07-24 12:03:48

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 5/5] sh: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/Kconfig | 3 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sh/include/asm/dma-mapping.h | 26 -----------
arch/sh/kernel/Makefile | 2 +-
arch/sh/kernel/dma-coherent.c | 23 +++++----
arch/sh/kernel/dma-nommu.c | 78 -------------------------------
6 files changed, 15 insertions(+), 118 deletions(-)
delete mode 100644 arch/sh/include/asm/dma-mapping.h
delete mode 100644 arch/sh/kernel/dma-nommu.c

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index c9993a0cdc7e..da4db4b5359f 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -51,7 +51,6 @@ config SUPERH
select HAVE_ARCH_AUDITSYSCALL
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_NMI
- select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH

help
@@ -164,6 +163,8 @@ config DMA_COHERENT

config DMA_NONCOHERENT
def_bool !DMA_COHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select DMA_NONCOHERENT_OPS

config PGTABLE_LEVELS
default 3 if X2TLB
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 46dd82ab2c29..6a5609a55965 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -2,6 +2,7 @@ generic-y += compat.h
generic-y += current.h
generic-y += delay.h
generic-y += div64.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += irq_regs.h
diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
deleted file mode 100644
index 1ebc6a4eb1c5..000000000000
--- a/arch/sh/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_DMA_MAPPING_H
-#define __ASM_SH_DMA_MAPPING_H
-
-extern const struct dma_map_ops nommu_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-#ifdef CONFIG_DMA_NONCOHERENT
- return &nommu_dma_ops;
-#else
- return &dma_direct_ops;
-#endif
-}
-
-extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_addr, gfp_t flag,
- unsigned long attrs);
-extern void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs);
-
-void sh_sync_dma_for_device(void *vaddr, size_t size,
- enum dma_data_direction dir);
-
-#endif /* __ASM_SH_DMA_MAPPING_H */
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index d5ddb64bfffe..59673f8a3379 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE) += disassemble.o
obj-$(CONFIG_HIBERNATION) += swsusp.o
obj-$(CONFIG_DWARF_UNWINDER) += dwarf.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_callchain.o
-obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o dma-coherent.o
+obj-$(CONFIG_DMA_NONCOHERENT) += dma-coherent.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
index 763ba10fbd3e..9199b5523f7c 100644
--- a/arch/sh/kernel/dma-coherent.c
+++ b/arch/sh/kernel/dma-coherent.c
@@ -7,14 +7,13 @@
*/
#include <linux/mm.h>
#include <linux/init.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/module.h>
#include <asm/cacheflush.h>
#include <asm/addrspace.h>

-void *dma_generic_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret, *ret_nocache;
int order = get_order(size);
@@ -29,7 +28,8 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
* Pages from the page allocator may have data present in
* cache. So flush the cache before using uncached memory.
*/
- sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
+ arch_sync_dma_for_device(dev, virt_to_phys(ret), size,
+ DMA_BIDIRECTIONAL);

ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
if (!ret_nocache) {
@@ -46,9 +46,8 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
return ret_nocache;
}

-void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
{
int order = get_order(size);
unsigned long pfn = (dma_handle >> PAGE_SHIFT);
@@ -63,12 +62,12 @@ void dma_generic_free_coherent(struct device *dev, size_t size,
iounmap(vaddr);
}

-void sh_sync_dma_for_device(void *vaddr, size_t size,
- enum dma_data_direction direction)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- void *addr = sh_cacheop_vaddr(vaddr);
+ void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));

- switch (direction) {
+ switch (dir) {
case DMA_FROM_DEVICE: /* invalidate only */
__flush_invalidate_region(addr, size);
break;
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
deleted file mode 100644
index d8689b1cb743..000000000000
--- a/arch/sh/kernel/dma-nommu.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * DMA mapping support for platforms lacking IOMMUs.
- *
- * Copyright (C) 2009 Paul Mundt
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- */
-#include <linux/dma-mapping.h>
-#include <linux/io.h>
-#include <asm/cacheflush.h>
-
-static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t addr = page_to_phys(page) + offset
- - PFN_PHYS(dev->dma_pfn_offset);
-
- WARN_ON(size == 0);
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- sh_sync_dma_for_device(page_address(page) + offset, size, dir);
-
- return addr;
-}
-
-static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- WARN_ON(nents == 0 || sg[0].length == 0);
-
- for_each_sg(sg, s, nents, i) {
- dma_addr_t offset = PFN_PHYS(dev->dma_pfn_offset);
-
- BUG_ON(!sg_page(s));
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- sh_sync_dma_for_device(sg_virt(s), s->length, dir);
-
- s->dma_address = sg_phys(s) - offset;
- s->dma_length = s->length;
- }
-
- return nents;
-}
-
-static void nommu_sync_single_for_device(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir)
-{
- sh_sync_dma_for_device(phys_to_virt(addr), size, dir);
-}
-
-static void nommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
- int nelems, enum dma_data_direction dir)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nelems, i)
- sh_sync_dma_for_device(sg_virt(s), s->length, dir);
-}
-
-const struct dma_map_ops nommu_dma_ops = {
- .alloc = dma_generic_alloc_coherent,
- .free = dma_generic_free_coherent,
- .map_page = nommu_map_page,
- .map_sg = nommu_map_sg,
- .sync_single_for_device = nommu_sync_single_for_device,
- .sync_sg_for_device = nommu_sync_sg_for_device,
-};
-EXPORT_SYMBOL(nommu_dma_ops);
--
2.18.0


2018-07-24 12:04:14

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 4/5] sh: split arch/sh/mm/consistent.c

Half of the file just contains platform device memory setup code which
is required for all builds, and half contains helpers for dma coherent
allocation, which is only needed if CONFIG_DMA_NONCOHERENT is enabled.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/kernel/Makefile | 2 +-
arch/sh/kernel/dma-coherent.c | 85 +++++++++++++++++++++++++++++++++++
arch/sh/mm/consistent.c | 80 ---------------------------------
3 files changed, 86 insertions(+), 81 deletions(-)
create mode 100644 arch/sh/kernel/dma-coherent.c

diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index cb5f1bfb52de..d5ddb64bfffe 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE) += disassemble.o
obj-$(CONFIG_HIBERNATION) += swsusp.o
obj-$(CONFIG_DWARF_UNWINDER) += dwarf.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_callchain.o
-obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o
+obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o dma-coherent.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
new file mode 100644
index 000000000000..763ba10fbd3e
--- /dev/null
+++ b/arch/sh/kernel/dma-coherent.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2004 - 2007 Paul Mundt
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <asm/cacheflush.h>
+#include <asm/addrspace.h>
+
+void *dma_generic_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp,
+ unsigned long attrs)
+{
+ void *ret, *ret_nocache;
+ int order = get_order(size);
+
+ gfp |= __GFP_ZERO;
+
+ ret = (void *)__get_free_pages(gfp, order);
+ if (!ret)
+ return NULL;
+
+ /*
+ * Pages from the page allocator may have data present in
+ * cache. So flush the cache before using uncached memory.
+ */
+ sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
+
+ ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
+ if (!ret_nocache) {
+ free_pages((unsigned long)ret, order);
+ return NULL;
+ }
+
+ split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order);
+
+ *dma_handle = virt_to_phys(ret);
+ if (!WARN_ON(!dev))
+ *dma_handle -= PFN_PHYS(dev->dma_pfn_offset);
+
+ return ret_nocache;
+}
+
+void dma_generic_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_handle,
+ unsigned long attrs)
+{
+ int order = get_order(size);
+ unsigned long pfn = (dma_handle >> PAGE_SHIFT);
+ int k;
+
+ if (!WARN_ON(!dev))
+ pfn += dev->dma_pfn_offset;
+
+ for (k = 0; k < (1 << order); k++)
+ __free_pages(pfn_to_page(pfn + k), 0);
+
+ iounmap(vaddr);
+}
+
+void sh_sync_dma_for_device(void *vaddr, size_t size,
+ enum dma_data_direction direction)
+{
+ void *addr = sh_cacheop_vaddr(vaddr);
+
+ switch (direction) {
+ case DMA_FROM_DEVICE: /* invalidate only */
+ __flush_invalidate_region(addr, size);
+ break;
+ case DMA_TO_DEVICE: /* writeback only */
+ __flush_wback_region(addr, size);
+ break;
+ case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+ __flush_purge_region(addr, size);
+ break;
+ default:
+ BUG();
+ }
+}
+EXPORT_SYMBOL(sh_sync_dma_for_device);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index 1622ae6b9dbd..792f36129062 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -1,10 +1,6 @@
/*
- * arch/sh/mm/consistent.c
- *
* Copyright (C) 2004 - 2007 Paul Mundt
*
- * Declared coherent memory functions based on arch/x86/kernel/pci-dma_32.c
- *
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
@@ -13,83 +9,7 @@
#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
-#include <linux/dma-debug.h>
#include <linux/io.h>
-#include <linux/module.h>
-#include <linux/gfp.h>
-#include <asm/cacheflush.h>
-#include <asm/addrspace.h>
-
-void *dma_generic_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
-{
- void *ret, *ret_nocache;
- int order = get_order(size);
-
- gfp |= __GFP_ZERO;
-
- ret = (void *)__get_free_pages(gfp, order);
- if (!ret)
- return NULL;
-
- /*
- * Pages from the page allocator may have data present in
- * cache. So flush the cache before using uncached memory.
- */
- sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
-
- ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
- if (!ret_nocache) {
- free_pages((unsigned long)ret, order);
- return NULL;
- }
-
- split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order);
-
- *dma_handle = virt_to_phys(ret);
- if (!WARN_ON(!dev))
- *dma_handle -= PFN_PHYS(dev->dma_pfn_offset);
-
- return ret_nocache;
-}
-
-void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- int order = get_order(size);
- unsigned long pfn = dma_handle >> PAGE_SHIFT;
- int k;
-
- if (!WARN_ON(!dev))
- pfn += dev->dma_pfn_offset;
-
- for (k = 0; k < (1 << order); k++)
- __free_pages(pfn_to_page(pfn + k), 0);
-
- iounmap(vaddr);
-}
-
-void sh_sync_dma_for_device(void *vaddr, size_t size,
- enum dma_data_direction direction)
-{
- void *addr = sh_cacheop_vaddr(vaddr);
-
- switch (direction) {
- case DMA_FROM_DEVICE: /* invalidate only */
- __flush_invalidate_region(addr, size);
- break;
- case DMA_TO_DEVICE: /* writeback only */
- __flush_wback_region(addr, size);
- break;
- case DMA_BIDIRECTIONAL: /* writeback and invalidate */
- __flush_purge_region(addr, size);
- break;
- default:
- BUG();
- }
-}

static int __init memchunk_setup(char *str)
{
--
2.18.0


2018-07-24 12:04:30

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 3/5] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case

This is a slight change in behavior as we avoid the detour through the
virtual mapping for the coherent allocator, but if this CPU really is
coherent that should be the right thing to do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/Kconfig | 1 +
arch/sh/include/asm/dma-mapping.h | 4 ++++
arch/sh/kernel/Makefile | 4 ++--
arch/sh/kernel/dma-nommu.c | 4 ----
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index dd4f3d3e644f..c9993a0cdc7e 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -159,6 +159,7 @@ config SWAP_IO_SPACE
bool

config DMA_COHERENT
+ select DMA_DIRECT_OPS
bool

config DMA_NONCOHERENT
diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index 149e71f95be7..1ebc6a4eb1c5 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -6,7 +6,11 @@ extern const struct dma_map_ops nommu_dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
+#ifdef CONFIG_DMA_NONCOHERENT
return &nommu_dma_ops;
+#else
+ return &dma_direct_ops;
+#endif
}

extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index dc80041f7363..cb5f1bfb52de 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -12,7 +12,7 @@ endif

CFLAGS_REMOVE_return_address.o = -pg

-obj-y := debugtraps.o dma-nommu.o dumpstack.o \
+obj-y := debugtraps.o dumpstack.o \
idle.o io.o irq.o irq_$(BITS).o kdebugfs.o \
machvec.o nmi_debug.o process.o \
process_$(BITS).o ptrace.o ptrace_$(BITS).o \
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE) += disassemble.o
obj-$(CONFIG_HIBERNATION) += swsusp.o
obj-$(CONFIG_DWARF_UNWINDER) += dwarf.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_callchain.o
-
+obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 79a9edafa5b0..d8689b1cb743 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -51,7 +51,6 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
return nents;
}

-#ifdef CONFIG_DMA_NONCOHERENT
static void nommu_sync_single_for_device(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir)
{
@@ -67,16 +66,13 @@ static void nommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
for_each_sg(sg, s, nelems, i)
sh_sync_dma_for_device(sg_virt(s), s->length, dir);
}
-#endif

const struct dma_map_ops nommu_dma_ops = {
.alloc = dma_generic_alloc_coherent,
.free = dma_generic_free_coherent,
.map_page = nommu_map_page,
.map_sg = nommu_map_sg,
-#ifdef CONFIG_DMA_NONCOHERENT
.sync_single_for_device = nommu_sync_single_for_device,
.sync_sg_for_device = nommu_sync_sg_for_device,
-#endif
};
EXPORT_SYMBOL(nommu_dma_ops);
--
2.18.0


2018-07-24 20:20:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

Ok, there is one more issue with this version. Wait for a new one
tomorrow.

On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
> Hi all,
>
> can you review these patches to switch sh to use the generic
> dma-noncoherent code? All the requirements are in mainline already
> and we've switched various architectures over to it already.
>
> Changes since V1:
> - fixed two stupid compile errors and verified them using a local
> cross toolchain instead of the 0day buildbot
> _______________________________________________
> iommu mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

2018-07-27 16:21:32

by Rob Landley

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>> Hi all,
>>
>> can you review these patches to switch sh to use the generic
>> dma-noncoherent code? All the requirements are in mainline already
>> and we've switched various architectures over to it already.
>
> Ok, there is one more issue with this version. Wait for a new one
> tomorrow.

Speaking of DMA:

I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
reading through all that stuff, but the docs seem kinda... thin?

Is there something I should have read other than
Documentation/driver-model/platform.txt,
Documentation/dmaegine/{provider,client}.txt, then trying to picking through the
source code and the sh7760 hardware pdf? (And watching the youtube video of
Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
DMAEngine, the Xilinx video on DMAEngine...)

At first I thought the SH_DMAE could initialize itself, but the probe function
needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
_kind_ of like a model I can crib from:

A) "make ARCH=sh se7722_defconfig" results in a config with SH_DMA disabled??!?
(This is why I use miniconfig instead of defconfig format, I'm assuming that's
bit rot?)

B) That platform data is supplying sh_dmae_slave_config preallocating slave
channels to devices? (Does it have to? The docs gave me the impression the
driver would dynamically request them and devices could even share. Wasn't that
sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
minimum amount of micromanaging I have to do?)

C) It's full of stuff like setting ts_low_shift to CHCR_TS_LOW_SHIFT where both
grepping Docuemntation and Google "dmaengine ts_low_shift" are unhelpful.

What I'd really like is a "hello world" version of DMAEngine somewhere I can
build and run on a supported qemu target, to set up _one_ channel with a block
device or something using it. I can't tell what's optional, or what the minimal
version of this looks like.

(Currently I've only managed to update this kernel to 4.14 because 4.15
introduced an intermittent data corruption bug in the flash, which takes long
enough to reproduce bisecting it is fiddly and ship deadlines are all blinky and
red. But next release should be current, _and_ with at least the 4.14 source
published so I can point people at it. Heck, maybe I can convince them to let me
port it to device tree next cycle, but I need to get it to _work_ first. And
doing PIO on a 100baseT controller, I.E. a ~200 mhz embedded CPU trying to copy
11 megabytes/second across a 16 bit bus with a for(;;) loop... bit of a
performance bottleneck even before you add https.)

Thanks,

Rob

>>
>> Changes since V1:
>> - fixed two stupid compile errors and verified them using a local
>> cross toolchain instead of the 0day buildbot
>> _______________________________________________
>> iommu mailing list
>> [email protected]
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> ---end quoted text---
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sh" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2018-07-30 09:04:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

On Fri, Jul 27, 2018 at 11:20:21AM -0500, Rob Landley wrote:
> Speaking of DMA:

Which really has nothing to do with the dma mapping code, which
also means I can't help you much unfortunately.

That being said sh is the last pending of the initial dma-noncoherent
conversion, I'd greatly appreciate if we could get this reviewed and
merge for the 4.19 merge window..

2018-07-30 09:26:53

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

Hi Rob,

CC Guennadi

On Fri, Jul 27, 2018 at 6:21 PM Rob Landley <[email protected]> wrote:
> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> > On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
> >> can you review these patches to switch sh to use the generic
> >> dma-noncoherent code? All the requirements are in mainline already
> >> and we've switched various architectures over to it already.
> >
> > Ok, there is one more issue with this version. Wait for a new one
> > tomorrow.
>
> Speaking of DMA:
>
> I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
> fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
> reading through all that stuff, but the docs seem kinda... thin?
>
> Is there something I should have read other than
> Documentation/driver-model/platform.txt,
> Documentation/dmaegine/{provider,client}.txt, then trying to picking through the
> source code and the sh7760 hardware pdf? (And watching the youtube video of
> Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
> DMAEngine, the Xilinx video on DMAEngine...)
>
> At first I thought the SH_DMAE could initialize itself, but the probe function
> needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
> _kind_ of like a model I can crib from:
>
> A) "make ARCH=sh se7722_defconfig" results in a config with SH_DMA disabled??!?
> (This is why I use miniconfig instead of defconfig format, I'm assuming that's
> bit rot?)
>
> B) That platform data is supplying sh_dmae_slave_config preallocating slave
> channels to devices? (Does it have to? The docs gave me the impression the
> driver would dynamically request them and devices could even share. Wasn't that
> sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
> minimum amount of micromanaging I have to do?)
>
> C) It's full of stuff like setting ts_low_shift to CHCR_TS_LOW_SHIFT where both
> grepping Docuemntation and Google "dmaengine ts_low_shift" are unhelpful.
>
> What I'd really like is a "hello world" version of DMAEngine somewhere I can
> build and run on a supported qemu target, to set up _one_ channel with a block
> device or something using it. I can't tell what's optional, or what the minimal
> version of this looks like.

I have no experience with DMA on SH, only with DMA on (DT-based) Renesas
ARM SoCs. But I believe the DMA engines are somewhat similar.

I don't know if all pieces to support DMA were ever upstreamed.
See e.g. commit 219fb0c1436e4893 ("serial: sh-sci: Remove the platform data
dma slave rx/tx channel IDs").

Perhaps Guennadi knows/remembers?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-07-31 12:57:28

by Arnd Bergmann

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley <[email protected]> wrote:
> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>>> Hi all,
>>>
>>> can you review these patches to switch sh to use the generic
>>> dma-noncoherent code? All the requirements are in mainline already
>>> and we've switched various architectures over to it already.
>>
>> Ok, there is one more issue with this version. Wait for a new one
>> tomorrow.
>
> Speaking of DMA:
>
> I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
> fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
> reading through all that stuff, but the docs seem kinda... thin?
>
> Is there something I should have read other than
> Documentation/driver-model/platform.txt,
> Documentation/dmaegine/{provider,client}.txt, then trying to picking through the
> source code and the sh7760 hardware pdf? (And watching the youtube video of
> Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
> DMAEngine, the Xilinx video on DMAEngine...)
>
> At first I thought the SH_DMAE could initialize itself, but the probe function
> needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
> _kind_ of like a model I can crib from:

> B) That platform data is supplying sh_dmae_slave_config preallocating slave
> channels to devices? (Does it have to? The docs gave me the impression the
> driver would dynamically request them and devices could even share. Wasn't that
> sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
> minimum amount of micromanaging I have to do?)

The thing here is that arch/sh is way behind on the API use, and it
has prevented us from cleaning up drivers as well. A slave driver
should have to just call dma_request_chan() with a constant
string to identify its channel rather than going two different ways
depending on whether it's used with DT or platform data.

If you hack on it, please convert the dmaengine platform data to use
a dma_slave_map array to pass the data into the dmaengine driver,
mapping the settings from a (pdev-name, channel-id) tuple to a pointer
that describes the channel configuration rather than having the
mapping from an numerical slave_id to a struct sh_dmae_slave_config
in the setup files. It should be a fairly mechanical conversion.

The other part I noticed is arch/sh/drivers/dma/*, which appears to
be entirely unused, and should probably removed.

Arnd

2018-07-31 13:26:50

by Arnd Bergmann

[permalink] [raw]
Subject: Re: use the generic dma-noncoherent code for sh V2

On Mon, Jul 30, 2018 at 11:06 AM, Christoph Hellwig <[email protected]> wrote:
> On Fri, Jul 27, 2018 at 11:20:21AM -0500, Rob Landley wrote:
>> Speaking of DMA:
>
> Which really has nothing to do with the dma mapping code, which
> also means I can't help you much unfortunately.
>
> That being said sh is the last pending of the initial dma-noncoherent
> conversion, I'd greatly appreciate if we could get this reviewed and
> merge for the 4.19 merge window..

I've spent 30 minutes looking through your submission to find something
wrong with it now, but it all looks fine, the only criticism would be that
some of the changelogs could provide a little more background.

The original implementation seems odd in some places, but your
new version resolves the few concerns I had (like mixing up
phys and dma addresses), and I didn't see anything that should
change behavior.

I hope that helps.

Arnd

2018-08-17 17:06:30

by Rob Landley

[permalink] [raw]
Subject: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)



On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley <[email protected]> wrote:
>> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
>>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>>>> Hi all,
>>>>
>>>> can you review these patches to switch sh to use the generic
>>>> dma-noncoherent code? All the requirements are in mainline already
>>>> and we've switched various architectures over to it already.
>>>
>>> Ok, there is one more issue with this version. Wait for a new one
>>> tomorrow.
>>
>> Speaking of DMA:
>>
>> I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
>> fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
>> reading through all that stuff, but the docs seem kinda... thin?
>>
>> Is there something I should have read other than
>> Documentation/driver-model/platform.txt,
>> Documentation/dmaegine/{provider,client}.txt, then trying to picking through the
>> source code and the sh7760 hardware pdf? (And watching the youtube video of
>> Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
>> DMAEngine, the Xilinx video on DMAEngine...)
>>
>> At first I thought the SH_DMAE could initialize itself, but the probe function
>> needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
>> _kind_ of like a model I can crib from:
>
>> B) That platform data is supplying sh_dmae_slave_config preallocating slave
>> channels to devices? (Does it have to? The docs gave me the impression the
>> driver would dynamically request them and devices could even share. Wasn't that
>> sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
>> minimum amount of micromanaging I have to do?)
>
> The thing here is that arch/sh is way behind on the API use, and it
> has prevented us from cleaning up drivers as well. A slave driver
> should have to just call dma_request_chan() with a constant
> string to identify its channel rather than going two different ways
> depending on whether it's used with DT or platform data.

I got the DMA controller hooked up to DMAEngine and the dmatest module is happy
with the result on all 8 channels. (Finding
arch/sh/kernel/cpu/sh4a/setup-sh7722.c helped a lot, finding it earlier would
have helped more. :)

The config symbols are:

CONFIG_SH_DMA=y
CONFIG_DMADEVICES=y
CONFIG_SH_DMAE_BASE=y
CONFIG_SH_DMAE=y
CONFIG_DMATEST=y #optional

The platform data is:

#include <cpu/dma-register.h>
#include <cpu/dma.h>
#include <linux/sh_dma.h>

/* DMA */
static struct resource sh7760_dma_resources[] = {
{
.start = SH_DMAC_BASE0,
.end = SH_DMAC_BASE0 + 9*16 - 1,
.flags = IORESOURCE_MEM,
}, {
.start = DMTE0_IRQ,
.end = DMTE0_IRQ,
.flags = IORESOURCE_IRQ,
}
};

static struct sh_dmae_channel dma_chan[] = {
{
.offset = 0,
.dmars = 0,
.dmars_bit = 0,
}, {
.offset = 0x10,
.dmars = 0,
.dmars_bit = 8,
}, {
.offset = 0x20,
.dmars = 4,
.dmars_bit = 0,
}, {
.offset = 0x30,
.dmars = 4,
.dmars_bit = 8,
}, {
.offset = 0x50,
.dmars = 8,
.dmars_bit = 0,
}, {
.offset = 0x60,
.dmars = 8,
.dmars_bit = 8,
}, {
.offset = 0x70,
.dmars = 12,
.dmars_bit = 0,
}, {
.offset = 0x80,
.dmars = 12,
.dmars_bit = 8,
}
};

static const unsigned int ts_shift[] = TS_SHIFT;

static struct sh_dmae_pdata sh7760_dma_pdata = {
.channel = dma_chan,
.channel_num = ARRAY_SIZE(dma_chan),
.ts_low_shift = CHCR_TS_LOW_SHIFT,
.ts_low_mask = CHCR_TS_LOW_MASK,
.ts_high_shift = CHCR_TS_HIGH_SHIFT,
.ts_high_mask = CHCR_TS_HIGH_MASK,
.ts_shift = ts_shift,
.ts_shift_num = ARRAY_SIZE(ts_shift),
.dmaor_init = DMAOR_INIT,
.dmaor_is_32bit = 1,
};

struct platform_device sh7760_dma_device = {
.name = "sh-dma-engine",
.id = -1,
.num_resources = ARRAY_SIZE(sh7760_dma_resources),
.resource = sh7760_dma_resources,
.dev = {.platform_data = &sh7760_dma_pdata},
};


And then add sh7760_dma_device to your struct platform_device array.

> If you hack on it, please convert the dmaengine platform data to use
> a dma_slave_map array to pass the data into the dmaengine driver,

The dmatest module didn't need it? I don't see why the ethernet driver would?
(Isn't the point of an allocator to allocate from a request?)

> mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> that describes the channel configuration rather than having the
> mapping from an numerical slave_id to a struct sh_dmae_slave_config
> in the setup files. It should be a fairly mechanical conversion.

I think all 8 channels are generic. Drivers should be able to grab them and
release them at will, why does it need a table?

(I say this not having made the smc91x.c driver use this yet, its "conversion"
to device tree left it full of PXA #ifdefs and constants, and I've tried the
last half-dozen kernel releases and qemu releases and have yet to find an arm
mainstone board under qemu that _doesn't_ time out trying to use DMA with this
card. But that's another post...)

> The other part I noticed is arch/sh/drivers/dma/*, which appears to
> be entirely unused, and should probably removed.

I had to switch that off to get this to work, yes. I believe it predates
dmaengine and was obsoleted by it.

> Arnd
>

Rob

2018-08-17 20:25:36

by Arnd Bergmann

[permalink] [raw]
Subject: Re: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)

On Fri, Aug 17, 2018 at 7:04 PM Rob Landley <[email protected]> wrote:
> On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> > On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley <[email protected]> wrote:
> >> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> >>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
> >>>> Hi all,
> > If you hack on it, please convert the dmaengine platform data to use
> > a dma_slave_map array to pass the data into the dmaengine driver,
>
> The dmatest module didn't need it? I don't see why the ethernet driver would?
> (Isn't the point of an allocator to allocate from a request?)

I guess you have hit two of the special cases here:

- dmatest uses the memory-to-memory DMA engine interface, not the slave
API, so you don't have to configure a slave at all

- smc91x (and its smc911x.c relative) are apparently special in that they
use they use the DMA slave API but (AFAICT) require programming
the dmaengine hardware into a memory-to-memory transfer with no
DMA slave request signal and completely synchronous operation
(the IRQ handler polls for the DMA descriptor to be complete),
see also https://lkml.org/lkml/2018/4/3/464 for the discussion about
the recent rework of that driver's implementation.

> > mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> > that describes the channel configuration rather than having the
> > mapping from an numerical slave_id to a struct sh_dmae_slave_config
> > in the setup files. It should be a fairly mechanical conversion.
>
> I think all 8 channels are generic. Drivers should be able to grab them and
> release them at will, why does it need a table?
>
> (I say this not having made the smc91x.c driver use this yet, its "conversion"
> to device tree left it full of PXA #ifdefs and constants, and I've tried the

Another point about smc91x is that it only uses DMA on the PXA platform,
which is not part of the "multiplatform" ARM setup. It's likely that no
other platform actually has a DMA engine that can talk to this device in
the absence of a request signal, or that on more modern CPU cores,
a readsl() is actually just as fast, but it avoids the setup cost of talking
to the dma engine. Possibly both of the above.

> last half-dozen kernel releases and qemu releases and have yet to find an arm
> mainstone board under qemu that _doesn't_ time out trying to use DMA with this
> card. But that's another post...)

Is smc91x the only driver that you want to make use of the DMA engine?
I suspect that every other one currently relies on passing a slave ID
shdma_chan_filter into dma_request_slave_channel_compat() or
dma_request_channel() , which are some of the interfaces we want to
remove in the future, to make everything work the same across
all platforms.

shdma_chan_filter() is one of those that expect its pointer argument to
be a number that is in turn associated with an sh_dmae_slave_config
structure in the platform data of the dma engine. What the newer
dma_request_chan() interface does is to pass a pointer to the
slave device and a string as identifier for the same data, which then
gets associated through the dma_slave_map. On smc91x, both
the device and name argument are NULL, which triggers the special
case in the pxa dmaengine driver.

> > The other part I noticed is arch/sh/drivers/dma/*, which appears to
> > be entirely unused, and should probably removed.
>
> I had to switch that off to get this to work, yes. I believe it predates
> dmaengine and was obsoleted by it.

Ok. Have you found any reason to keep it around though?

Arnd

2018-08-19 05:39:37

by Rob Landley

[permalink] [raw]
Subject: Re: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)

On 08/17/2018 03:23 PM, Arnd Bergmann wrote:
> On Fri, Aug 17, 2018 at 7:04 PM Rob Landley <[email protected]> wrote:
>> On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
>>> On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley <[email protected]> wrote:
>>>> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
>>>>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>>>>>> Hi all,
>>> If you hack on it, please convert the dmaengine platform data to use
>>> a dma_slave_map array to pass the data into the dmaengine driver,
>>
>> The dmatest module didn't need it? I don't see why the ethernet driver would?
>> (Isn't the point of an allocator to allocate from a request?)
>
> I guess you have hit two of the special cases here:
>
> - dmatest uses the memory-to-memory DMA engine interface, not the slave
> API, so you don't have to configure a slave at all

I've read through
https://www.kernel.org/doc/Documentation/driver-api/dmaengine/client.rst twice
and am still very unclear on the slave API.

> - smc91x (and its smc911x.c relative) are apparently special in that they
> use they use the DMA slave API

Only sort of. In 4.14 at least it's under #ifdef ARCH_PXA and full of PXA
constants (PXAD_PRIO_LOWEST and such).

> but (AFAICT) require programming
> the dmaengine hardware into a memory-to-memory transfer with no
> DMA slave request signal and completely synchronous operation
> (the IRQ handler polls for the DMA descriptor to be complete),
> see also https://lkml.org/lkml/2018/4/3/464 for the discussion about
> the recent rework of that driver's implementation.

Bookmarked, thanks.

(Being able to just upgrade to a 4.19 kernel or something and have DMA work in
this driver if I've got dmaengine set up for the platform would be lovely.)

>>> mapping the settings from a (pdev-name, channel-id) tuple to a pointer
>>> that describes the channel configuration rather than having the
>>> mapping from an numerical slave_id to a struct sh_dmae_slave_config
>>> in the setup files. It should be a fairly mechanical conversion.
>>
>> I think all 8 channels are generic. Drivers should be able to grab them and
>> release them at will, why does it need a table?
>>
>> (I say this not having made the smc91x.c driver use this yet, its "conversion"
>> to device tree left it full of PXA #ifdefs and constants, and I've tried the
>
> Another point about smc91x is that it only uses DMA on the PXA platform,
> which is not part of the "multiplatform" ARM setup. It's likely that no
> other platform actually has a DMA engine that can talk to this device in
> the absence of a request signal, or that on more modern CPU cores,
> a readsl() is actually just as fast, but it avoids the setup cost of talking
> to the dma engine. Possibly both of the above.

The sh7760 has the CPU pegged at 100% trying to keep up with ethernet traffic.
Being able to use DMA on this would be very nice.

>> last half-dozen kernel releases and qemu releases and have yet to find an arm
>> mainstone board under qemu that _doesn't_ time out trying to use DMA with this
>> card. But that's another post...)
>
> Is smc91x the only driver that you want to make use of the DMA engine?

This driver's the low-hanging fruit, yeah. Copying NOR flash jffs2 data into
page cache would be nice but there's a decompression step so I'm not sure that's
a win.

> I suspect that every other one currently relies on passing a slave ID
> shdma_chan_filter into dma_request_slave_channel_compat() or
> dma_request_channel() , which are some of the interfaces we want to
> remove in the future, to make everything work the same across
> all platforms.

What are "all platforms" in this context? I tried to find an x86 variant that
uses DMAEngine but came up empty. Can I use DMAEngine on a raspberry pi perhaps?
Is there a QEMU taret I can play with DMAEngine under?

I built a mainstone kernel with dmaengine amd smc91x both enabled, and booted it
on qemu-system-arm -M mainstone, and it works fine until I try to ping the host
(via the 10.0.2.2 redirect), at which point no packets are received and a timer
expires all over the console a few seconds later. I.E. the DMA claims to be
there, but the transfer never occurs.

I built and tested every Linux version back to 4.2 (before the smc91x was
converted from PXA dma to use DMAEngine, albeit in a very PXA specific manner).
I also tested each qemu release back to 2.3.0, with no obvious behavioral
difference.

I can dig further back in qemu history maybe? Ask on the qemu list? (Did this
ever work for anyone? I can post my kernel config and qemu command line if you
think it would help?)

> shdma_chan_filter() is one of those that expect its pointer argument to
> be a number that is in turn associated with an sh_dmae_slave_config
> structure in the platform data of the dma engine. What the newer
> dma_request_chan() interface does is to pass a pointer to the
> slave device and a string as identifier for the same data, which then
> gets associated through the dma_slave_map. On smc91x, both
> the device and name argument are NULL, which triggers the special
> case in the pxa dmaengine driver.

I do not understand what the slave map is for, is it documented anywhere? (The
Documentation/randomdir/dmaengine/client.nolongertxt file says: "The association
is done via DT, ACPI or board file based dma_slave_map matching table." which is
its only mention of the existence of dma_slave_map.

If the driver just needs "a channel" and doesn't care which one, why isn't the
config info for that channel in the driver as a generic request for resource?

>>> The other part I noticed is arch/sh/drivers/dma/*, which appears to
>>> be entirely unused, and should probably removed.
>>
>> I had to switch that off to get this to work, yes. I believe it predates
>> dmaengine and was obsoleted by it.
>
> Ok. Have you found any reason to keep it around though?

I have not.

> Arnd

Rob

2018-08-20 12:35:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)

On Sun, Aug 19, 2018 at 7:38 AM Rob Landley <[email protected]> wrote:
>
> On 08/17/2018 03:23 PM, Arnd Bergmann wrote:
> > On Fri, Aug 17, 2018 at 7:04 PM Rob Landley <[email protected]> wrote:
> >> On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> >>> On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley <[email protected]> wrote:
> >>>> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> >>>>> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
> >>>>>> Hi all,
> >>> If you hack on it, please convert the dmaengine platform data to use
> >>> a dma_slave_map array to pass the data into the dmaengine driver,
> >>
> >> The dmatest module didn't need it? I don't see why the ethernet driver would?
> >> (Isn't the point of an allocator to allocate from a request?)
> >
> > I guess you have hit two of the special cases here:
> >
> > - dmatest uses the memory-to-memory DMA engine interface, not the slave
> > API, so you don't have to configure a slave at all
>
> I've read through
> https://www.kernel.org/doc/Documentation/driver-api/dmaengine/client.rst twice
> and am still very unclear on the slave API.
>
> > - smc91x (and its smc911x.c relative) are apparently special in that they
> > use they use the DMA slave API
>
> Only sort of. In 4.14 at least it's under #ifdef ARCH_PXA and full of PXA
> constants (PXAD_PRIO_LOWEST and such).
>
> > but (AFAICT) require programming
> > the dmaengine hardware into a memory-to-memory transfer with no
> > DMA slave request signal and completely synchronous operation
> > (the IRQ handler polls for the DMA descriptor to be complete),
> > see also https://lkml.org/lkml/2018/4/3/464 for the discussion about
> > the recent rework of that driver's implementation.
>
> Bookmarked, thanks.
>
> (Being able to just upgrade to a 4.19 kernel or something and have DMA work in
> this driver if I've got dmaengine set up for the platform would be lovely.)

I wouldn't expect too much even with the newer kernel, I think it
still relies on a special case in the pxa DMA engine driver, possibly
even in their hardware implementation.

> >>> mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> >>> that describes the channel configuration rather than having the
> >>> mapping from an numerical slave_id to a struct sh_dmae_slave_config
> >>> in the setup files. It should be a fairly mechanical conversion.
> >>
> >> I think all 8 channels are generic. Drivers should be able to grab them and
> >> release them at will, why does it need a table?
> >>
> >> (I say this not having made the smc91x.c driver use this yet, its "conversion"
> >> to device tree left it full of PXA #ifdefs and constants, and I've tried the
> >
> > Another point about smc91x is that it only uses DMA on the PXA platform,
> > which is not part of the "multiplatform" ARM setup. It's likely that no
> > other platform actually has a DMA engine that can talk to this device in
> > the absence of a request signal, or that on more modern CPU cores,
> > a readsl() is actually just as fast, but it avoids the setup cost of talking
> > to the dma engine. Possibly both of the above.
>
> The sh7760 has the CPU pegged at 100% trying to keep up with ethernet traffic.
> Being able to use DMA on this would be very nice.

This is probably for the most part due to the rather slow bus interface
of the smc91x, especially if you can't use the 32-bit mode or an optimized
readsl() implementation.

Using DMA won't let you do the transfer in the background either, as it
would on any other ethernet hardware, it just means the CPU is blocked
for a little less time if the DMA engine can access the bus faster
than the readsl() implementation can on your CPU.

> >> last half-dozen kernel releases and qemu releases and have yet to find an arm
> >> mainstone board under qemu that _doesn't_ time out trying to use DMA with this
> >> card. But that's another post...)
> >
> > Is smc91x the only driver that you want to make use of the DMA engine?
>
> This driver's the low-hanging fruit, yeah. Copying NOR flash jffs2 data into
> page cache would be nice but there's a decompression step so I'm not sure that's
> a win.

Right, that would be even harder. The devices that are actually designed
for interacting with the DMA engine are likely MMC, USB and audio on
that chip. Those should be easier to do than the smc91x.

> > I suspect that every other one currently relies on passing a slave ID
> > shdma_chan_filter into dma_request_slave_channel_compat() or
> > dma_request_channel() , which are some of the interfaces we want to
> > remove in the future, to make everything work the same across
> > all platforms.
>
> What are "all platforms" in this context? I tried to find an x86 variant that
> uses DMAEngine but came up empty. Can I use DMAEngine on a raspberry pi perhaps?
> Is there a QEMU taret I can play with DMAEngine under?

Most ARM SoCs these days have a DMA engine that only uses the new
style interface with dma_request_chan() or dma_request_slave_channel().
This includes the raspberry pi, or many of the machines supported by qemu
(not sure which DMA engines are supported on qemu specifcally, I would
guess vexpress/realview, omap, and allwinner).

On x86, only some of the embedded Atom chips have a DMA engine, but
it's integrated in a more complex manner using a mix of PCI probing and
ACPI, so probably not worth looking at as an example for any other architecture.

The platforms that still use the old interface (dma_request_channel or
dma_request_slave_channel_compat) are

- atmel (should be changed over now that arch/avr32 is gone)
- PXA/MMP (patches are being worked on
- ep93xx
- ux500
- MIPS pic32 (for no reason I can see, should be changed now)
- tegra (not really, can be trivially cleaned up)
- imx3 (only for framebuffer on non-DT platforms)
- sh/shmobile (hard to do without testing arch/sh)

> I built a mainstone kernel with dmaengine amd smc91x both enabled, and booted it
> on qemu-system-arm -M mainstone, and it works fine until I try to ping the host
> (via the 10.0.2.2 redirect), at which point no packets are received and a timer
> expires all over the console a few seconds later. I.E. the DMA claims to be
> there, but the transfer never occurs.
>
> I built and tested every Linux version back to 4.2 (before the smc91x was
> converted from PXA dma to use DMAEngine, albeit in a very PXA specific manner).
> I also tested each qemu release back to 2.3.0, with no obvious behavioral
> difference.
>
> I can dig further back in qemu history maybe? Ask on the qemu list? (Did this
> ever work for anyone? I can post my kernel config and qemu command line if you
> think it would help?)

No idea really. This is not a popular platform at any rate, I wouldn't be
surprised if it hasn't worked in a long time.

> > shdma_chan_filter() is one of those that expect its pointer argument to
> > be a number that is in turn associated with an sh_dmae_slave_config
> > structure in the platform data of the dma engine. What the newer
> > dma_request_chan() interface does is to pass a pointer to the
> > slave device and a string as identifier for the same data, which then
> > gets associated through the dma_slave_map. On smc91x, both
> > the device and name argument are NULL, which triggers the special
> > case in the pxa dmaengine driver.
>
> I do not understand what the slave map is for, is it documented anywhere? (The
> Documentation/randomdir/dmaengine/client.nolongertxt file says: "The association
> is done via DT, ACPI or board file based dma_slave_map matching table." which is
> its only mention of the existence of dma_slave_map.
>
> If the driver just needs "a channel" and doesn't care which one, why isn't the
> config info for that channel in the driver as a generic request for resource?

AFAICT, the dmaengine API doesn't really support a case of the slave API
without specifying a slave, only PXA with smc91x needed that until
now. ;-)

> >>> The other part I noticed is arch/sh/drivers/dma/*, which appears to
> >>> be entirely unused, and should probably removed.
> >>
> >> I had to switch that off to get this to work, yes. I believe it predates
> >> dmaengine and was obsoleted by it.
> >
> > Ok. Have you found any reason to keep it around though?
>
> I have not.

Ok. Unless someone else does it first, I might send a patch to remove
it then. I'm also planning to send a patch to remove the broken
sh5 support, which is getting in the way of some of my y2038 work.

Arnd