2018-04-20 08:06:10

by Christoph Hellwig

[permalink] [raw]
Subject:

To: [email protected]
Cc: [email protected]
Cc: Michal Simek <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: [RFC] common non-cache coherent direct dma mapping ops

Hi all,

this series continues consolidating the dma-mapping code, with a focus
on architectures that do not (always) provide cache coherence for DMA.
Three architectures (arm, mips and powerpc) are still left to be
converted later due to complexity of their dma ops selection.

The dma-noncoherent ops calls the dma-direct ops for the actual
translation of streaming mappins and allow the architecture to provide
any cache flushing required for cpu to device and/or device to cpu
ownership transfers. The dma coherent allocator is for now still left
entirely to architecture supplied implementations due the amount of
variations. Hopefully we can do some consolidation for them later on
as well.

A lot of architectures are currently doing very questionable things
in their dma mapping routines, which are documented in the changelogs
for each patch. Please review them very careful and correct me on
incorrect assumptions.

Because this series sits on top of two previously submitted series
a git tree might be useful to actually test it. It is provided here:

git://git.infradead.org/users/hch/misc.git generic-dma-noncoherent

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/generic-dma-noncoherent


2018-04-20 08:04:57

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 01/22] dma-debug: move initialization to common code

Most mainstream architectures are using 65536 entries, so lets stick to
that. If someone is really desperate to override it that can still be
done through <asm/dma-mapping.h>, but I'd rather see a really good
rationale for that.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arm/mm/dma-mapping-nommu.c | 9 ---------
arch/arm/mm/dma-mapping.c | 9 ---------
arch/arm64/mm/dma-mapping.c | 10 ----------
arch/c6x/kernel/dma.c | 11 -----------
arch/ia64/kernel/dma-mapping.c | 10 ----------
arch/microblaze/kernel/dma.c | 11 -----------
arch/mips/mm/dma-default.c | 10 ----------
arch/openrisc/kernel/dma.c | 11 -----------
arch/powerpc/kernel/dma.c | 3 ---
arch/s390/pci/pci_dma.c | 9 ---------
arch/sh/mm/consistent.c | 9 ---------
arch/sparc/kernel/Makefile | 2 --
arch/sparc/kernel/dma.c | 13 -------------
arch/x86/kernel/pci-dma.c | 4 ----
arch/xtensa/kernel/pci-dma.c | 9 ---------
include/linux/dma-debug.h | 6 ------
lib/dma-debug.c | 21 ++++++++++++++-------
17 files changed, 14 insertions(+), 143 deletions(-)
delete mode 100644 arch/sparc/kernel/dma.c

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 619f24a42d09..f448a0663b10 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -241,12 +241,3 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
void arch_teardown_dma_ops(struct device *dev)
{
}
-
-#define PREALLOC_DMA_DEBUG_ENTRIES 4096
-
-static int __init dma_debug_do_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-core_initcall(dma_debug_do_init);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8c398fedbbb6..c26bf83f44ca 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1165,15 +1165,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
return __dma_supported(dev, mask, false);
}

-#define PREALLOC_DMA_DEBUG_ENTRIES 4096
-
-static int __init dma_debug_do_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-core_initcall(dma_debug_do_init);
-
#ifdef CONFIG_ARM_DMA_USE_IOMMU

static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a96ec0181818..db01f2709842 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -508,16 +508,6 @@ static int __init arm64_dma_init(void)
}
arch_initcall(arm64_dma_init);

-#define PREALLOC_DMA_DEBUG_ENTRIES 4096
-
-static int __init dma_debug_do_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-fs_initcall(dma_debug_do_init);
-
-
#ifdef CONFIG_IOMMU_DMA
#include <linux/dma-iommu.h>
#include <linux/platform_device.h>
diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
index 9fff8be75f58..31e1a9ec3a9c 100644
--- a/arch/c6x/kernel/dma.c
+++ b/arch/c6x/kernel/dma.c
@@ -136,14 +136,3 @@ const struct dma_map_ops c6x_dma_ops = {
.sync_sg_for_cpu = c6x_dma_sync_sg_for_cpu,
};
EXPORT_SYMBOL(c6x_dma_ops);
-
-/* Number of entries preallocated for DMA-API debugging */
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
-
- return 0;
-}
-fs_initcall(dma_init);
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index f2d57e66fd86..7a471d8d67d4 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -9,16 +9,6 @@ int iommu_detected __read_mostly;
const struct dma_map_ops *dma_ops;
EXPORT_SYMBOL(dma_ops);

-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
-
- return 0;
-}
-fs_initcall(dma_init);
-
const struct dma_map_ops *dma_get_ops(struct device *dev)
{
return dma_ops;
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index c91e8cef98dd..3145e7dc8ab1 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -184,14 +184,3 @@ const struct dma_map_ops dma_nommu_ops = {
.sync_sg_for_device = dma_nommu_sync_sg_for_device,
};
EXPORT_SYMBOL(dma_nommu_ops);
-
-/* Number of entries preallocated for DMA-API debugging */
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
-
- return 0;
-}
-fs_initcall(dma_init);
diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index dcafa43613b6..f9fef0028ca2 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -402,13 +402,3 @@ static const struct dma_map_ops mips_default_dma_map_ops = {

const struct dma_map_ops *mips_dma_map_ops = &mips_default_dma_map_ops;
EXPORT_SYMBOL(mips_dma_map_ops);
-
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init mips_dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
-
- return 0;
-}
-fs_initcall(mips_dma_init);
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index a945f00011b4..ec7fd45704d2 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -247,14 +247,3 @@ const struct dma_map_ops or1k_dma_map_ops = {
.sync_single_for_device = or1k_sync_single_for_device,
};
EXPORT_SYMBOL(or1k_dma_map_ops);
-
-/* Number of entries preallocated for DMA-API debugging */
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
-
- return 0;
-}
-fs_initcall(dma_init);
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index da20569de9d4..138157deeadf 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -309,8 +309,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
}
EXPORT_SYMBOL(dma_set_coherent_mask);

-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
int dma_set_mask(struct device *dev, u64 dma_mask)
{
if (ppc_md.dma_set_mask)
@@ -361,7 +359,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);

static int __init dma_init(void)
{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
#ifdef CONFIG_PCI
dma_debug_add_bus(&pci_bus_type);
#endif
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 10abf5ed6187..d387a0fbdd7e 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -668,15 +668,6 @@ void zpci_dma_exit(void)
kmem_cache_destroy(dma_region_table_cache);
}

-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init dma_debug_do_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-fs_initcall(dma_debug_do_init);
-
const struct dma_map_ops s390_pci_dma_ops = {
.alloc = s390_dma_alloc,
.free = s390_dma_free,
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index 6ea3aab508f2..a7bff3f29c2b 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -20,18 +20,9 @@
#include <asm/cacheflush.h>
#include <asm/addrspace.h>

-#define PREALLOC_DMA_DEBUG_ENTRIES 4096
-
const struct dma_map_ops *dma_ops;
EXPORT_SYMBOL(dma_ops);

-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-fs_initcall(dma_init);
-
void *dma_generic_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs)
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index a284662b0e4c..cf8640841b7a 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -74,8 +74,6 @@ obj-$(CONFIG_SPARC64) += pcr.o
obj-$(CONFIG_SPARC64) += nmi.o
obj-$(CONFIG_SPARC64_SMP) += cpumap.o

-obj-y += dma.o
-
obj-$(CONFIG_PCIC_PCI) += pcic.o
obj-$(CONFIG_LEON_PCI) += leon_pci.o
obj-$(CONFIG_SPARC_GRPCI2)+= leon_pci_grpci2.o
diff --git a/arch/sparc/kernel/dma.c b/arch/sparc/kernel/dma.c
deleted file mode 100644
index f73e7597c971..000000000000
--- a/arch/sparc/kernel/dma.c
+++ /dev/null
@@ -1,13 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/kernel.h>
-#include <linux/dma-mapping.h>
-#include <linux/dma-debug.h>
-
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 15)
-
-static int __init dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-fs_initcall(dma_init);
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 77625b60a510..bcbaa2e8031e 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -55,9 +55,6 @@ struct device x86_dma_fallback_dev = {
};
EXPORT_SYMBOL(x86_dma_fallback_dev);

-/* Number of entries preallocated for DMA-API debugging */
-#define PREALLOC_DMA_DEBUG_ENTRIES 65536
-
void __init pci_iommu_alloc(void)
{
struct iommu_table_entry *p;
@@ -189,7 +186,6 @@ EXPORT_SYMBOL(arch_dma_supported);
static int __init pci_iommu_init(void)
{
struct iommu_table_entry *p;
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);

#ifdef CONFIG_PCI
dma_debug_add_bus(&pci_bus_type);
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 732631ce250f..392b4a80ebc2 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -261,12 +261,3 @@ const struct dma_map_ops xtensa_dma_map_ops = {
.mapping_error = xtensa_dma_mapping_error,
};
EXPORT_SYMBOL(xtensa_dma_map_ops);
-
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-
-static int __init xtensa_dma_init(void)
-{
- dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
- return 0;
-}
-fs_initcall(xtensa_dma_init);
diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index c7d844f09c3a..a785f2507159 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -30,8 +30,6 @@ struct bus_type;

extern void dma_debug_add_bus(struct bus_type *bus);

-extern void dma_debug_init(u32 num_entries);
-
extern int dma_debug_resize_entries(u32 num_entries);

extern void debug_dma_map_page(struct device *dev, struct page *page,
@@ -100,10 +98,6 @@ static inline void dma_debug_add_bus(struct bus_type *bus)
{
}

-static inline void dma_debug_init(u32 num_entries)
-{
-}
-
static inline int dma_debug_resize_entries(u32 num_entries)
{
return 0;
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 7f5cdc1e6b29..712a897174e4 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -41,6 +41,11 @@
#define HASH_FN_SHIFT 13
#define HASH_FN_MASK (HASH_SIZE - 1)

+/* allow architectures to override this if absolutely required */
+#ifndef PREALLOC_DMA_DEBUG_ENTRIES
+#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
+#endif
+
enum {
dma_debug_single,
dma_debug_page,
@@ -1004,18 +1009,16 @@ void dma_debug_add_bus(struct bus_type *bus)
bus_register_notifier(bus, nb);
}

-/*
- * Let the architectures decide how many entries should be preallocated.
- */
-void dma_debug_init(u32 num_entries)
+static int dma_debug_init(void)
{
+ u32 num_entries;
int i;

/* Do not use dma_debug_initialized here, since we really want to be
* called to set dma_debug_initialized
*/
if (global_disable)
- return;
+ return 0;

for (i = 0; i < HASH_SIZE; ++i) {
INIT_LIST_HEAD(&dma_entry_hash[i].list);
@@ -1026,17 +1029,19 @@ void dma_debug_init(u32 num_entries)
pr_err("DMA-API: error creating debugfs entries - disabling\n");
global_disable = true;

- return;
+ return 0;
}

if (req_entries)
num_entries = req_entries;
+ else
+ num_entries = PREALLOC_DMA_DEBUG_ENTRIES;

if (prealloc_memory(num_entries) != 0) {
pr_err("DMA-API: debugging out of memory error - disabled\n");
global_disable = true;

- return;
+ return 0;
}

nr_total_entries = num_free_entries;
@@ -1044,7 +1049,9 @@ void dma_debug_init(u32 num_entries)
dma_debug_initialized = true;

pr_info("DMA-API: debugging enabled by kernel config\n");
+ return 0;
}
+core_initcall(dma_debug_init);

static __init int dma_debug_cmdline(char *str)
{
--
2.17.0


2018-04-20 08:05:26

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 02/22] dma-mapping: simplify Kconfig dependencies

ARCH_DMA_ADDR_T_64BIT is always true for 64-bit architectures now, so we
can skip the clause requiring it. 'n' is the default default, so no need
to explicitly state it.

Signed-off-by: Christoph Hellwig <[email protected]>
---
lib/Kconfig | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 01a37920949c..726b0562caa7 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -443,13 +443,11 @@ config IOMMU_HELPER

config DMA_DIRECT_OPS
bool
- depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
- default n
+ depends on HAS_DMA

config DMA_VIRT_OPS
bool
- depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
- default n
+ depends on HAS_DMA

config ARCH_HAS_SWIOTLB
bool
--
2.17.0


2018-04-20 08:06:21

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 20/22] xtensa: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/xtensa/Kconfig | 3 +
arch/xtensa/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/dma-mapping.h | 26 ------
arch/xtensa/kernel/pci-dma.c | 130 +++-----------------------
4 files changed, 19 insertions(+), 141 deletions(-)
delete mode 100644 arch/xtensa/include/asm/dma-mapping.h

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index c921e8bccdc8..6035845f30a7 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -5,11 +5,14 @@ config ZONE_DMA
config XTENSA
def_bool y
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
+ select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
select GENERIC_IRQ_SHOW
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 436b20337168..a8d6cd3bee4b 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -2,6 +2,7 @@ generic-y += bug.h
generic-y += device.h
generic-y += div64.h
generic-y += dma-contiguous.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h
deleted file mode 100644
index 44098800dad7..000000000000
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/*
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Copyright (C) 2003 - 2005 Tensilica Inc.
- * Copyright (C) 2015 Cadence Design Systems Inc.
- */
-
-#ifndef _XTENSA_DMA_MAPPING_H
-#define _XTENSA_DMA_MAPPING_H
-
-#include <asm/cache.h>
-#include <asm/io.h>
-
-#include <linux/mm.h>
-#include <linux/scatterlist.h>
-
-extern const struct dma_map_ops xtensa_dma_map_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &xtensa_dma_map_ops;
-}
-
-#endif /* _XTENSA_DMA_MAPPING_H */
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 392b4a80ebc2..a83d60e92908 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -16,26 +16,24 @@
*/

#include <linux/dma-contiguous.h>
+#include <linux/dma-noncoherent.h>
#include <linux/dma-direct.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/mm.h>
-#include <linux/module.h>
-#include <linux/pci.h>
-#include <linux/string.h>
#include <linux/types.h>
#include <asm/cacheflush.h>
#include <asm/io.h>

-static void do_cache_op(dma_addr_t dma_handle, size_t size,
+static void do_cache_op(phys_addr_t paddr, size_t size,
void (*fn)(unsigned long, unsigned long))
{
- unsigned long off = dma_handle & (PAGE_SIZE - 1);
- unsigned long pfn = PFN_DOWN(dma_handle);
+ unsigned long off = paddr & (PAGE_SIZE - 1);
+ unsigned long pfn = PFN_DOWN(paddr);
struct page *page = pfn_to_page(pfn);

if (!PageHighMem(page))
- fn((unsigned long)bus_to_virt(dma_handle), size);
+ fn((unsigned long)phys_to_virt(paddr), size);
else
while (size > 0) {
size_t sz = min_t(size_t, size, PAGE_SIZE - off);
@@ -49,14 +47,13 @@ static void do_cache_op(dma_addr_t dma_handle, size_t size,
}
}

-static void xtensa_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
switch (dir) {
case DMA_BIDIRECTIONAL:
case DMA_FROM_DEVICE:
- do_cache_op(dma_handle, size, __invalidate_dcache_range);
+ do_cache_op(paddr, size, __invalidate_dcache_range);
break;

case DMA_NONE:
@@ -68,15 +65,14 @@ static void xtensa_sync_single_for_cpu(struct device *dev,
}
}

-static void xtensa_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
switch (dir) {
case DMA_BIDIRECTIONAL:
case DMA_TO_DEVICE:
if (XCHAL_DCACHE_IS_WRITEBACK)
- do_cache_op(dma_handle, size, __flush_dcache_range);
+ do_cache_op(paddr, size, __flush_dcache_range);
break;

case DMA_NONE:
@@ -88,40 +84,13 @@ static void xtensa_sync_single_for_device(struct device *dev,
}
}

-static void xtensa_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sg, int nents,
- enum dma_data_direction dir)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- xtensa_sync_single_for_cpu(dev, sg_dma_address(s),
- sg_dma_len(s), dir);
- }
-}
-
-static void xtensa_sync_sg_for_device(struct device *dev,
- struct scatterlist *sg, int nents,
- enum dma_data_direction dir)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- xtensa_sync_single_for_device(dev, sg_dma_address(s),
- sg_dma_len(s), dir);
- }
-}
-
/*
* Note: We assume that the full memory space is always mapped to 'kseg'
* Otherwise we have to use page attributes (not implemented).
*/

-static void *xtensa_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *handle, gfp_t flag,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+ gfp_t flag, unsigned long attrs)
{
unsigned long ret;
unsigned long uncached;
@@ -171,8 +140,8 @@ static void *xtensa_dma_alloc(struct device *dev, size_t size,
return (void *)uncached;
}

-static void xtensa_dma_free(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_handle, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
{
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
unsigned long addr = (unsigned long)vaddr;
@@ -192,72 +161,3 @@ static void xtensa_dma_free(struct device *dev, size_t size, void *vaddr,
if (!dma_release_from_contiguous(dev, page, count))
__free_pages(page, get_order(size));
}
-
-static dma_addr_t xtensa_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t dma_handle = page_to_phys(page) + offset;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- xtensa_sync_single_for_device(dev, dma_handle, size, dir);
-
- return dma_handle;
-}
-
-static void xtensa_unmap_page(struct device *dev, dma_addr_t dma_handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
-}
-
-static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- s->dma_address = xtensa_map_page(dev, sg_page(s), s->offset,
- s->length, dir, attrs);
- }
- return nents;
-}
-
-static void xtensa_unmap_sg(struct device *dev,
- struct scatterlist *sg, int nents,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- xtensa_unmap_page(dev, sg_dma_address(s),
- sg_dma_len(s), dir, attrs);
- }
-}
-
-int xtensa_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
- return 0;
-}
-
-const struct dma_map_ops xtensa_dma_map_ops = {
- .alloc = xtensa_dma_alloc,
- .free = xtensa_dma_free,
- .map_page = xtensa_map_page,
- .unmap_page = xtensa_unmap_page,
- .map_sg = xtensa_map_sg,
- .unmap_sg = xtensa_unmap_sg,
- .sync_single_for_cpu = xtensa_sync_single_for_cpu,
- .sync_single_for_device = xtensa_sync_single_for_device,
- .sync_sg_for_cpu = xtensa_sync_sg_for_cpu,
- .sync_sg_for_device = xtensa_sync_sg_for_device,
- .mapping_error = xtensa_dma_mapping_error,
-};
-EXPORT_SYMBOL(xtensa_dma_map_ops);
--
2.17.0


2018-04-20 08:07:10

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Parisc previously had two different non-coherent dma ops implementation
that just different in the way coherent allocations were handled or not
handled. The different behavior is not selected at runtime in the
arch_dma_alloc and arch_dma_free routines. The non-coherent allocation
in the pcx cases now uses the dma_direct helpers that are a little more
sophisticated and used by a lot of other architectures.

Fix sync_single_for_cpu to do skip the cache flush unless the transfer
is to the device to match the more tested unmap_single path which should
have the same cache coherency implications.

This also now consistenly uses flush_kernel_dcache_range for cache
flushing while previously some of the SG based operations used
flush_kernel_vmap_range instead.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/parisc/Kconfig | 4 +
arch/parisc/include/asm/dma-mapping.h | 5 -
arch/parisc/kernel/pci-dma.c | 181 ++++----------------------
arch/parisc/kernel/setup.c | 8 +-
arch/parisc/mm/init.c | 11 +-
5 files changed, 35 insertions(+), 174 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 47047f0cbe35..80166a1cbcb7 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -188,6 +188,10 @@ config PA20
config PA11
def_bool y
depends on PA7000 || PA7100LC || PA7200 || PA7300LC
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select DMA_NONCOHERENT_OPS
+ select DMA_NONCOHERENT_CACHE_SYNC

config PREFETCH
def_bool y
diff --git a/arch/parisc/include/asm/dma-mapping.h b/arch/parisc/include/asm/dma-mapping.h
index 01e1fc057c83..44a9f97194aa 100644
--- a/arch/parisc/include/asm/dma-mapping.h
+++ b/arch/parisc/include/asm/dma-mapping.h
@@ -21,11 +21,6 @@
** flush/purge and allocate "regular" cacheable pages for everything.
*/

-#ifdef CONFIG_PA11
-extern const struct dma_map_ops pcxl_dma_ops;
-extern const struct dma_map_ops pcx_dma_ops;
-#endif
-
extern const struct dma_map_ops *hppa_dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 91bc0cac03a1..235e2e53959e 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -21,13 +21,12 @@
#include <linux/init.h>
#include <linux/gfp.h>
#include <linux/mm.h>
-#include <linux/pci.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/string.h>
#include <linux/types.h>
-#include <linux/scatterlist.h>
-#include <linux/export.h>
+#include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>

#include <asm/cacheflush.h>
#include <asm/dma.h> /* for DMA_CHUNK_SIZE */
@@ -447,178 +446,48 @@ static void pa11_dma_free(struct device *dev, size_t size, void *vaddr,
free_pages((unsigned long)__va(dma_handle), order);
}

-static dma_addr_t pa11_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction direction, unsigned long attrs)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- void *addr = page_address(page) + offset;
- BUG_ON(direction == DMA_NONE);
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- flush_kernel_dcache_range((unsigned long) addr, size);
-
- return virt_to_phys(addr);
+ flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
}

-static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
- size_t size, enum dma_data_direction direction,
- unsigned long attrs)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- BUG_ON(direction == DMA_NONE);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- return;
-
- if (direction == DMA_TO_DEVICE)
+ if (dir == DMA_TO_DEVICE)
return;

/*
- * For PCI_DMA_FROMDEVICE this flush is not necessary for the
+ * For DMA_FROM_DEVICE this flush is not necessary for the
* simple map/unmap case. However, it IS necessary if if
- * pci_dma_sync_single_* has been called and the buffer reused.
+ * dma_sync_single_* has been called and the buffer reused.
*/

- flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle), size);
-}
-
-static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- int i;
- struct scatterlist *sg;
-
- BUG_ON(direction == DMA_NONE);
-
- for_each_sg(sglist, sg, nents, i) {
- unsigned long vaddr = (unsigned long)sg_virt(sg);
-
- sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
- sg_dma_len(sg) = sg->length;
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- flush_kernel_dcache_range(vaddr, sg->length);
- }
- return nents;
+ flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
}

-static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- int i;
- struct scatterlist *sg;
-
- BUG_ON(direction == DMA_NONE);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- return;
-
- if (direction == DMA_TO_DEVICE)
- return;
-
- /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
-
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
-}
-
-static void pa11_dma_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- BUG_ON(direction == DMA_NONE);
-
- flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle),
- size);
-}
-
-static void pa11_dma_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- BUG_ON(direction == DMA_NONE);
-
- flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle),
- size);
-}
-
-static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
-{
- int i;
- struct scatterlist *sg;
-
- /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
-
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
-}
-
-static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
-{
- int i;
- struct scatterlist *sg;
-
- /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
-
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
-}
-
-static void pa11_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
{
flush_kernel_dcache_range((unsigned long)vaddr, size);
}

-const struct dma_map_ops pcxl_dma_ops = {
- .alloc = pa11_dma_alloc,
- .free = pa11_dma_free,
- .map_page = pa11_dma_map_page,
- .unmap_page = pa11_dma_unmap_page,
- .map_sg = pa11_dma_map_sg,
- .unmap_sg = pa11_dma_unmap_sg,
- .sync_single_for_cpu = pa11_dma_sync_single_for_cpu,
- .sync_single_for_device = pa11_dma_sync_single_for_device,
- .sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
- .sync_sg_for_device = pa11_dma_sync_sg_for_device,
- .cache_sync = pa11_dma_cache_sync,
-};
-
-static void *pcx_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
- void *addr;
-
- if ((attrs & DMA_ATTR_NON_CONSISTENT) == 0)
- return NULL;
-
- addr = (void *)__get_free_pages(flag, get_order(size));
- if (addr)
- *dma_handle = (dma_addr_t)virt_to_phys(addr);
-
- return addr;
+ if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl)
+ return pa11_dma_alloc(dev, size, dma_handle, gfp, attrs);
+ if (attrs & DMA_ATTR_NON_CONSISTENT)
+ return dma_direct_alloc(dev, size, dma_handle, gfp, attrs);
+ return NULL;
}

-static void pcx_dma_free(struct device *dev, size_t size, void *vaddr,
- dma_addr_t iova, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr, unsigned long attrs)
{
- free_pages((unsigned long)vaddr, get_order(size));
- return;
+ if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl)
+ pa11_dma_free(dev, size, cpu_addr, dma_addr, attrs);
+ else
+ dma_direct_free(dev, size, cpu_addr, dma_addr, attrs);
}
-
-const struct dma_map_ops pcx_dma_ops = {
- .alloc = pcx_dma_alloc,
- .free = pcx_dma_free,
- .map_page = pa11_dma_map_page,
- .unmap_page = pa11_dma_unmap_page,
- .map_sg = pa11_dma_map_sg,
- .unmap_sg = pa11_dma_unmap_sg,
- .sync_single_for_cpu = pa11_dma_sync_single_for_cpu,
- .sync_single_for_device = pa11_dma_sync_single_for_device,
- .sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
- .sync_sg_for_device = pa11_dma_sync_sg_for_device,
- .cache_sync = pa11_dma_cache_sync,
-};
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index 8d3a7b80ac42..4e87c35c22b7 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -97,14 +97,12 @@ void __init dma_ops_init(void)
panic( "PA-RISC Linux currently only supports machines that conform to\n"
"the PA-RISC 1.1 or 2.0 architecture specification.\n");

- case pcxs:
- case pcxt:
- hppa_dma_ops = &pcx_dma_ops;
- break;
case pcxl2:
pa7300lc_init();
case pcxl: /* falls through */
- hppa_dma_ops = &pcxl_dma_ops;
+ case pcxs:
+ case pcxt:
+ hppa_dma_ops = &dma_noncoherent_ops;
break;
default:
break;
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index cab32ee824d2..4ad91c28ecbe 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -19,7 +19,6 @@
#include <linux/gfp.h>
#include <linux/delay.h>
#include <linux/init.h>
-#include <linux/pci.h> /* for hppa_dma_ops and pcxl_dma_ops */
#include <linux/initrd.h>
#include <linux/swap.h>
#include <linux/unistd.h>
@@ -616,17 +615,13 @@ void __init mem_init(void)
free_all_bootmem();

#ifdef CONFIG_PA11
- if (hppa_dma_ops == &pcxl_dma_ops) {
+ if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl) {
pcxl_dma_start = (unsigned long)SET_MAP_OFFSET(MAP_START);
parisc_vmalloc_start = SET_MAP_OFFSET(pcxl_dma_start
+ PCXL_DMA_MAP_SIZE);
- } else {
- pcxl_dma_start = 0;
- parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);
- }
-#else
- parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);
+ } else
#endif
+ parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);

mem_init_print_info(NULL);

--
2.17.0


2018-04-20 08:07:20

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 18/22] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case

This is a slight change in behavior as we avoid the detour through the
virtual mapping for the coherent allocator, but if this CPU really is
coherent that should be the right thing to do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/Kconfig | 1 +
arch/sh/include/asm/dma-mapping.h | 4 ++++
arch/sh/kernel/Makefile | 4 ++--
arch/sh/kernel/dma-nommu.c | 4 ----
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 9417f70e008e..23e7432302b0 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -158,6 +158,7 @@ config SWAP_IO_SPACE
bool

config DMA_COHERENT
+ select DMA_DIRECT_OPS
bool

config DMA_NONCOHERENT
diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index 149e71f95be7..1ebc6a4eb1c5 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -6,7 +6,11 @@ extern const struct dma_map_ops nommu_dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
+#ifdef CONFIG_DMA_NONCOHERENT
return &nommu_dma_ops;
+#else
+ return &dma_direct_ops;
+#endif
}

extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index dc80041f7363..cb5f1bfb52de 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -12,7 +12,7 @@ endif

CFLAGS_REMOVE_return_address.o = -pg

-obj-y := debugtraps.o dma-nommu.o dumpstack.o \
+obj-y := debugtraps.o dumpstack.o \
idle.o io.o irq.o irq_$(BITS).o kdebugfs.o \
machvec.o nmi_debug.o process.o \
process_$(BITS).o ptrace.o ptrace_$(BITS).o \
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE) += disassemble.o
obj-$(CONFIG_HIBERNATION) += swsusp.o
obj-$(CONFIG_DWARF_UNWINDER) += dwarf.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_callchain.o
-
+obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index c0fff676e2e4..442922a9f8c0 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -48,7 +48,6 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
return nents;
}

-#ifdef CONFIG_DMA_NONCOHERENT
static void nommu_sync_single_for_device(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir)
{
@@ -64,16 +63,13 @@ static void nommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
for_each_sg(sg, s, nelems, i)
sh_sync_dma_for_device(sg_virt(s), s->length, dir);
}
-#endif

const struct dma_map_ops nommu_dma_ops = {
.alloc = dma_generic_alloc_coherent,
.free = dma_generic_free_coherent,
.map_page = nommu_map_page,
.map_sg = nommu_map_sg,
-#ifdef CONFIG_DMA_NONCOHERENT
.sync_single_for_device = nommu_sync_single_for_device,
.sync_sg_for_device = nommu_sync_sg_for_device,
-#endif
};
EXPORT_SYMBOL(nommu_dma_ops);
--
2.17.0


2018-04-20 08:09:20

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 21/22] sparc: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

This removes the previous sync_single_for_device implementation, which
looks bogus given that no syncing is happening in the similar but more
important map_single case.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sparc/Kconfig | 2 +
arch/sparc/include/asm/dma-mapping.h | 5 +-
arch/sparc/kernel/ioport.c | 151 ++-------------------------
3 files changed, 14 insertions(+), 144 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index c1cfc17eb504..5680eece3014 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -49,6 +49,8 @@ config SPARC

config SPARC32
def_bool !64BIT
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select CLZ_TAB
select HAVE_UID16
diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
index 12ae33daf52f..e17566376934 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -7,7 +7,6 @@
#include <linux/dma-debug.h>

extern const struct dma_map_ops *dma_ops;
-extern const struct dma_map_ops pci32_dma_ops;

extern struct bus_type pci_bus_type;

@@ -15,11 +14,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
#ifdef CONFIG_SPARC_LEON
if (sparc_cpu_model == sparc_leon)
- return &pci32_dma_ops;
+ return &dma_noncoherent_ops;
#endif
#if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
if (bus == &pci_bus_type)
- return &pci32_dma_ops;
+ return &dma_noncoherent_ops;
#endif
return dma_ops;
}
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 3bcef9ce74df..7954512c42e7 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -38,6 +38,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/scatterlist.h>
+#include <linux/dma-noncoherent.h>
#include <linux/of_device.h>

#include <asm/io.h>
@@ -434,9 +435,8 @@ arch_initcall(sparc_register_ioport);
/* Allocate and map kernel buffer using consistent mode DMA for a device.
* hwdev should be valid struct pci_dev pointer for PCI devices.
*/
-static void *pci32_alloc_coherent(struct device *dev, size_t len,
- dma_addr_t *pba, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t len, dma_addr_t *pba, gfp_t gfp,
+ unsigned long attrs)
{
unsigned long len_total = PAGE_ALIGN(len);
void *va;
@@ -488,8 +488,8 @@ static void *pci32_alloc_coherent(struct device *dev, size_t len,
* References to the memory and mappings associated with cpu_addr/dma_addr
* past this call are illegal.
*/
-static void pci32_free_coherent(struct device *dev, size_t n, void *p,
- dma_addr_t ba, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t n, void *p, dma_addr_t ba,
+ unsigned long attrs)
{
struct resource *res;

@@ -519,146 +519,15 @@ static void pci32_free_coherent(struct device *dev, size_t n, void *p,
free_pages((unsigned long)phys_to_virt(ba), get_order(n));
}

-/*
- * Same as pci_map_single, but with pages.
- */
-static dma_addr_t pci32_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- /* IIep is write-through, not flushing. */
- return page_to_phys(page) + offset;
-}
-
-static void pci32_unmap_page(struct device *dev, dma_addr_t ba, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
-{
- if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- dma_make_coherent(ba, PAGE_ALIGN(size));
-}
-
-/* Map a set of buffers described by scatterlist in streaming
- * mode for DMA. This is the scatter-gather version of the
- * above pci_map_single interface. Here the scatter gather list
- * elements are each tagged with the appropriate dma address
- * and length. They are obtained via sg_dma_{address,length}(SG).
- *
- * NOTE: An implementation may be able to use a smaller number of
- * DMA address/length pairs than there are SG table elements.
- * (for example via virtual mapping capabilities)
- * The routine returns the number of addr/length pairs actually
- * used, at most nents.
- *
- * Device ownership issues as mentioned above for pci_map_single are
- * the same here.
- */
-static int pci32_map_sg(struct device *device, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *sg;
- int n;
-
- /* IIep is write-through, not flushing. */
- for_each_sg(sgl, sg, nents, n) {
- sg->dma_address = sg_phys(sg);
- sg->dma_length = sg->length;
- }
- return nents;
-}
-
-/* Unmap a set of streaming mode DMA translations.
- * Again, cpu read rules concerning calls here are the same as for
- * pci_unmap_single() above.
- */
-static void pci32_unmap_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *sg;
- int n;
-
- if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
- for_each_sg(sgl, sg, nents, n) {
- dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length));
- }
- }
-}
+/* IIep is write-through, not flushing on cpu to device transfer. */

-/* Make physical memory consistent for a single
- * streaming mode DMA translation before or after a transfer.
- *
- * If you perform a pci_map_single() but wish to interrogate the
- * buffer using the cpu, yet do not wish to teardown the PCI dma
- * mapping, you must call this function before doing so. At the
- * next point you give the PCI dma address back to the card, you
- * must first perform a pci_dma_sync_for_device, and then the
- * device again owns the buffer.
- */
-static void pci32_sync_single_for_cpu(struct device *dev, dma_addr_t ba,
- size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- if (dir != PCI_DMA_TODEVICE) {
- dma_make_coherent(ba, PAGE_ALIGN(size));
- }
-}
-
-static void pci32_sync_single_for_device(struct device *dev, dma_addr_t ba,
- size_t size, enum dma_data_direction dir)
-{
- if (dir != PCI_DMA_TODEVICE) {
- dma_make_coherent(ba, PAGE_ALIGN(size));
- }
+ if (dir != PCI_DMA_TODEVICE)
+ dma_make_coherent(paddr, PAGE_ALIGN(size));
}

-/* Make physical memory consistent for a set of streaming
- * mode DMA translations after a transfer.
- *
- * The same as pci_dma_sync_single_* but for a scatter-gather list,
- * same rules and usage.
- */
-static void pci32_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int n;
-
- if (dir != PCI_DMA_TODEVICE) {
- for_each_sg(sgl, sg, nents, n) {
- dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length));
- }
- }
-}
-
-static void pci32_sync_sg_for_device(struct device *device, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int n;
-
- if (dir != PCI_DMA_TODEVICE) {
- for_each_sg(sgl, sg, nents, n) {
- dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length));
- }
- }
-}
-
-/* note: leon re-uses pci32_dma_ops */
-const struct dma_map_ops pci32_dma_ops = {
- .alloc = pci32_alloc_coherent,
- .free = pci32_free_coherent,
- .map_page = pci32_map_page,
- .unmap_page = pci32_unmap_page,
- .map_sg = pci32_map_sg,
- .unmap_sg = pci32_unmap_sg,
- .sync_single_for_cpu = pci32_sync_single_for_cpu,
- .sync_single_for_device = pci32_sync_single_for_device,
- .sync_sg_for_cpu = pci32_sync_sg_for_cpu,
- .sync_sg_for_device = pci32_sync_sg_for_device,
-};
-EXPORT_SYMBOL(pci32_dma_ops);
-
const struct dma_map_ops *dma_ops = &sbus_dma_ops;
EXPORT_SYMBOL(dma_ops);

--
2.17.0


2018-04-20 08:09:32

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 19/22] sh: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/Kconfig | 3 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sh/include/asm/dma-mapping.h | 26 -----------
arch/sh/kernel/Makefile | 1 -
arch/sh/kernel/dma-nommu.c | 75 -------------------------------
arch/sh/mm/consistent.c | 24 +++++-----
6 files changed, 13 insertions(+), 117 deletions(-)
delete mode 100644 arch/sh/include/asm/dma-mapping.h
delete mode 100644 arch/sh/kernel/dma-nommu.c

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 23e7432302b0..23ed245407b1 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -50,7 +50,6 @@ config SUPERH
select HAVE_ARCH_AUDITSYSCALL
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_NMI
- select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH

help
@@ -163,6 +162,8 @@ config DMA_COHERENT

config DMA_NONCOHERENT
def_bool !DMA_COHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select DMA_NONCOHERENT_OPS

config PGTABLE_LEVELS
default 3 if X2TLB
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 1efcce74997b..50f7e878ea1b 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -1,6 +1,7 @@
generic-y += current.h
generic-y += delay.h
generic-y += div64.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += irq_regs.h
diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
deleted file mode 100644
index 1ebc6a4eb1c5..000000000000
--- a/arch/sh/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_DMA_MAPPING_H
-#define __ASM_SH_DMA_MAPPING_H
-
-extern const struct dma_map_ops nommu_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-#ifdef CONFIG_DMA_NONCOHERENT
- return &nommu_dma_ops;
-#else
- return &dma_direct_ops;
-#endif
-}
-
-extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_addr, gfp_t flag,
- unsigned long attrs);
-extern void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs);
-
-void sh_sync_dma_for_device(void *vaddr, size_t size,
- enum dma_data_direction dir);
-
-#endif /* __ASM_SH_DMA_MAPPING_H */
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index cb5f1bfb52de..985106e13614 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -45,7 +45,6 @@ obj-$(CONFIG_DUMP_CODE) += disassemble.o
obj-$(CONFIG_HIBERNATION) += swsusp.o
obj-$(CONFIG_DWARF_UNWINDER) += dwarf.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_callchain.o
-obj-$(CONFIG_DMA_NONCOHERENT) += dma-nommu.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
deleted file mode 100644
index 442922a9f8c0..000000000000
--- a/arch/sh/kernel/dma-nommu.c
+++ /dev/null
@@ -1,75 +0,0 @@
-/*
- * DMA mapping support for platforms lacking IOMMUs.
- *
- * Copyright (C) 2009 Paul Mundt
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- */
-#include <linux/dma-mapping.h>
-#include <linux/io.h>
-#include <asm/cacheflush.h>
-
-static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t addr = page_to_phys(page) + offset;
-
- WARN_ON(size == 0);
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- sh_sync_dma_for_device(page_address(page) + offset, size, dir);
-
- return addr;
-}
-
-static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- WARN_ON(nents == 0 || sg[0].length == 0);
-
- for_each_sg(sg, s, nents, i) {
- BUG_ON(!sg_page(s));
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- sh_sync_dma_for_device(sg_virt(s), s->length, dir);
-
- s->dma_address = sg_phys(s);
- s->dma_length = s->length;
- }
-
- return nents;
-}
-
-static void nommu_sync_single_for_device(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir)
-{
- sh_sync_dma_for_device(phys_to_virt(addr), size, dir);
-}
-
-static void nommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
- int nelems, enum dma_data_direction dir)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nelems, i)
- sh_sync_dma_for_device(sg_virt(s), s->length, dir);
-}
-
-const struct dma_map_ops nommu_dma_ops = {
- .alloc = dma_generic_alloc_coherent,
- .free = dma_generic_free_coherent,
- .map_page = nommu_map_page,
- .map_sg = nommu_map_sg,
- .sync_single_for_device = nommu_sync_single_for_device,
- .sync_sg_for_device = nommu_sync_sg_for_device,
-};
-EXPORT_SYMBOL(nommu_dma_ops);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index 516d481525e1..aaeeb74fdcc3 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -12,17 +12,15 @@
#include <linux/mm.h>
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/dma-mapping.h>
-#include <linux/dma-debug.h>
+#include <linux/dma-noncoherent.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/gfp.h>
#include <asm/cacheflush.h>
#include <asm/addrspace.h>

-void *dma_generic_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret, *ret_nocache;
int order = get_order(size);
@@ -37,7 +35,7 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
* Pages from the page allocator may have data present in
* cache. So flush the cache before using uncached memory.
*/
- sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
+ __flush_purge_region(sh_cacheop_vaddr(ret), size);

ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
if (!ret_nocache) {
@@ -52,9 +50,8 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
return ret_nocache;
}

-void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
{
int order = get_order(size);
unsigned long pfn = dma_handle >> PAGE_SHIFT;
@@ -66,12 +63,12 @@ void dma_generic_free_coherent(struct device *dev, size_t size,
iounmap(vaddr);
}

-void sh_sync_dma_for_device(void *vaddr, size_t size,
- enum dma_data_direction direction)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- void *addr = sh_cacheop_vaddr(vaddr);
+ void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));

- switch (direction) {
+ switch (dir) {
case DMA_FROM_DEVICE: /* invalidate only */
__flush_invalidate_region(addr, size);
break;
@@ -85,7 +82,6 @@ void sh_sync_dma_for_device(void *vaddr, size_t size,
BUG();
}
}
-EXPORT_SYMBOL(sh_sync_dma_for_device);

static int __init memchunk_setup(char *str)
{
--
2.17.0


2018-04-20 08:10:10

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 17/22] sh: introduce a sh_cacheop_vaddr helper

And use it in the maple bus code to avoid a dma API dependency.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/include/asm/cacheflush.h | 7 +++++++
arch/sh/mm/consistent.c | 5 +----
drivers/sh/maple/maple.c | 7 ++++---
3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index d103ab5a4e4b..b932e42ef028 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -101,5 +101,12 @@ void kunmap_coherent(void *kvaddr);

void cpu_cache_init(void);

+static inline void *sh_cacheop_vaddr(void *vaddr)
+{
+ if (__in_29bit_mode())
+ vaddr = (void *)CAC_ADDR((unsigned long)vaddr);
+ return vaddr;
+}
+
#endif /* __KERNEL__ */
#endif /* __ASM_SH_CACHEFLUSH_H */
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index af2f28572119..516d481525e1 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -69,10 +69,7 @@ void dma_generic_free_coherent(struct device *dev, size_t size,
void sh_sync_dma_for_device(void *vaddr, size_t size,
enum dma_data_direction direction)
{
- void *addr;
-
- addr = __in_29bit_mode() ?
- (void *)CAC_ADDR((unsigned long)vaddr) : vaddr;
+ void *addr = sh_cacheop_vaddr(vaddr);

switch (direction) {
case DMA_FROM_DEVICE: /* invalidate only */
diff --git a/drivers/sh/maple/maple.c b/drivers/sh/maple/maple.c
index 7525039d812c..c9c354bd713a 100644
--- a/drivers/sh/maple/maple.c
+++ b/drivers/sh/maple/maple.c
@@ -300,8 +300,8 @@ static void maple_send(void)
mutex_unlock(&maple_wlist_lock);
if (maple_packets > 0) {
for (i = 0; i < (1 << MAPLE_DMA_PAGES); i++)
- sh_sync_dma_for_device(maple_sendbuf + i * PAGE_SIZE,
- PAGE_SIZE, DMA_BIDIRECTIONAL);
+ __flush_purge_region(maple_sendbuf + i * PAGE_SIZE,
+ PAGE_SIZE);
}

finish:
@@ -642,7 +642,8 @@ static void maple_dma_handler(struct work_struct *work)
list_for_each_entry_safe(mq, nmq, &maple_sentq, list) {
mdev = mq->dev;
recvbuf = mq->recvbuf->buf;
- sh_sync_dma_for_device(recvbuf, 0x400, DMA_FROM_DEVICE);
+ __flush_invalidate_region(sh_cacheop_vaddr(recvbuf),
+ 0x400);
code = recvbuf[0];
kfree(mq->sendbuf);
list_del_init(&mq->list);
--
2.17.0


2018-04-20 08:10:23

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

This makes sure kmap_atomic_pfn is consistently used for access to
virtual addresses instead of either using the slower plain kmap
or blindly expecting page_address() to work.

This makes sure the cache_sync routines is called in the unmap_sg
case, to match the unmap_single and sync_{single,sg}_to_cpu cases.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/nds32/Kconfig | 3 +
arch/nds32/include/asm/Kbuild | 1 +
arch/nds32/include/asm/dma-mapping.h | 14 ---
arch/nds32/kernel/dma.c | 182 ++++++---------------------
4 files changed, 39 insertions(+), 161 deletions(-)
delete mode 100644 arch/nds32/include/asm/dma-mapping.h

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 249f38d3388f..67d0ac0a989c 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -5,10 +5,13 @@

config NDS32
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_WANT_FRAME_POINTERS if FTRACE
select CLKSRC_MMIO
select CLONE_BACKWARDS
select COMMON_CLK
+ select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select GENERIC_CPU_DEVICES
select GENERIC_CLOCKEVENTS
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index 06bdf8167f5a..b3e951f805f8 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += cputime.h
generic-y += device.h
generic-y += div64.h
generic-y += dma.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += errno.h
generic-y += exec.h
diff --git a/arch/nds32/include/asm/dma-mapping.h b/arch/nds32/include/asm/dma-mapping.h
deleted file mode 100644
index 2dd47d245c25..000000000000
--- a/arch/nds32/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,14 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (C) 2005-2017 Andes Technology Corporation
-
-#ifndef ASMNDS32_DMA_MAPPING_H
-#define ASMNDS32_DMA_MAPPING_H
-
-extern struct dma_map_ops nds32_dma_ops;
-
-static inline struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &nds32_dma_ops;
-}
-
-#endif
diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
index d291800fc621..688f1a03dee6 100644
--- a/arch/nds32/kernel/dma.c
+++ b/arch/nds32/kernel/dma.c
@@ -3,17 +3,14 @@

#include <linux/types.h>
#include <linux/mm.h>
-#include <linux/export.h>
#include <linux/string.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/io.h>
#include <linux/cache.h>
#include <linux/highmem.h>
#include <linux/slab.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
-#include <asm/dma-mapping.h>
#include <asm/proc-fns.h>

/*
@@ -22,11 +19,6 @@
static pte_t *consistent_pte;
static DEFINE_RAW_SPINLOCK(consistent_lock);

-enum master_type {
- FOR_CPU = 0,
- FOR_DEVICE = 1,
-};
-
/*
* VM region handling support.
*
@@ -124,10 +116,8 @@ static struct arch_vm_region *vm_region_find(struct arch_vm_region *head,
return c;
}

-/* FIXME: attrs is not used. */
-static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t * handle, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+ gfp_t gfp, unsigned long attrs)
{
struct page *page;
struct arch_vm_region *c;
@@ -232,8 +222,8 @@ static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
return NULL;
}

-static void nds32_dma_free(struct device *dev, size_t size, void *cpu_addr,
- dma_addr_t handle, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t handle, unsigned long attrs)
{
struct arch_vm_region *c;
unsigned long flags, addr;
@@ -333,145 +323,43 @@ static int __init consistent_init(void)
}

core_initcall(consistent_init);
-static void consistent_sync(void *vaddr, size_t size, int direction, int master_type);
-static dma_addr_t nds32_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- consistent_sync((void *)(page_address(page) + offset), size, dir, FOR_DEVICE);
- return page_to_phys(page) + offset;
-}
-
-static void nds32_dma_unmap_page(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- consistent_sync(phys_to_virt(handle), size, dir, FOR_CPU);
-}

-/*
- * Make an area consistent for devices.
- */
-static void consistent_sync(void *vaddr, size_t size, int direction, int master_type)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- unsigned long start = (unsigned long)vaddr;
- unsigned long end = start + size;
-
- if (master_type == FOR_CPU) {
- switch (direction) {
- case DMA_TO_DEVICE:
- break;
- case DMA_FROM_DEVICE:
- case DMA_BIDIRECTIONAL:
- cpu_dma_inval_range(start, end);
- break;
- default:
- BUG();
- }
- } else {
- /* FOR_DEVICE */
- switch (direction) {
- case DMA_FROM_DEVICE:
- break;
- case DMA_TO_DEVICE:
- case DMA_BIDIRECTIONAL:
- cpu_dma_wb_range(start, end);
- break;
- default:
- BUG();
- }
+ void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
+ unsigned long start = (unsigned long)addr;
+
+ switch (direction) {
+ case DMA_FROM_DEVICE:
+ break;
+ case DMA_TO_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ cpu_dma_wb_range(start, start + size);
+ break;
+ default:
+ BUG();
}
-}

-static int nds32_dma_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- int i;
-
- for (i = 0; i < nents; i++, sg++) {
- void *virt;
- unsigned long pfn;
- struct page *page = sg_page(sg);
-
- sg->dma_address = sg_phys(sg);
- pfn = page_to_pfn(page) + sg->offset / PAGE_SIZE;
- page = pfn_to_page(pfn);
- if (PageHighMem(page)) {
- virt = kmap_atomic(page);
- consistent_sync(virt, sg->length, dir, FOR_CPU);
- kunmap_atomic(virt);
- } else {
- if (sg->offset > PAGE_SIZE)
- panic("sg->offset:%08x > PAGE_SIZE\n",
- sg->offset);
- virt = page_address(page) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_CPU);
- }
- }
- return nents;
-}
-
-static void nds32_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nhwentries, enum dma_data_direction dir,
- unsigned long attrs)
-{
+ kunmap_atomic(addr);
}

-static void
-nds32_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_CPU);
-}
-
-static void
-nds32_dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir)
-{
- consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_DEVICE);
-}
-
-static void
-nds32_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
- enum dma_data_direction dir)
-{
- int i;
-
- for (i = 0; i < nents; i++, sg++) {
- char *virt =
- page_address((struct page *)sg->page_link) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_CPU);
+ void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
+ unsigned long start = (unsigned long)addr;
+
+ switch (direction) {
+ case DMA_TO_DEVICE:
+ break;
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ cpu_dma_inval_range(start, end);
+ break;
+ default:
+ BUG();
}
-}
-
-static void
-nds32_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir)
-{
- int i;

- for (i = 0; i < nents; i++, sg++) {
- char *virt =
- page_address((struct page *)sg->page_link) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_DEVICE);
- }
+ kunmap_atomic(addr);
}
-
-struct dma_map_ops nds32_dma_ops = {
- .alloc = nds32_dma_alloc_coherent,
- .free = nds32_dma_free,
- .map_page = nds32_dma_map_page,
- .unmap_page = nds32_dma_unmap_page,
- .map_sg = nds32_dma_map_sg,
- .unmap_sg = nds32_dma_unmap_sg,
- .sync_single_for_device = nds32_dma_sync_single_for_device,
- .sync_single_for_cpu = nds32_dma_sync_single_for_cpu,
- .sync_sg_for_cpu = nds32_dma_sync_sg_for_cpu,
- .sync_sg_for_device = nds32_dma_sync_sg_for_device,
-};
-
-EXPORT_SYMBOL(nds32_dma_ops);
--
2.17.0


2018-04-20 08:10:51

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 15/22] openrisc: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Fix sync_single_for_device to do the same cache coherency operations as
the more tested map_single path, as both should transfer ownership to
the device.

Remove the sync_single_for_cpu implementation as no cache coherency
operations are used in the more commonly used unmap_single case, both
of which transfer ownership to the CPU.

Implement the missing sync_sg_for_device operation, matching the cache
coherency operations in sync_single_for_device and map_{single,sg}.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/openrisc/Kconfig | 2 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/dma-mapping.h | 35 --------
arch/openrisc/kernel/dma.c | 109 +++---------------------
4 files changed, 13 insertions(+), 134 deletions(-)
delete mode 100644 arch/openrisc/include/asm/dma-mapping.h

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 9ecad05bfc73..65e3c574c9d3 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -6,6 +6,8 @@

config OPENRISC
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select DMA_NONCOHERENT_OPS
select OF
select OF_EARLY_FLATTREE
select IRQ_DOMAIN
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index f05c722a21f8..e663a996b612 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -6,6 +6,7 @@ generic-y += current.h
generic-y += device.h
generic-y += div64.h
generic-y += dma.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/openrisc/include/asm/dma-mapping.h b/arch/openrisc/include/asm/dma-mapping.h
deleted file mode 100644
index e212a1f0b6d2..000000000000
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,35 +0,0 @@
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others. All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2010-2011 Jonas Bonn <[email protected]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#ifndef __ASM_OPENRISC_DMA_MAPPING_H
-#define __ASM_OPENRISC_DMA_MAPPING_H
-
-/*
- * See Documentation/DMA-API-HOWTO.txt and
- * Documentation/DMA-API.txt for documentation.
- */
-
-#include <linux/dma-debug.h>
-#include <linux/dma-mapping.h>
-
-extern const struct dma_map_ops or1k_dma_map_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &or1k_dma_map_ops;
-}
-
-#endif /* __ASM_OPENRISC_DMA_MAPPING_H */
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index ec7fd45704d2..cce99405edf4 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -19,9 +19,7 @@
* the only thing implemented properly. The rest need looking into...
*/

-#include <linux/dma-mapping.h>
-#include <linux/dma-debug.h>
-#include <linux/export.h>
+#include <linux/dma-noncoherent.h>

#include <asm/cpuinfo.h>
#include <asm/spr_defs.h>
@@ -80,10 +78,9 @@ page_clear_nocache(pte_t *pte, unsigned long addr,
* is being ignored for now; uncached but write-combined memory is a
* missing feature of the OR1K.
*/
-static void *
-or1k_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
+void *
+arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
unsigned long va;
void *page;
@@ -115,9 +112,9 @@ or1k_dma_alloc(struct device *dev, size_t size,
return (void *)va;
}

-static void
-or1k_dma_free(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_handle, unsigned long attrs)
+void
+arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
{
unsigned long va = (unsigned long)vaddr;
struct mm_walk walk = {
@@ -133,18 +130,11 @@ or1k_dma_free(struct device *dev, size_t size, void *vaddr,
free_pages_exact(vaddr, size);
}

-static dma_addr_t
-or1k_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t addr, size_t size,
+ enum dma_data_direction dir)
{
- unsigned long cl;
- dma_addr_t addr = page_to_phys(page) + offset;
struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- return addr;
+ unsigned long cl;

switch (dir) {
case DMA_TO_DEVICE:
@@ -167,83 +157,4 @@ or1k_map_page(struct device *dev, struct page *page,
*/
break;
}
-
- return addr;
}
-
-static void
-or1k_unmap_page(struct device *dev, dma_addr_t dma_handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- /* Nothing special to do here... */
-}
-
-static int
-or1k_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- s->dma_address = or1k_map_page(dev, sg_page(s), s->offset,
- s->length, dir, 0);
- }
-
- return nents;
-}
-
-static void
-or1k_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- or1k_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, 0);
- }
-}
-
-static void
-or1k_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
-{
- unsigned long cl;
- dma_addr_t addr = dma_handle;
- struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
- /* Invalidate the dcache for the requested range */
- for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size)
- mtspr(SPR_DCBIR, cl);
-}
-
-static void
-or1k_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
-{
- unsigned long cl;
- dma_addr_t addr = dma_handle;
- struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
- /* Flush the dcache for the requested range */
- for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size)
- mtspr(SPR_DCBFR, cl);
-}
-
-const struct dma_map_ops or1k_dma_map_ops = {
- .alloc = or1k_dma_alloc,
- .free = or1k_dma_free,
- .map_page = or1k_map_page,
- .unmap_page = or1k_unmap_page,
- .map_sg = or1k_map_sg,
- .unmap_sg = or1k_unmap_sg,
- .sync_single_for_cpu = or1k_sync_single_for_cpu,
- .sync_single_for_device = or1k_sync_single_for_device,
-};
-EXPORT_SYMBOL(or1k_dma_map_ops);
--
2.17.0


2018-04-20 08:11:11

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 16/22] sh: simplify get_arch_dma_ops

Remove the indirection through the dma_ops variable, and just return
nommu_dma_ops directly from get_arch_dma_ops.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/include/asm/dma-mapping.h | 5 ++---
arch/sh/kernel/dma-nommu.c | 8 +-------
arch/sh/mm/consistent.c | 3 ---
arch/sh/mm/init.c | 10 ----------
4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index 41167931e5d9..149e71f95be7 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -2,12 +2,11 @@
#ifndef __ASM_SH_DMA_MAPPING_H
#define __ASM_SH_DMA_MAPPING_H

-extern const struct dma_map_ops *dma_ops;
-extern void no_iommu_init(void);
+extern const struct dma_map_ops nommu_dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
- return dma_ops;
+ return &nommu_dma_ops;
}

extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 2077cfe73cc6..c0fff676e2e4 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -76,10 +76,4 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
#endif
};
-
-void __init no_iommu_init(void)
-{
- if (dma_ops)
- return;
- dma_ops = &nommu_dma_ops;
-}
+EXPORT_SYMBOL(nommu_dma_ops);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index a7bff3f29c2b..af2f28572119 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -20,9 +20,6 @@
#include <asm/cacheflush.h>
#include <asm/addrspace.h>

-const struct dma_map_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
void *dma_generic_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs)
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index ce0bbaa7e404..32e09f03e6bf 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -395,22 +395,12 @@ void __init paging_init(void)
free_area_init_nodes(max_zone_pfns);
}

-/*
- * Early initialization for any I/O MMUs we might have.
- */
-static void __init iommu_init(void)
-{
- no_iommu_init();
-}
-
unsigned int mem_init_done = 0;

void __init mem_init(void)
{
pg_data_t *pgdat;

- iommu_init();
-
high_memory = NULL;
for_each_online_pgdat(pgdat)
high_memory = max_t(void *, high_memory,
--
2.17.0


2018-04-20 08:11:51

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 14/22] nios2: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/nios2/Kconfig | 3 +
arch/nios2/include/asm/Kbuild | 1 +
arch/nios2/include/asm/dma-mapping.h | 20 ----
arch/nios2/mm/dma-mapping.c | 139 +++------------------------
4 files changed, 17 insertions(+), 146 deletions(-)
delete mode 100644 arch/nios2/include/asm/dma-mapping.h

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 3d4ec88f1db1..92035042cf62 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -1,6 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
config NIOS2
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select DMA_NONCOHERENT_OPS
select TIMER_OF
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index d232da2cbb38..24f6ee1ee69b 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -8,6 +8,7 @@ generic-y += current.h
generic-y += device.h
generic-y += div64.h
generic-y += dma.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/nios2/include/asm/dma-mapping.h b/arch/nios2/include/asm/dma-mapping.h
deleted file mode 100644
index 6ceb92251da0..000000000000
--- a/arch/nios2/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * Copyright (C) 2011 Tobias Klauser <[email protected]>
- * Copyright (C) 2009 Wind River Systems Inc
- *
- * This file is subject to the terms and conditions of the GNU General
- * Public License. See the file COPYING in the main directory of this
- * archive for more details.
- */
-
-#ifndef _ASM_NIOS2_DMA_MAPPING_H
-#define _ASM_NIOS2_DMA_MAPPING_H
-
-extern const struct dma_map_ops nios2_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &nios2_dma_ops;
-}
-
-#endif /* _ASM_NIOS2_DMA_MAPPING_H */
diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
index 4be815519dd4..4af9e5b5ba1c 100644
--- a/arch/nios2/mm/dma-mapping.c
+++ b/arch/nios2/mm/dma-mapping.c
@@ -12,18 +12,18 @@

#include <linux/types.h>
#include <linux/mm.h>
-#include <linux/export.h>
#include <linux/string.h>
-#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/cache.h>
#include <asm/cacheflush.h>

-static inline void __dma_sync_for_device(void *vaddr, size_t size,
- enum dma_data_direction direction)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- switch (direction) {
+ void *vaddr = phys_to_virt(paddr);
+
+ switch (dir) {
case DMA_FROM_DEVICE:
invalidate_dcache_range((unsigned long)vaddr,
(unsigned long)(vaddr + size));
@@ -42,10 +42,12 @@ static inline void __dma_sync_for_device(void *vaddr, size_t size,
}
}

-static inline void __dma_sync_for_cpu(void *vaddr, size_t size,
- enum dma_data_direction direction)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- switch (direction) {
+ void *vaddr = phys_to_virt(paddr);
+
+ switch (dir) {
case DMA_BIDIRECTIONAL:
case DMA_FROM_DEVICE:
invalidate_dcache_range((unsigned long)vaddr,
@@ -58,8 +60,8 @@ static inline void __dma_sync_for_cpu(void *vaddr, size_t size,
}
}

-static void *nios2_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret;

@@ -80,125 +82,10 @@ static void *nios2_dma_alloc(struct device *dev, size_t size,
return ret;
}

-static void nios2_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
unsigned long addr = (unsigned long) CAC_ADDR((unsigned long) vaddr);

free_pages(addr, get_order(size));
}
-
-static int nios2_dma_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- int i;
-
- for_each_sg(sg, sg, nents, i) {
- void *addr = sg_virt(sg);
-
- if (!addr)
- continue;
-
- sg->dma_address = sg_phys(sg);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- __dma_sync_for_device(addr, sg->length, direction);
- }
-
- return nents;
-}
-
-static dma_addr_t nios2_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
-{
- void *addr = page_address(page) + offset;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_sync_for_device(addr, size, direction);
-
- return page_to_phys(page) + offset;
-}
-
-static void nios2_dma_unmap_page(struct device *dev, dma_addr_t dma_address,
- size_t size, enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction);
-}
-
-static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nhwentries, enum dma_data_direction direction,
- unsigned long attrs)
-{
- void *addr;
- int i;
-
- if (direction == DMA_TO_DEVICE)
- return;
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- return;
-
- for_each_sg(sg, sg, nhwentries, i) {
- addr = sg_virt(sg);
- if (addr)
- __dma_sync_for_cpu(addr, sg->length, direction);
- }
-}
-
-static void nios2_dma_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- __dma_sync_for_cpu(phys_to_virt(dma_handle), size, direction);
-}
-
-static void nios2_dma_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- __dma_sync_for_device(phys_to_virt(dma_handle), size, direction);
-}
-
-static void nios2_dma_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sg, int nelems,
- enum dma_data_direction direction)
-{
- int i;
-
- /* Make sure that gcc doesn't leave the empty loop body. */
- for_each_sg(sg, sg, nelems, i)
- __dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
-}
-
-static void nios2_dma_sync_sg_for_device(struct device *dev,
- struct scatterlist *sg, int nelems,
- enum dma_data_direction direction)
-{
- int i;
-
- /* Make sure that gcc doesn't leave the empty loop body. */
- for_each_sg(sg, sg, nelems, i)
- __dma_sync_for_device(sg_virt(sg), sg->length, direction);
-
-}
-
-const struct dma_map_ops nios2_dma_ops = {
- .alloc = nios2_dma_alloc,
- .free = nios2_dma_free,
- .map_page = nios2_dma_map_page,
- .unmap_page = nios2_dma_unmap_page,
- .map_sg = nios2_dma_map_sg,
- .unmap_sg = nios2_dma_unmap_sg,
- .sync_single_for_device = nios2_dma_sync_single_for_device,
- .sync_single_for_cpu = nios2_dma_sync_single_for_cpu,
- .sync_sg_for_cpu = nios2_dma_sync_sg_for_cpu,
- .sync_sg_for_device = nios2_dma_sync_sg_for_device,
-};
-EXPORT_SYMBOL(nios2_dma_ops);
--
2.17.0


2018-04-20 08:12:42

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 11/22] microblaze: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

This removes the direction-based optimizations in
sync_{single,sg}_for_{cpu,device} which were marked untestested and
do not match the usually very well tested {un,}map_{single,sg}
implementations.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/microblaze/Kconfig | 4 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/dma-mapping.h | 28 -----
arch/microblaze/include/asm/pgtable.h | 2 -
arch/microblaze/kernel/dma.c | 144 ++--------------------
arch/microblaze/mm/consistent.c | 9 +-
6 files changed, 22 insertions(+), 166 deletions(-)
delete mode 100644 arch/microblaze/include/asm/dma-mapping.h

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 3817a3e2146c..91b9b999eb18 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -1,6 +1,8 @@
config MICROBLAZE
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_GCOV_PROFILE_ALL
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_WANT_IPC_PARSE_VERSION
@@ -8,6 +10,8 @@ config MICROBLAZE
select TIMER_OF
select CLONE_BACKWARDS3
select COMMON_CLK
+ select DMA_NONCOHERENT_OPS
+ select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_DEVICES
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 3c80a5a308ed..8d3e71f43a3e 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += bug.h
generic-y += bugs.h
generic-y += device.h
generic-y += div64.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/microblaze/include/asm/dma-mapping.h b/arch/microblaze/include/asm/dma-mapping.h
deleted file mode 100644
index add50c1373bf..000000000000
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Implements the generic device dma API for microblaze and the pci
- *
- * Copyright (C) 2009-2010 Michal Simek <[email protected]>
- * Copyright (C) 2009-2010 PetaLogix
- *
- * This file is subject to the terms and conditions of the GNU General
- * Public License. See the file COPYING in the main directory of this
- * archive for more details.
- *
- * This file is base on powerpc and x86 dma-mapping.h versions
- * Copyright (C) 2004 IBM
- */
-
-#ifndef _ASM_MICROBLAZE_DMA_MAPPING_H
-#define _ASM_MICROBLAZE_DMA_MAPPING_H
-
-/*
- * Available generic sets of operations
- */
-extern const struct dma_map_ops dma_nommu_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &dma_nommu_ops;
-}
-
-#endif /* _ASM_MICROBLAZE_DMA_MAPPING_H */
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index e53b8532353c..5174733fb489 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -551,8 +551,6 @@ void __init *early_get_page(void);

extern unsigned long ioremap_bot, ioremap_base;

-void *consistent_alloc(gfp_t gfp, size_t size, dma_addr_t *dma_handle);
-void consistent_free(size_t size, void *vaddr);
void consistent_sync(void *vaddr, size_t size, int direction);
void consistent_sync_page(struct page *page, unsigned long offset,
size_t size, int direction);
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 3145e7dc8ab1..71032cf64669 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -8,29 +8,15 @@
*/

#include <linux/device.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/gfp.h>
#include <linux/dma-debug.h>
#include <linux/export.h>
#include <linux/bug.h>
#include <asm/cacheflush.h>

-static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- return consistent_alloc(flag, size, dma_handle);
-}
-
-static void dma_nommu_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- consistent_free(size, vaddr);
-}
-
-static inline void __dma_sync(unsigned long paddr,
- size_t size, enum dma_data_direction direction)
+static void __dma_sync(struct device *dev, phys_addr_t paddr, size_t size,
+ enum dma_data_direction direction)
{
switch (direction) {
case DMA_TO_DEVICE:
@@ -45,113 +31,21 @@ static inline void __dma_sync(unsigned long paddr,
}
}

-static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- struct scatterlist *sg;
- int i;
-
- /* FIXME this part of code is untested */
- for_each_sg(sgl, sg, nents, i) {
- sg->dma_address = sg_phys(sg);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- __dma_sync(sg_phys(sg), sg->length, direction);
- }
-
- return nents;
-}
-
-static inline dma_addr_t dma_nommu_map_page(struct device *dev,
- struct page *page,
- unsigned long offset,
- size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_sync(page_to_phys(page) + offset, size, direction);
- return page_to_phys(page) + offset;
+ __dma_sync(dev, paddr, size, dir);
}

-static inline void dma_nommu_unmap_page(struct device *dev,
- dma_addr_t dma_address,
- size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
-/* There is not necessary to do cache cleanup
- *
- * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and
- * dma_address is physical address
- */
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_sync(dma_address, size, direction);
+ __dma_sync(dev, paddr, size, dir);
}

-static inline void
-dma_nommu_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- /*
- * It's pointless to flush the cache as the memory segment
- * is given to the CPU
- */
-
- if (direction == DMA_FROM_DEVICE)
- __dma_sync(dma_handle, size, direction);
-}
-
-static inline void
-dma_nommu_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- /*
- * It's pointless to invalidate the cache if the device isn't
- * supposed to write to the relevant region
- */
-
- if (direction == DMA_TO_DEVICE)
- __dma_sync(dma_handle, size, direction);
-}
-
-static inline void
-dma_nommu_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sgl, int nents,
- enum dma_data_direction direction)
-{
- struct scatterlist *sg;
- int i;
-
- /* FIXME this part of code is untested */
- if (direction == DMA_FROM_DEVICE)
- for_each_sg(sgl, sg, nents, i)
- __dma_sync(sg->dma_address, sg->length, direction);
-}
-
-static inline void
-dma_nommu_sync_sg_for_device(struct device *dev,
- struct scatterlist *sgl, int nents,
- enum dma_data_direction direction)
-{
- struct scatterlist *sg;
- int i;
-
- /* FIXME this part of code is untested */
- if (direction == DMA_TO_DEVICE)
- for_each_sg(sgl, sg, nents, i)
- __dma_sync(sg->dma_address, sg->length, direction);
-}
-
-static
-int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
- void *cpu_addr, dma_addr_t handle, size_t size,
- unsigned long attrs)
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t handle, size_t size,
+ unsigned long attrs)
{
#ifdef CONFIG_MMU
unsigned long user_count = vma_pages(vma);
@@ -170,17 +64,3 @@ int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
return -ENXIO;
#endif
}
-
-const struct dma_map_ops dma_nommu_ops = {
- .alloc = dma_nommu_alloc_coherent,
- .free = dma_nommu_free_coherent,
- .mmap = dma_nommu_mmap_coherent,
- .map_sg = dma_nommu_map_sg,
- .map_page = dma_nommu_map_page,
- .unmap_page = dma_nommu_unmap_page,
- .sync_single_for_cpu = dma_nommu_sync_single_for_cpu,
- .sync_single_for_device = dma_nommu_sync_single_for_device,
- .sync_sg_for_cpu = dma_nommu_sync_sg_for_cpu,
- .sync_sg_for_device = dma_nommu_sync_sg_for_device,
-};
-EXPORT_SYMBOL(dma_nommu_ops);
diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c
index b06c3a7faf20..b9a9c8c3397b 100644
--- a/arch/microblaze/mm/consistent.c
+++ b/arch/microblaze/mm/consistent.c
@@ -33,6 +33,7 @@
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/gfp.h>
+#include <linux/dma-noncoherent.h>

#include <asm/pgalloc.h>
#include <linux/io.h>
@@ -59,7 +60,8 @@
* uncached region. This will no doubt cause big problems if memory allocated
* here is not also freed properly. -- JW
*/
-void *consistent_alloc(gfp_t gfp, size_t size, dma_addr_t *dma_handle)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
unsigned long order, vaddr;
void *ret;
@@ -154,7 +156,6 @@ void *consistent_alloc(gfp_t gfp, size_t size, dma_addr_t *dma_handle)

return ret;
}
-EXPORT_SYMBOL(consistent_alloc);

#ifdef CONFIG_MMU
static pte_t *consistent_virt_to_pte(void *vaddr)
@@ -178,7 +179,8 @@ unsigned long consistent_virt_to_pfn(void *vaddr)
/*
* free page(s) as defined by the above mapping.
*/
-void consistent_free(size_t size, void *vaddr)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_addr, unsigned long attrs)
{
struct page *page;

@@ -218,7 +220,6 @@ void consistent_free(size_t size, void *vaddr)
flush_tlb_all();
#endif
}
-EXPORT_SYMBOL(consistent_free);

/*
* make an area consistent.
--
2.17.0


2018-04-20 08:13:24

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 09/22] hexagon: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

This removes the previous sync_single_for_cpu implementation, which looks
bogus given that no syncing is happening in the similar but more
important unmap_single case.

This adds the previously missing sync_sg_for_device implementation that
matches the pre-existing sync_single_for_device implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/hexagon/Kconfig | 2 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/dma-mapping.h | 40 -------
arch/hexagon/kernel/dma.c | 143 ++-----------------------
4 files changed, 11 insertions(+), 175 deletions(-)
delete mode 100644 arch/hexagon/include/asm/dma-mapping.h

diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 37adb2003033..bcbdcb32935c 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -4,6 +4,7 @@ comment "Linux Kernel Configuration for Hexagon"

config HEXAGON
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select HAVE_OPROFILE
# Other pending projects/to-do items.
# select HAVE_REGS_AND_STACK_ACCESS_API
@@ -28,6 +29,7 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
+ select DMA_NONCOHERENT_OPS
---help---
Qualcomm Hexagon is a processor architecture designed for high
performance and low power across a wide variety of applications.
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index e9743f689fb8..843a8086e980 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += bugs.h
generic-y += current.h
generic-y += device.h
generic-y += div64.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += extable.h
generic-y += fb.h
diff --git a/arch/hexagon/include/asm/dma-mapping.h b/arch/hexagon/include/asm/dma-mapping.h
deleted file mode 100644
index 263f6acbfb0f..000000000000
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * DMA operations for the Hexagon architecture
- *
- * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
- * 02110-1301, USA.
- */
-
-#ifndef _ASM_DMA_MAPPING_H
-#define _ASM_DMA_MAPPING_H
-
-#include <linux/types.h>
-#include <linux/cache.h>
-#include <linux/mm.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-debug.h>
-#include <asm/io.h>
-
-struct device;
-
-extern const struct dma_map_ops *dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return dma_ops;
-}
-
-#endif
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 77459df34e2e..ffc4ae8e126f 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -18,32 +18,19 @@
* 02110-1301, USA.
*/

-#include <linux/dma-mapping.h>
-#include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
#include <linux/bootmem.h>
#include <linux/genalloc.h>
-#include <asm/dma-mapping.h>
#include <linux/module.h>
#include <asm/page.h>

-#define HEXAGON_MAPPING_ERROR 0
-
-const struct dma_map_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
-static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
-{
- return phys_to_virt((unsigned long) dma_addr);
-}
-
static struct gen_pool *coherent_pool;


/* Allocates from a pool of uncached memory that was reserved at boot time */

-static void *hexagon_dma_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_addr, gfp_t flag,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_addr,
+ gfp_t flag, unsigned long attrs)
{
void *ret;

@@ -75,58 +62,17 @@ static void *hexagon_dma_alloc_coherent(struct device *dev, size_t size,
return ret;
}

-static void hexagon_free_coherent(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_addr, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_addr, unsigned long attrs)
{
gen_pool_free(coherent_pool, (unsigned long) vaddr, size);
}

-static int check_addr(const char *name, struct device *hwdev,
- dma_addr_t bus, size_t size)
-{
- if (hwdev && hwdev->dma_mask && !dma_capable(hwdev, bus, size)) {
- if (*hwdev->dma_mask >= DMA_BIT_MASK(32))
- printk(KERN_ERR
- "%s: overflow %Lx+%zu of device mask %Lx\n",
- name, (long long)bus, size,
- (long long)*hwdev->dma_mask);
- return 0;
- }
- return 1;
-}
-
-static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- struct scatterlist *s;
- int i;
-
- WARN_ON(nents == 0 || sg[0].length == 0);
-
- for_each_sg(sg, s, nents, i) {
- s->dma_address = sg_phys(s);
- if (!check_addr("map_sg", hwdev, s->dma_address, s->length))
- return 0;
-
- s->dma_length = s->length;
+ void *addr = phys_to_virt(paddr);

- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- flush_dcache_range(dma_addr_to_virt(s->dma_address),
- dma_addr_to_virt(s->dma_address + s->length));
- }
-
- return nents;
-}
-
-/*
- * address is virtual
- */
-static inline void dma_sync(void *addr, size_t size,
- enum dma_data_direction dir)
-{
switch (dir) {
case DMA_TO_DEVICE:
hexagon_clean_dcache_range((unsigned long) addr,
@@ -144,76 +90,3 @@ static inline void dma_sync(void *addr, size_t size,
BUG();
}
}
-
-/**
- * hexagon_map_page() - maps an address for device DMA
- * @dev: pointer to DMA device
- * @page: pointer to page struct of DMA memory
- * @offset: offset within page
- * @size: size of memory to map
- * @dir: transfer direction
- * @attrs: pointer to DMA attrs (not used)
- *
- * Called to map a memory address to a DMA address prior
- * to accesses to/from device.
- *
- * We don't particularly have many hoops to jump through
- * so far. Straight translation between phys and virtual.
- *
- * DMA is not cache coherent so sync is necessary; this
- * seems to be a convenient place to do it.
- *
- */
-static dma_addr_t hexagon_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t bus = page_to_phys(page) + offset;
- WARN_ON(size == 0);
-
- if (!check_addr("map_single", dev, bus, size))
- return HEXAGON_MAPPING_ERROR;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- dma_sync(dma_addr_to_virt(bus), size, dir);
-
- return bus;
-}
-
-static void hexagon_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
-{
- dma_sync(dma_addr_to_virt(dma_handle), size, dir);
-}
-
-static void hexagon_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction dir)
-{
- dma_sync(dma_addr_to_virt(dma_handle), size, dir);
-}
-
-static int hexagon_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
- return dma_addr == HEXAGON_MAPPING_ERROR;
-}
-
-const struct dma_map_ops hexagon_dma_ops = {
- .alloc = hexagon_dma_alloc_coherent,
- .free = hexagon_free_coherent,
- .map_sg = hexagon_map_sg,
- .map_page = hexagon_map_page,
- .sync_single_for_cpu = hexagon_sync_single_for_cpu,
- .sync_single_for_device = hexagon_sync_single_for_device,
- .mapping_error = hexagon_mapping_error,
-};
-
-void __init hexagon_dma_init(void)
-{
- if (dma_ops)
- return;
-
- dma_ops = &hexagon_dma_ops;
-}
--
2.17.0


2018-04-20 08:13:36

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 12/22] microblaze: remove the consistent_sync and consistent_sync_page

Both unused.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/microblaze/include/asm/pgtable.h | 3 --
arch/microblaze/mm/consistent.c | 45 ---------------------------
2 files changed, 48 deletions(-)

diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index 5174733fb489..ac6a37bd41ec 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -551,9 +551,6 @@ void __init *early_get_page(void);

extern unsigned long ioremap_bot, ioremap_base;

-void consistent_sync(void *vaddr, size_t size, int direction);
-void consistent_sync_page(struct page *page, unsigned long offset,
- size_t size, int direction);
unsigned long consistent_virt_to_pfn(void *vaddr);

void setup_memory(void);
diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c
index b9a9c8c3397b..c9a278ac795a 100644
--- a/arch/microblaze/mm/consistent.c
+++ b/arch/microblaze/mm/consistent.c
@@ -220,48 +220,3 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
flush_tlb_all();
#endif
}
-
-/*
- * make an area consistent.
- */
-void consistent_sync(void *vaddr, size_t size, int direction)
-{
- unsigned long start;
- unsigned long end;
-
- start = (unsigned long)vaddr;
-
- /* Convert start address back down to unshadowed memory region */
-#ifdef CONFIG_XILINX_UNCACHED_SHADOW
- start &= ~UNCACHED_SHADOW_MASK;
-#endif
- end = start + size;
-
- switch (direction) {
- case PCI_DMA_NONE:
- BUG();
- case PCI_DMA_FROMDEVICE: /* invalidate only */
- invalidate_dcache_range(start, end);
- break;
- case PCI_DMA_TODEVICE: /* writeback only */
- flush_dcache_range(start, end);
- break;
- case PCI_DMA_BIDIRECTIONAL: /* writeback and invalidate */
- flush_dcache_range(start, end);
- break;
- }
-}
-EXPORT_SYMBOL(consistent_sync);
-
-/*
- * consistent_sync_page makes memory consistent. identical
- * to consistent_sync, but takes a struct page instead of a
- * virtual address
- */
-void consistent_sync_page(struct page *page, unsigned long offset,
- size_t size, int direction)
-{
- unsigned long start = (unsigned long)page_address(page) + offset;
- consistent_sync((void *)start, size, direction);
-}
-EXPORT_SYMBOL(consistent_sync_page);
--
2.17.0


2018-04-20 08:14:56

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 10/22] m68k: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/m68k/Kconfig | 2 +
arch/m68k/include/asm/Kbuild | 1 +
arch/m68k/include/asm/dma-mapping.h | 12 -----
arch/m68k/kernel/dma.c | 68 ++++-------------------------
4 files changed, 11 insertions(+), 72 deletions(-)
delete mode 100644 arch/m68k/include/asm/dma-mapping.h

diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 785612b576f7..3f61327da2d5 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -2,6 +2,7 @@
config M68K
bool
default y
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA
select ARCH_MIGHT_HAVE_PC_PARPORT if ISA
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select HAVE_IDE
@@ -24,6 +25,7 @@ config M68K
select MODULES_USE_ELF_RELA
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select DMA_NONCOHERENT_OPS if HAS_DMA

config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 88a9d27df1ac..a853c00f1374 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -1,5 +1,6 @@
generic-y += barrier.h
generic-y += device.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/m68k/include/asm/dma-mapping.h b/arch/m68k/include/asm/dma-mapping.h
deleted file mode 100644
index e3722ed04fbb..000000000000
--- a/arch/m68k/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _M68K_DMA_MAPPING_H
-#define _M68K_DMA_MAPPING_H
-
-extern const struct dma_map_ops m68k_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &m68k_dma_ops;
-}
-
-#endif /* _M68K_DMA_MAPPING_H */
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index c01b9b8f97bf..3d561c577d35 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -6,7 +6,7 @@

#undef DEBUG

-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/scatterlist.h>
@@ -18,7 +18,7 @@

#if defined(CONFIG_MMU) && !defined(CONFIG_COLDFIRE)

-static void *m68k_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
gfp_t flag, unsigned long attrs)
{
struct page *page, **map;
@@ -61,7 +61,7 @@ static void *m68k_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
return addr;
}

-static void m68k_dma_free(struct device *dev, size_t size, void *addr,
+void arch_dma_free(struct device *dev, size_t size, void *addr,
dma_addr_t handle, unsigned long attrs)
{
pr_debug("dma_free_coherent: %p, %x\n", addr, handle);
@@ -72,8 +72,8 @@ static void m68k_dma_free(struct device *dev, size_t size, void *addr,

#include <asm/cacheflush.h>

-static void *m68k_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret;

@@ -88,7 +88,7 @@ static void *m68k_dma_alloc(struct device *dev, size_t size,
return ret;
}

-static void m68k_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
free_pages((unsigned long)vaddr, get_order(size));
@@ -96,8 +96,8 @@ static void m68k_dma_free(struct device *dev, size_t size, void *vaddr,

#endif /* CONFIG_MMU && !CONFIG_COLDFIRE */

-static void m68k_dma_sync_single_for_device(struct device *dev,
- dma_addr_t handle, size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t handle,
+ size_t size, enum dma_data_direction dir)
{
switch (dir) {
case DMA_BIDIRECTIONAL:
@@ -113,55 +113,3 @@ static void m68k_dma_sync_single_for_device(struct device *dev,
break;
}
}
-
-static void m68k_dma_sync_sg_for_device(struct device *dev,
- struct scatterlist *sglist, int nents, enum dma_data_direction dir)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nents, i) {
- dma_sync_single_for_device(dev, sg->dma_address, sg->length,
- dir);
- }
-}
-
-static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t handle = page_to_phys(page) + offset;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- dma_sync_single_for_device(dev, handle, size, dir);
-
- return handle;
-}
-
-static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nents, i) {
- sg->dma_address = sg_phys(sg);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- dma_sync_single_for_device(dev, sg->dma_address, sg->length,
- dir);
- }
- return nents;
-}
-
-const struct dma_map_ops m68k_dma_ops = {
- .alloc = m68k_dma_alloc,
- .free = m68k_dma_free,
- .map_page = m68k_dma_map_page,
- .map_sg = m68k_dma_map_sg,
- .sync_single_for_device = m68k_dma_sync_single_for_device,
- .sync_sg_for_device = m68k_dma_sync_sg_for_device,
-};
-EXPORT_SYMBOL(m68k_dma_ops);
--
2.17.0


2018-04-20 08:16:41

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 07/22] arm-nommu: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation for
the nommu dma map implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arc/Kconfig | 1 +
arch/arm/Kconfig | 4 +
arch/arm/mm/dma-mapping-nommu.c | 139 +++++---------------------------
3 files changed, 23 insertions(+), 121 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 89d47eac18b2..3a492a9aeaad 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -9,6 +9,7 @@
config ARC
def_bool y
select ARC_TIMERS
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SG_CHAIN
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f91f69174630..eacc45ac4e77 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -12,6 +12,8 @@ config ARM
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
+ select ARCH_HAS_SYNC_DMA_FOR_CPU if !MMU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE if !MMU
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
@@ -27,6 +29,8 @@ config ARM
select CPU_PM if (SUSPEND || CPU_IDLE)
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
select DMA_DIRECT_OPS if !MMU
+ select DMA_NONCOHERENT_OPS if !MMU
+ select DMA_NONCOHERENT_MMAP if !MMU
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
select GENERIC_ALLOCATOR
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index f448a0663b10..a74ed6632982 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -12,6 +12,7 @@
#include <linux/export.h>
#include <linux/mm.h>
#include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
#include <linux/scatterlist.h>

#include <asm/cachetype.h>
@@ -26,18 +27,16 @@
* - MMU/MPU is off
* - cpu is v7m w/o cache support
* - device is coherent
- * otherwise arm_nommu_dma_ops is used.
+ * otherwise dma_noncoherent_ops is used.
*
- * arm_nommu_dma_ops rely on consistent DMA memory (please, refer to
+ * dma_noncoherent_ops rely on consistent DMA memory (please, refer to
* [1] on how to declare such memory).
*
* [1] Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
*/

-static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
-
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret;

@@ -65,9 +64,8 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
return ret;
}

-static void arm_nommu_dma_free(struct device *dev, size_t size,
- void *cpu_addr, dma_addr_t dma_addr,
- unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr, unsigned long attrs)
{
if (attrs & DMA_ATTR_NON_CONSISTENT) {
dma_direct_free(dev, size, cpu_addr, dma_addr, attrs);
@@ -81,9 +79,9 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
return;
}

-static int arm_nommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
- void *cpu_addr, dma_addr_t dma_addr, size_t size,
- unsigned long attrs)
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size,
+ unsigned long attrs)
{
int ret;

@@ -93,9 +91,8 @@ static int arm_nommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
}

-
-static void __dma_page_cpu_to_dev(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
dmac_map_area(__va(paddr), size, dir);

@@ -105,8 +102,8 @@ static void __dma_page_cpu_to_dev(phys_addr_t paddr, size_t size,
outer_clean_range(paddr, paddr + size);
}

-static void __dma_page_dev_to_cpu(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
if (dir != DMA_TO_DEVICE) {
outer_inv_range(paddr, paddr + size);
@@ -114,110 +111,9 @@ static void __dma_page_dev_to_cpu(phys_addr_t paddr, size_t size,
}
}

-static dma_addr_t arm_nommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t handle = page_to_phys(page) + offset;
-
- __dma_page_cpu_to_dev(handle, size, dir);
-
- return handle;
-}
-
-static void arm_nommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- __dma_page_dev_to_cpu(handle, size, dir);
-}
-
-
-static int arm_nommu_dma_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sgl, sg, nents, i) {
- sg_dma_address(sg) = sg_phys(sg);
- sg_dma_len(sg) = sg->length;
- __dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
- }
-
- return nents;
-}
-
-static void arm_nommu_dma_unmap_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sgl, sg, nents, i)
- __dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
-}
-
-static void arm_nommu_dma_sync_single_for_device(struct device *dev,
- dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
- __dma_page_cpu_to_dev(handle, size, dir);
-}
-
-static void arm_nommu_dma_sync_single_for_cpu(struct device *dev,
- dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
- __dma_page_cpu_to_dev(handle, size, dir);
-}
-
-static void arm_nommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sgl, sg, nents, i)
- __dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
-}
-
-static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sgl, sg, nents, i)
- __dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
-}
-
-const struct dma_map_ops arm_nommu_dma_ops = {
- .alloc = arm_nommu_dma_alloc,
- .free = arm_nommu_dma_free,
- .mmap = arm_nommu_dma_mmap,
- .map_page = arm_nommu_dma_map_page,
- .unmap_page = arm_nommu_dma_unmap_page,
- .map_sg = arm_nommu_dma_map_sg,
- .unmap_sg = arm_nommu_dma_unmap_sg,
- .sync_single_for_device = arm_nommu_dma_sync_single_for_device,
- .sync_single_for_cpu = arm_nommu_dma_sync_single_for_cpu,
- .sync_sg_for_device = arm_nommu_dma_sync_sg_for_device,
- .sync_sg_for_cpu = arm_nommu_dma_sync_sg_for_cpu,
-};
-EXPORT_SYMBOL(arm_nommu_dma_ops);
-
-static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
-{
- return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
-}
-
void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
{
- const struct dma_map_ops *dma_ops;
-
if (IS_ENABLED(CONFIG_CPU_V7M)) {
/*
* Cache support for v7m is optional, so can be treated as
@@ -233,9 +129,10 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
}

- dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
-
- set_dma_ops(dev, dma_ops);
+ if (dev->archdata.dma_coherent)
+ set_dma_ops(dev, &dma_direct_ops);
+ else
+ set_dma_ops(dev, &dma_noncoherent_ops);
}

void arch_teardown_dma_ops(struct device *dev)
--
2.17.0


2018-04-20 08:16:57

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 08/22] c6x: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/c6x/Kconfig | 3 +
arch/c6x/include/asm/Kbuild | 1 +
arch/c6x/include/asm/dma-mapping.h | 28 ------
arch/c6x/include/asm/setup.h | 2 +
arch/c6x/kernel/Makefile | 2 +-
arch/c6x/kernel/dma.c | 138 -----------------------------
arch/c6x/mm/dma-coherent.c | 40 ++++++++-
7 files changed, 44 insertions(+), 170 deletions(-)
delete mode 100644 arch/c6x/include/asm/dma-mapping.h
delete mode 100644 arch/c6x/kernel/dma.c

diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index c6b4dd1418b4..8d7a3b38810f 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -6,7 +6,10 @@

config C6X
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select CLKDEV_LOOKUP
+ select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select GENERIC_IRQ_SHOW
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index fd4c840de837..434600e47662 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += current.h
generic-y += device.h
generic-y += div64.h
generic-y += dma.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/c6x/include/asm/dma-mapping.h b/arch/c6x/include/asm/dma-mapping.h
deleted file mode 100644
index 05daf1038111..000000000000
--- a/arch/c6x/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Port on Texas Instruments TMS320C6x architecture
- *
- * Copyright (C) 2004, 2009, 2010, 2011 Texas Instruments Incorporated
- * Author: Aurelien Jacquiot <[email protected]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- */
-#ifndef _ASM_C6X_DMA_MAPPING_H
-#define _ASM_C6X_DMA_MAPPING_H
-
-extern const struct dma_map_ops c6x_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &c6x_dma_ops;
-}
-
-extern void coherent_mem_init(u32 start, u32 size);
-void *c6x_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
- gfp_t gfp, unsigned long attrs);
-void c6x_dma_free(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_handle, unsigned long attrs);
-
-#endif /* _ASM_C6X_DMA_MAPPING_H */
diff --git a/arch/c6x/include/asm/setup.h b/arch/c6x/include/asm/setup.h
index 852afb209afb..350f34debb19 100644
--- a/arch/c6x/include/asm/setup.h
+++ b/arch/c6x/include/asm/setup.h
@@ -28,5 +28,7 @@ extern unsigned char c6x_fuse_mac[6];
extern void machine_init(unsigned long dt_ptr);
extern void time_init(void);

+extern void coherent_mem_init(u32 start, u32 size);
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_C6X_SETUP_H */
diff --git a/arch/c6x/kernel/Makefile b/arch/c6x/kernel/Makefile
index 02f340d7b8fe..fbe74174de87 100644
--- a/arch/c6x/kernel/Makefile
+++ b/arch/c6x/kernel/Makefile
@@ -8,6 +8,6 @@ extra-y := head.o vmlinux.lds
obj-y := process.o traps.o irq.o signal.o ptrace.o
obj-y += setup.o sys_c6x.o time.o devicetree.o
obj-y += switch_to.o entry.o vectors.o c6x_ksyms.o
-obj-y += soc.o dma.o
+obj-y += soc.o

obj-$(CONFIG_MODULES) += module.o
diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
deleted file mode 100644
index 31e1a9ec3a9c..000000000000
--- a/arch/c6x/kernel/dma.c
+++ /dev/null
@@ -1,138 +0,0 @@
-/*
- * Copyright (C) 2011 Texas Instruments Incorporated
- * Author: Mark Salter <[email protected]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/module.h>
-#include <linux/dma-mapping.h>
-#include <linux/mm.h>
-#include <linux/mm_types.h>
-#include <linux/scatterlist.h>
-
-#include <asm/cacheflush.h>
-
-static void c6x_dma_sync(dma_addr_t handle, size_t size,
- enum dma_data_direction dir)
-{
- unsigned long paddr = handle;
-
- BUG_ON(!valid_dma_direction(dir));
-
- switch (dir) {
- case DMA_FROM_DEVICE:
- L2_cache_block_invalidate(paddr, paddr + size);
- break;
- case DMA_TO_DEVICE:
- L2_cache_block_writeback(paddr, paddr + size);
- break;
- case DMA_BIDIRECTIONAL:
- L2_cache_block_writeback_invalidate(paddr, paddr + size);
- break;
- default:
- break;
- }
-}
-
-static dma_addr_t c6x_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t handle = virt_to_phys(page_address(page) + offset);
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- c6x_dma_sync(handle, size, dir);
-
- return handle;
-}
-
-static void c6x_dma_unmap_page(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- c6x_dma_sync(handle, size, dir);
-}
-
-static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sglist, sg, nents, i) {
- sg->dma_address = sg_phys(sg);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- c6x_dma_sync(sg->dma_address, sg->length, dir);
- }
-
- return nents;
-}
-
-static void c6x_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- struct scatterlist *sg;
- int i;
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- return;
-
- for_each_sg(sglist, sg, nents, i)
- c6x_dma_sync(sg_dma_address(sg), sg->length, dir);
-}
-
-static void c6x_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir)
-{
- c6x_dma_sync(handle, size, dir);
-
-}
-
-static void c6x_dma_sync_single_for_device(struct device *dev,
- dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
- c6x_dma_sync(handle, size, dir);
-
-}
-
-static void c6x_dma_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sglist, int nents,
- enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sglist, sg, nents, i)
- c6x_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
- sg->length, dir);
-
-}
-
-static void c6x_dma_sync_sg_for_device(struct device *dev,
- struct scatterlist *sglist, int nents,
- enum dma_data_direction dir)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sglist, sg, nents, i)
- c6x_dma_sync_single_for_device(dev, sg_dma_address(sg),
- sg->length, dir);
-
-}
-
-const struct dma_map_ops c6x_dma_ops = {
- .alloc = c6x_dma_alloc,
- .free = c6x_dma_free,
- .map_page = c6x_dma_map_page,
- .unmap_page = c6x_dma_unmap_page,
- .map_sg = c6x_dma_map_sg,
- .unmap_sg = c6x_dma_unmap_sg,
- .sync_single_for_device = c6x_dma_sync_single_for_device,
- .sync_single_for_cpu = c6x_dma_sync_single_for_cpu,
- .sync_sg_for_device = c6x_dma_sync_sg_for_device,
- .sync_sg_for_cpu = c6x_dma_sync_sg_for_cpu,
-};
-EXPORT_SYMBOL(c6x_dma_ops);
diff --git a/arch/c6x/mm/dma-coherent.c b/arch/c6x/mm/dma-coherent.c
index 95e38ad27c69..d0a8e0c4b27e 100644
--- a/arch/c6x/mm/dma-coherent.c
+++ b/arch/c6x/mm/dma-coherent.c
@@ -19,10 +19,12 @@
#include <linux/bitops.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/memblock.h>

+#include <asm/cacheflush.h>
#include <asm/page.h>
+#include <asm/setup.h>

/*
* DMA coherent memory management, can be redefined using the memdma=
@@ -73,7 +75,7 @@ static void __free_dma_pages(u32 addr, int order)
* Allocate DMA coherent memory space and return both the kernel
* virtual and DMA address for that space.
*/
-void *c6x_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
gfp_t gfp, unsigned long attrs)
{
u32 paddr;
@@ -98,7 +100,7 @@ void *c6x_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
/*
* Free DMA coherent memory as defined by the above mapping.
*/
-void c6x_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
int order;
@@ -139,3 +141,35 @@ void __init coherent_mem_init(phys_addr_t start, u32 size)
dma_bitmap = phys_to_virt(bitmap_phys);
memset(dma_bitmap, 0, dma_pages * PAGE_SIZE);
}
+
+static void c6x_dma_sync(struct device *dev, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
+{
+ BUG_ON(!valid_dma_direction(dir));
+
+ switch (dir) {
+ case DMA_FROM_DEVICE:
+ L2_cache_block_invalidate(paddr, paddr + size);
+ break;
+ case DMA_TO_DEVICE:
+ L2_cache_block_writeback(paddr, paddr + size);
+ break;
+ case DMA_BIDIRECTIONAL:
+ L2_cache_block_writeback_invalidate(paddr, paddr + size);
+ break;
+ default:
+ break;
+ }
+}
+
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
+{
+ return c6x_dma_sync(dev, paddr, size, dir);
+}
+
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
+{
+ return c6x_dma_sync(dev, paddr, size, dir);
+}
--
2.17.0


2018-04-20 08:17:23

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 06/22] arc: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arc/Kconfig | 4 +
arch/arc/include/asm/Kbuild | 1 +
arch/arc/include/asm/dma-mapping.h | 21 -----
arch/arc/mm/dma.c | 141 +++--------------------------
4 files changed, 19 insertions(+), 148 deletions(-)
delete mode 100644 arch/arc/include/asm/dma-mapping.h

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 7498aca4b887..89d47eac18b2 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -9,11 +9,15 @@
config ARC
def_bool y
select ARC_TIMERS
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SG_CHAIN
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
+ select DMA_NONCOHERENT_OPS
+ select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4bd5d4369e05..bbdcb955e18f 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -2,6 +2,7 @@
generic-y += bugs.h
generic-y += device.h
generic-y += div64.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += extable.h
generic-y += fb.h
diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
deleted file mode 100644
index 7a16824bfe98..000000000000
--- a/arch/arc/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * DMA Mapping glue for ARC
- *
- * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (http://www.synopsys.com)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#ifndef ASM_ARC_DMA_MAPPING_H
-#define ASM_ARC_DMA_MAPPING_H
-
-extern const struct dma_map_ops arc_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &arc_dma_ops;
-}
-
-#endif
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 1dcc404b5aec..ecfeb5585cc7 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -16,13 +16,12 @@
* The default DMA address == Phy address which is 0x8000_0000 based.
*/

-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <asm/cache.h>
#include <asm/cacheflush.h>

-
-static void *arc_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
unsigned long order = get_order(size);
struct page *page;
@@ -89,7 +88,7 @@ static void *arc_dma_alloc(struct device *dev, size_t size,
return kvaddr;
}

-static void arc_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
phys_addr_t paddr = dma_handle;
@@ -105,9 +104,9 @@ static void arc_dma_free(struct device *dev, size_t size, void *vaddr,
__free_pages(page, get_order(size));
}

-static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
- void *cpu_addr, dma_addr_t dma_addr, size_t size,
- unsigned long attrs)
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size,
+ unsigned long attrs)
{
unsigned long user_count = vma_pages(vma);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
@@ -135,7 +134,7 @@ static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
* CPU accesses page via normal paddr, thus needs to explicitly made
* consistent before each use
*/
-static void _dma_cache_sync(phys_addr_t paddr, size_t size,
+static void _dma_cache_sync(struct device *dev, phys_addr_t paddr size_t size,
enum dma_data_direction dir)
{
switch (dir) {
@@ -153,126 +152,14 @@ static void _dma_cache_sync(phys_addr_t paddr, size_t size,
}
}

-/*
- * arc_dma_map_page - map a portion of a page for streaming DMA
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed. The CPU
- * can regain ownership by calling dma_unmap_page().
- *
- * Note: while it takes struct page as arg, caller can "abuse" it to pass
- * a region larger than PAGE_SIZE, provided it is physically contiguous
- * and this still works correctly
- */
-static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- phys_addr_t paddr = page_to_phys(page) + offset;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- _dma_cache_sync(paddr, size, dir);
-
- return paddr;
-}
-
-/*
- * arc_dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- *
- * Note: historically this routine was not implemented for ARC
- */
-static void arc_dma_unmap_page(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- phys_addr_t paddr = handle;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- _dma_cache_sync(paddr, size, dir);
-}
-
-static int arc_dma_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i)
- s->dma_address = dma_map_page(dev, sg_page(s), s->offset,
- s->length, dir);
-
- return nents;
-}
-
-static void arc_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i)
- arc_dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir,
- attrs);
-}
-
-static void arc_dma_sync_single_for_cpu(struct device *dev,
- dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
- _dma_cache_sync(dma_handle, size, DMA_FROM_DEVICE);
-}
-
-static void arc_dma_sync_single_for_device(struct device *dev,
- dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- _dma_cache_sync(dma_handle, size, DMA_TO_DEVICE);
+ return _dma_cache_sync(dev, paddr, size, dir);
}

-static void arc_dma_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sglist, int nelems,
- enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nelems, i)
- _dma_cache_sync(sg_phys(sg), sg->length, dir);
+ return _dma_cache_sync(dev, paddr, size, dir);
}
-
-static void arc_dma_sync_sg_for_device(struct device *dev,
- struct scatterlist *sglist, int nelems,
- enum dma_data_direction dir)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nelems, i)
- _dma_cache_sync(sg_phys(sg), sg->length, dir);
-}
-
-static int arc_dma_supported(struct device *dev, u64 dma_mask)
-{
- /* Support 32 bit DMA mask exclusively */
- return dma_mask == DMA_BIT_MASK(32);
-}
-
-const struct dma_map_ops arc_dma_ops = {
- .alloc = arc_dma_alloc,
- .free = arc_dma_free,
- .mmap = arc_dma_mmap,
- .map_page = arc_dma_map_page,
- .unmap_page = arc_dma_unmap_page,
- .map_sg = arc_dma_map_sg,
- .unmap_sg = arc_dma_unmap_sg,
- .sync_single_for_device = arc_dma_sync_single_for_device,
- .sync_single_for_cpu = arc_dma_sync_single_for_cpu,
- .sync_sg_for_cpu = arc_dma_sync_sg_for_cpu,
- .sync_sg_for_device = arc_dma_sync_sg_for_device,
- .dma_supported = arc_dma_supported,
-};
-EXPORT_SYMBOL(arc_dma_ops);
--
2.17.0


2018-04-20 08:17:35

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 03/22] dma-mapping: provide a generic dma-noncoherent implementation

Add a new dma_map_ops implementation that uses dma-direct for the
address mapping of streaming mappings, and which requires arch-specific
implemenations of coherent allocate/free.

Architectures have to provide flushing helpers to ownership trasnfers
to the device and/or CPU, and can provide optional implementations of
the coherent mmap functionality, and the cache_flush routines for
non-coherent long term allocations.

Signed-off-by: Christoph Hellwig <[email protected]>
---
MAINTAINERS | 2 +
include/asm-generic/dma-mapping.h | 9 +++
include/linux/dma-direct.h | 7 ++-
include/linux/dma-mapping.h | 1 +
include/linux/dma-noncoherent.h | 47 ++++++++++++++
lib/Kconfig | 20 ++++++
lib/Makefile | 1 +
lib/dma-direct.c | 8 +--
lib/dma-noncoherent.c | 101 ++++++++++++++++++++++++++++++
9 files changed, 191 insertions(+), 5 deletions(-)
create mode 100644 include/linux/dma-noncoherent.h
create mode 100644 lib/dma-noncoherent.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b60179d948bb..bbaf5459d297 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4327,12 +4327,14 @@ W: http://git.infradead.org/users/hch/dma-mapping.git
S: Supported
F: lib/dma-debug.c
F: lib/dma-direct.c
+F: lib/dma-noncoherent.c
F: lib/dma-virt.c
F: drivers/base/dma-mapping.c
F: drivers/base/dma-coherent.c
F: include/asm-generic/dma-mapping.h
F: include/linux/dma-direct.h
F: include/linux/dma-mapping.h
+F: include/linux/dma-noncoherent.h

DME1737 HARDWARE MONITOR DRIVER
M: Juerg Haefliger <[email protected]>
diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
index 880a292d792f..ad2868263867 100644
--- a/include/asm-generic/dma-mapping.h
+++ b/include/asm-generic/dma-mapping.h
@@ -4,7 +4,16 @@

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
+ /*
+ * Use the non-coherent ops if available. If an architecture wants a
+ * more fine-grained selection of operations it will have to implement
+ * get_arch_dma_ops itself or use the per-device dma_ops.
+ */
+#ifdef CONFIG_DMA_NONCOHERENT_OPS
+ return &dma_noncoherent_ops;
+#else
return &dma_direct_ops;
+#endif
}

#endif /* _ASM_GENERIC_DMA_MAPPING_H */
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 53ad6a47f513..8d9f33febde5 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -59,6 +59,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+ enum dma_data_direction dir, unsigned long attrs);
int dma_direct_supported(struct device *dev, u64 mask);
-
+int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr);
#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 25a9a2b04f78..4be070df5fc5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -136,6 +136,7 @@ struct dma_map_ops {
};

extern const struct dma_map_ops dma_direct_ops;
+extern const struct dma_map_ops dma_noncoherent_ops;
extern const struct dma_map_ops dma_virt_ops;

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
new file mode 100644
index 000000000000..10b2654d549b
--- /dev/null
+++ b/include/linux/dma-noncoherent.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_DMA_NONCOHERENT_H
+#define _LINUX_DMA_NONCOHERENT_H 1
+
+#include <linux/dma-mapping.h>
+
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs);
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr, unsigned long attrs);
+
+#ifdef CONFIG_DMA_NONCOHERENT_MMAP
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size,
+ unsigned long attrs);
+#else
+#define arch_dma_mmap NULL
+#endif /* CONFIG_DMA_NONCOHERENT_MMAP */
+
+#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
+void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+ enum dma_data_direction direction);
+#else
+#define arch_dma_cache_sync NULL
+#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir);
+#else
+static inline void arch_sync_dma_for_device(struct device *dev,
+ phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+}
+#endif /* ARCH_HAS_SYNC_DMA_FOR_DEVICE */
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir);
+#else
+static inline void arch_sync_dma_for_cpu(struct device *dev,
+ phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+}
+#endif /* ARCH_HAS_SYNC_DMA_FOR_CPU */
+
+#endif /* _LINUX_DMA_NONCOHERENT_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 726b0562caa7..ae4bf1d8af11 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -441,10 +441,30 @@ config ARCH_DMA_ADDR_T_64BIT
config IOMMU_HELPER
bool

+config ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ bool
+
+config ARCH_HAS_SYNC_DMA_FOR_CPU
+ bool
+ select NEED_DMA_MAP_STATE
+
config DMA_DIRECT_OPS
bool
depends on HAS_DMA

+config DMA_NONCOHERENT_OPS
+ bool
+ depends on HAS_DMA
+ select DMA_DIRECT_OPS
+
+config DMA_NONCOHERENT_MMAP
+ bool
+ depends on DMA_NONCOHERENT_OPS
+
+config DMA_NONCOHERENT_CACHE_SYNC
+ bool
+ depends on DMA_NONCOHERENT_OPS
+
config DMA_VIRT_OPS
bool
depends on HAS_DMA
diff --git a/lib/Makefile b/lib/Makefile
index 94203b5eecd4..9f18c8152281 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -30,6 +30,7 @@ lib-$(CONFIG_PRINTK) += dump_stack.o
lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
lib-$(CONFIG_DMA_DIRECT_OPS) += dma-direct.o
+lib-$(CONFIG_DMA_NONCOHERENT_OPS) += dma-noncoherent.o
lib-$(CONFIG_DMA_VIRT_OPS) += dma-virt.o

lib-y += kobject.o klist.o
diff --git a/lib/dma-direct.c b/lib/dma-direct.c
index 199ae4cdd28f..9abb93541dfe 100644
--- a/lib/dma-direct.c
+++ b/lib/dma-direct.c
@@ -120,7 +120,7 @@ void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
free_pages((unsigned long)cpu_addr, page_order);
}

-static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
@@ -131,8 +131,8 @@ static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
return dma_addr;
}

-static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir, unsigned long attrs)
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+ enum dma_data_direction dir, unsigned long attrs)
{
int i;
struct scatterlist *sg;
@@ -167,7 +167,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
return 1;
}

-static int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr)
+int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
return dma_addr == DIRECT_MAPPING_ERROR;
}
diff --git a/lib/dma-noncoherent.c b/lib/dma-noncoherent.c
new file mode 100644
index 000000000000..f4b8532c20ac
--- /dev/null
+++ b/lib/dma-noncoherent.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Christoph Hellwig.
+ *
+ * DMA operations that map physical memory directly without providing cache
+ * coherence.
+ */
+#include <linux/export.h>
+#include <linux/mm.h>
+#include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
+#include <linux/scatterlist.h>
+
+static void dma_noncoherent_sync_single_for_device(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+ arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir);
+}
+
+static void dma_noncoherent_sync_sg_for_device(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ for_each_sg(sgl, sg, nents, i)
+ arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
+}
+
+static dma_addr_t dma_noncoherent_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ dma_addr_t addr;
+
+ addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ if (!dma_mapping_error(dev, addr) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ arch_sync_dma_for_device(dev, page_to_phys(page), size, dir);
+ return addr;
+}
+
+static int dma_noncoherent_map_sg(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+ nents = dma_direct_map_sg(dev, sgl, nents, dir, attrs);
+ if (nents > 0 && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_noncoherent_sync_sg_for_device(dev, sgl, nents, dir);
+ return nents;
+}
+
+#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+static void dma_noncoherent_sync_single_for_cpu(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+ arch_sync_dma_for_cpu(dev, dma_to_phys(dev, addr), size, dir);
+}
+
+static void dma_noncoherent_sync_sg_for_cpu(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ for_each_sg(sgl, sg, nents, i)
+ arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
+}
+
+static void dma_noncoherent_unmap_page(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_noncoherent_sync_single_for_cpu(dev, addr, size, dir);
+}
+
+static void dma_noncoherent_unmap_sg(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_noncoherent_sync_sg_for_cpu(dev, sgl, nents, dir);
+}
+#endif
+
+const struct dma_map_ops dma_noncoherent_ops = {
+ .alloc = arch_dma_alloc,
+ .free = arch_dma_free,
+ .mmap = arch_dma_mmap,
+ .sync_single_for_device = dma_noncoherent_sync_single_for_device,
+ .sync_sg_for_device = dma_noncoherent_sync_sg_for_device,
+ .map_page = dma_noncoherent_map_page,
+ .map_sg = dma_noncoherent_map_sg,
+#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+ .sync_single_for_cpu = dma_noncoherent_sync_single_for_cpu,
+ .sync_sg_for_cpu = dma_noncoherent_sync_sg_for_cpu,
+ .unmap_page = dma_noncoherent_unmap_page,
+ .unmap_sg = dma_noncoherent_unmap_sg,
+#endif
+ .dma_supported = dma_direct_supported,
+ .mapping_error = dma_direct_mapping_error,
+ .cache_sync = arch_dma_cache_sync,
+};
+EXPORT_SYMBOL(dma_noncoherent_ops);
--
2.17.0


2018-04-20 08:17:58

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 04/22] alpha: use dma_direct_ops for jensen

The generic dma_direct implementation does the same thing as the alpha
pci-noop implementation, just with more bells and whistles. And unlike
the current code it at least has a theoretical chance to actually compile.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/alpha/Kconfig | 1 +
arch/alpha/include/asm/dma-mapping.h | 4 ++++
arch/alpha/kernel/pci-noop.c | 33 ----------------------------
3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index aa7df1a36fd0..94af0c7f494a 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -204,6 +204,7 @@ config ALPHA_EIGER
config ALPHA_JENSEN
bool "Jensen"
depends on BROKEN
+ select DMA_DIRECT_OPS
help
DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one
of the first-generation Alpha systems. A number of these systems
diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index b78f61f20796..76ce923ecca1 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -6,7 +6,11 @@ extern const struct dma_map_ops *dma_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
+#ifdef CONFIG_ALPHA_JENSEN
+ return &dma_direct_ops;
+#else
return dma_ops;
+#endif
}

#endif /* _ALPHA_DMA_MAPPING_H */
diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c
index b6ebb65127a8..c7c5879869d3 100644
--- a/arch/alpha/kernel/pci-noop.c
+++ b/arch/alpha/kernel/pci-noop.c
@@ -102,36 +102,3 @@ SYSCALL_DEFINE5(pciconfig_write, unsigned long, bus, unsigned long, dfn,
else
return -ENODEV;
}
-
-static void *alpha_noop_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp,
- unsigned long attrs)
-{
- void *ret;
-
- if (!dev || *dev->dma_mask >= 0xffffffffUL)
- gfp &= ~GFP_DMA;
- ret = (void *)__get_free_pages(gfp, get_order(size));
- if (ret) {
- memset(ret, 0, size);
- *dma_handle = virt_to_phys(ret);
- }
- return ret;
-}
-
-static int alpha_noop_supported(struct device *dev, u64 mask)
-{
- return mask < 0x00ffffffUL ? 0 : 1;
-}
-
-const struct dma_map_ops alpha_noop_ops = {
- .alloc = alpha_noop_alloc_coherent,
- .free = dma_noop_free_coherent,
- .map_page = dma_noop_map_page,
- .map_sg = dma_noop_map_sg,
- .mapping_error = dma_noop_mapping_error,
- .dma_supported = alpha_noop_supported,
-};
-
-const struct dma_map_ops *dma_ops = &alpha_noop_ops;
-EXPORT_SYMBOL(dma_ops);
--
2.17.0


2018-04-20 08:18:21

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 05/22] alpha: simplify get_arch_dma_ops

Remove the dma_ops indirection.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/alpha/include/asm/dma-mapping.h | 4 ++--
arch/alpha/kernel/pci_iommu.c | 4 +---
2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index 76ce923ecca1..8beeafd4f68e 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -2,14 +2,14 @@
#ifndef _ALPHA_DMA_MAPPING_H
#define _ALPHA_DMA_MAPPING_H

-extern const struct dma_map_ops *dma_ops;
+extern const struct dma_map_ops alpha_pci_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
#ifdef CONFIG_ALPHA_JENSEN
return &dma_direct_ops;
#else
- return dma_ops;
+ return &alpha_pci_ops;
#endif
}

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 83b34b9188ea..6923b0d9c1e1 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -950,6 +950,4 @@ const struct dma_map_ops alpha_pci_ops = {
.mapping_error = alpha_pci_mapping_error,
.dma_supported = alpha_pci_supported,
};
-
-const struct dma_map_ops *dma_ops = &alpha_pci_ops;
-EXPORT_SYMBOL(dma_ops);
+EXPORT_SYMBOL(alpha_pci_ops);
--
2.17.0


2018-04-20 10:25:23

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 01/22] dma-debug: move initialization to common code

Hi Christoph,

Nice cleanup! Looks good overall, just a couple of nits.

On 20/04/18 09:02, Christoph Hellwig wrote:
[...]
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 7f5cdc1e6b29..712a897174e4 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -41,6 +41,11 @@
> #define HASH_FN_SHIFT 13
> #define HASH_FN_MASK (HASH_SIZE - 1)
>
> +/* allow architectures to override this if absolutely required */
> +#ifndef PREALLOC_DMA_DEBUG_ENTRIES
> +#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
> +#endif
> +
> enum {
> dma_debug_single,
> dma_debug_page,
> @@ -1004,18 +1009,16 @@ void dma_debug_add_bus(struct bus_type *bus)
> bus_register_notifier(bus, nb);
> }
>
> -/*
> - * Let the architectures decide how many entries should be preallocated.
> - */
> -void dma_debug_init(u32 num_entries)
> +static int dma_debug_init(void)
> {
> + u32 num_entries;

Maybe initialise it to PREALLOC_DMA_DEBUG_ENTRIES?

> int i;
>
> /* Do not use dma_debug_initialized here, since we really want to be
> * called to set dma_debug_initialized
> */
> if (global_disable)
> - return;
> + return 0;
>
> for (i = 0; i < HASH_SIZE; ++i) {
> INIT_LIST_HEAD(&dma_entry_hash[i].list);
> @@ -1026,17 +1029,19 @@ void dma_debug_init(u32 num_entries)
> pr_err("DMA-API: error creating debugfs entries - disabling\n");
> global_disable = true;
>
> - return;
> + return 0;
> }
>
> if (req_entries)
> num_entries = req_entries;
> + else
> + num_entries = PREALLOC_DMA_DEBUG_ENTRIES;
>
> if (prealloc_memory(num_entries) != 0) {
> pr_err("DMA-API: debugging out of memory error - disabled\n");
> global_disable = true;
>
> - return;
> + return 0;
> }
>
> nr_total_entries = num_free_entries;
> @@ -1044,7 +1049,9 @@ void dma_debug_init(u32 num_entries)
> dma_debug_initialized = true;
>
> pr_info("DMA-API: debugging enabled by kernel config\n");
> + return 0;
> }
> +core_initcall(dma_debug_init);

I think it's worth noting that for most users this now happens much
earlier than before. In general that's probably good (e.g. on arm64 it
should prevent false-positives from the Arm SMMU drivers under ACPI),
and I can't imagine it's high-risk, but it is a behaviour change.

Robin.

>
> static __init int dma_debug_cmdline(char *str)
> {
>

2018-04-21 17:45:46

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On 20.04.2018 10:03, Christoph Hellwig wrote:
> Switch to the generic noncoherent direct mapping implementation.
>
> Parisc previously had two different non-coherent dma ops implementation
> that just different in the way coherent allocations were handled or not
> handled. The different behavior is not selected at runtime in the
> arch_dma_alloc and arch_dma_free routines. The non-coherent allocation
> in the pcx cases now uses the dma_direct helpers that are a little more
> sophisticated and used by a lot of other architectures.
>
> Fix sync_single_for_cpu to do skip the cache flush unless the transfer
> is to the device to match the more tested unmap_single path which should
> have the same cache coherency implications.
>
> This also now consistenly uses flush_kernel_dcache_range for cache
> flushing while previously some of the SG based operations used
> flush_kernel_vmap_range instead.


This patch breaks a 32bit kernel on a B160L machine (PA7300LC CPU, "pcxl2").
After applying this patch series the lasi82956 network driver works unreliable.
NIC gets IP, but ping doesn't work.
See drivers/net/ethernet/i825xx/lasi_82596.c, it uses dma*sync() functions.

Helge


> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/parisc/Kconfig | 4 +
> arch/parisc/include/asm/dma-mapping.h | 5 -
> arch/parisc/kernel/pci-dma.c | 181 ++++----------------------
> arch/parisc/kernel/setup.c | 8 +-
> arch/parisc/mm/init.c | 11 +-
> 5 files changed, 35 insertions(+), 174 deletions(-)
>
> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index 47047f0cbe35..80166a1cbcb7 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -188,6 +188,10 @@ config PA20
> config PA11
> def_bool y
> depends on PA7000 || PA7100LC || PA7200 || PA7300LC
> + select ARCH_HAS_SYNC_DMA_FOR_CPU
> + select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> + select DMA_NONCOHERENT_OPS
> + select DMA_NONCOHERENT_CACHE_SYNC
>
> config PREFETCH
> def_bool y
> diff --git a/arch/parisc/include/asm/dma-mapping.h b/arch/parisc/include/asm/dma-mapping.h
> index 01e1fc057c83..44a9f97194aa 100644
> --- a/arch/parisc/include/asm/dma-mapping.h
> +++ b/arch/parisc/include/asm/dma-mapping.h
> @@ -21,11 +21,6 @@
> ** flush/purge and allocate "regular" cacheable pages for everything.
> */
>
> -#ifdef CONFIG_PA11
> -extern const struct dma_map_ops pcxl_dma_ops;
> -extern const struct dma_map_ops pcx_dma_ops;
> -#endif
> -
> extern const struct dma_map_ops *hppa_dma_ops;
>
> static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 91bc0cac03a1..235e2e53959e 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -21,13 +21,12 @@
> #include <linux/init.h>
> #include <linux/gfp.h>
> #include <linux/mm.h>
> -#include <linux/pci.h>
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> #include <linux/string.h>
> #include <linux/types.h>
> -#include <linux/scatterlist.h>
> -#include <linux/export.h>
> +#include <linux/dma-direct.h>
> +#include <linux/dma-noncoherent.h>
>
> #include <asm/cacheflush.h>
> #include <asm/dma.h> /* for DMA_CHUNK_SIZE */
> @@ -447,178 +446,48 @@ static void pa11_dma_free(struct device *dev, size_t size, void *vaddr,
> free_pages((unsigned long)__va(dma_handle), order);
> }
>
> -static dma_addr_t pa11_dma_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size,
> - enum dma_data_direction direction, unsigned long attrs)
> +void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - void *addr = page_address(page) + offset;
> - BUG_ON(direction == DMA_NONE);
> -
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - flush_kernel_dcache_range((unsigned long) addr, size);
> -
> - return virt_to_phys(addr);
> + flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
> }
>
> -static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> - size_t size, enum dma_data_direction direction,
> - unsigned long attrs)
> +void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - BUG_ON(direction == DMA_NONE);
> -
> - if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
> - return;
> -
> - if (direction == DMA_TO_DEVICE)
> + if (dir == DMA_TO_DEVICE)
> return;
>
> /*
> - * For PCI_DMA_FROMDEVICE this flush is not necessary for the
> + * For DMA_FROM_DEVICE this flush is not necessary for the
> * simple map/unmap case. However, it IS necessary if if
> - * pci_dma_sync_single_* has been called and the buffer reused.
> + * dma_sync_single_* has been called and the buffer reused.
> */
>
> - flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle), size);
> -}
> -
> -static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist,
> - int nents, enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> - int i;
> - struct scatterlist *sg;
> -
> - BUG_ON(direction == DMA_NONE);
> -
> - for_each_sg(sglist, sg, nents, i) {
> - unsigned long vaddr = (unsigned long)sg_virt(sg);
> -
> - sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
> - sg_dma_len(sg) = sg->length;
> -
> - if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
> - continue;
> -
> - flush_kernel_dcache_range(vaddr, sg->length);
> - }
> - return nents;
> + flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
> }
>
> -static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
> - int nents, enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> - int i;
> - struct scatterlist *sg;
> -
> - BUG_ON(direction == DMA_NONE);
> -
> - if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
> - return;
> -
> - if (direction == DMA_TO_DEVICE)
> - return;
> -
> - /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
> -
> - for_each_sg(sglist, sg, nents, i)
> - flush_kernel_vmap_range(sg_virt(sg), sg->length);
> -}
> -
> -static void pa11_dma_sync_single_for_cpu(struct device *dev,
> - dma_addr_t dma_handle, size_t size,
> - enum dma_data_direction direction)
> -{
> - BUG_ON(direction == DMA_NONE);
> -
> - flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle),
> - size);
> -}
> -
> -static void pa11_dma_sync_single_for_device(struct device *dev,
> - dma_addr_t dma_handle, size_t size,
> - enum dma_data_direction direction)
> -{
> - BUG_ON(direction == DMA_NONE);
> -
> - flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle),
> - size);
> -}
> -
> -static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
> -{
> - int i;
> - struct scatterlist *sg;
> -
> - /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
> -
> - for_each_sg(sglist, sg, nents, i)
> - flush_kernel_vmap_range(sg_virt(sg), sg->length);
> -}
> -
> -static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
> -{
> - int i;
> - struct scatterlist *sg;
> -
> - /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
> -
> - for_each_sg(sglist, sg, nents, i)
> - flush_kernel_vmap_range(sg_virt(sg), sg->length);
> -}
> -
> -static void pa11_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
> +void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
> enum dma_data_direction direction)
> {
> flush_kernel_dcache_range((unsigned long)vaddr, size);
> }
>
> -const struct dma_map_ops pcxl_dma_ops = {
> - .alloc = pa11_dma_alloc,
> - .free = pa11_dma_free,
> - .map_page = pa11_dma_map_page,
> - .unmap_page = pa11_dma_unmap_page,
> - .map_sg = pa11_dma_map_sg,
> - .unmap_sg = pa11_dma_unmap_sg,
> - .sync_single_for_cpu = pa11_dma_sync_single_for_cpu,
> - .sync_single_for_device = pa11_dma_sync_single_for_device,
> - .sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
> - .sync_sg_for_device = pa11_dma_sync_sg_for_device,
> - .cache_sync = pa11_dma_cache_sync,
> -};
> -
> -static void *pcx_dma_alloc(struct device *dev, size_t size,
> - dma_addr_t *dma_handle, gfp_t flag, unsigned long attrs)
> +void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> + gfp_t gfp, unsigned long attrs)
> {
> - void *addr;
> -
> - if ((attrs & DMA_ATTR_NON_CONSISTENT) == 0)
> - return NULL;
> -
> - addr = (void *)__get_free_pages(flag, get_order(size));
> - if (addr)
> - *dma_handle = (dma_addr_t)virt_to_phys(addr);
> -
> - return addr;
> + if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl)
> + return pa11_dma_alloc(dev, size, dma_handle, gfp, attrs);
> + if (attrs & DMA_ATTR_NON_CONSISTENT)
> + return dma_direct_alloc(dev, size, dma_handle, gfp, attrs);
> + return NULL;
> }
>
> -static void pcx_dma_free(struct device *dev, size_t size, void *vaddr,
> - dma_addr_t iova, unsigned long attrs)
> +void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
> + dma_addr_t dma_addr, unsigned long attrs)
> {
> - free_pages((unsigned long)vaddr, get_order(size));
> - return;
> + if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl)
> + pa11_dma_free(dev, size, cpu_addr, dma_addr, attrs);
> + else
> + dma_direct_free(dev, size, cpu_addr, dma_addr, attrs);
> }
> -
> -const struct dma_map_ops pcx_dma_ops = {
> - .alloc = pcx_dma_alloc,
> - .free = pcx_dma_free,
> - .map_page = pa11_dma_map_page,
> - .unmap_page = pa11_dma_unmap_page,
> - .map_sg = pa11_dma_map_sg,
> - .unmap_sg = pa11_dma_unmap_sg,
> - .sync_single_for_cpu = pa11_dma_sync_single_for_cpu,
> - .sync_single_for_device = pa11_dma_sync_single_for_device,
> - .sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
> - .sync_sg_for_device = pa11_dma_sync_sg_for_device,
> - .cache_sync = pa11_dma_cache_sync,
> -};
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index 8d3a7b80ac42..4e87c35c22b7 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -97,14 +97,12 @@ void __init dma_ops_init(void)
> panic( "PA-RISC Linux currently only supports machines that conform to\n"
> "the PA-RISC 1.1 or 2.0 architecture specification.\n");
>
> - case pcxs:
> - case pcxt:
> - hppa_dma_ops = &pcx_dma_ops;
> - break;
> case pcxl2:
> pa7300lc_init();
> case pcxl: /* falls through */
> - hppa_dma_ops = &pcxl_dma_ops;
> + case pcxs:
> + case pcxt:
> + hppa_dma_ops = &dma_noncoherent_ops;
> break;
> default:
> break;
> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> index cab32ee824d2..4ad91c28ecbe 100644
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -19,7 +19,6 @@
> #include <linux/gfp.h>
> #include <linux/delay.h>
> #include <linux/init.h>
> -#include <linux/pci.h> /* for hppa_dma_ops and pcxl_dma_ops */
> #include <linux/initrd.h>
> #include <linux/swap.h>
> #include <linux/unistd.h>
> @@ -616,17 +615,13 @@ void __init mem_init(void)
> free_all_bootmem();
>
> #ifdef CONFIG_PA11
> - if (hppa_dma_ops == &pcxl_dma_ops) {
> + if (boot_cpu_data.cpu_type == pcxl2 || boot_cpu_data.cpu_type == pcxl) {
> pcxl_dma_start = (unsigned long)SET_MAP_OFFSET(MAP_START);
> parisc_vmalloc_start = SET_MAP_OFFSET(pcxl_dma_start
> + PCXL_DMA_MAP_SIZE);
> - } else {
> - pcxl_dma_start = 0;
> - parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);
> - }
> -#else
> - parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);
> + } else
> #endif
> + parisc_vmalloc_start = SET_MAP_OFFSET(MAP_START);
>
> mem_init_print_info(NULL);
>
>


2018-04-21 21:45:07

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On Fri, 2018-04-20 at 10:03 +0200, Christoph Hellwig wrote:
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index 8d3a7b80ac42..4e87c35c22b7 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -97,14 +97,12 @@ void __init dma_ops_init(void)
>   panic( "PA-RISC Linux currently only supports
> machines that conform to\n"
>   "the PA-RISC 1.1 or 2.0 architecture
> specification.\n");
>  
> - case pcxs:
> - case pcxt:
> - hppa_dma_ops = &pcx_dma_ops;
> - break;
>   case pcxl2:
>   pa7300lc_init();
>   case pcxl: /* falls through */
> - hppa_dma_ops = &pcxl_dma_ops;
> + case pcxs:
> + case pcxt:
> + hppa_dma_ops = &dma_noncoherent_ops;
>   break;
>   default:
>   break;

Well, this is wrong: you just made every 32 bit parisc system
unnecessarily use non-coherent. We actually only have a small small
set of non-coherent systems. The pxcs and pcxt systems (which are
about 99% of the user base) can use coherent dma ops. The problem
seems to be in your new world you only have one dma_noncoherent_ops
pointer ... we definitely need two on parisc, so whether
arch_dma_cache_sync is present or not needs to be dynamic not config
defined.

James


2018-04-21 21:58:34

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On Sat, 2018-04-21 at 19:43 +0200, Helge Deller wrote:
> On 20.04.2018 10:03, Christoph Hellwig wrote:
> > Switch to the generic noncoherent direct mapping implementation.
> >
> > Parisc previously had two different non-coherent dma ops
> > implementation that just different in the way coherent allocations
> > were handled or not handled.  The different behavior is not
> > selected at runtime in the arch_dma_alloc and arch_dma_free
> > routines.  The non-coherent allocation in the pcx cases now uses
> > the dma_direct helpers that are a little more sophisticated and
> > used by a lot of other architectures.
> >
> > Fix sync_single_for_cpu to do skip the cache flush unless the
> > transfer is to the device to match the more tested unmap_single
> > path which should have the same cache coherency implications.
> >
> > This also now consistenly uses flush_kernel_dcache_range for cache
> > flushing while previously some of the SG based operations used
> > flush_kernel_vmap_range instead.
>
>
> This patch breaks a 32bit kernel on a B160L machine (PA7300LC CPU,
> "pcxl2"). After applying this patch series the lasi82956 network
> driver works unreliable.  NIC gets IP, but ping doesn't work.
> See drivers/net/ethernet/i825xx/lasi_82596.c, it uses dma*sync()
> functions.

That's actually a weird result. The 32 bit machines have two cases:
those that can make uncached memory by setting the U bit (and thus
don't need the sync operations in the lasi and D700 drivers) and those
that can't. The latter is basically only the old 700 series. The B180
is in the class of can set pages to uncached, so it sounds like
something in our uncached memory allocation for dma areas is failing
after this patch set.

I still have an old 700 in my box of curiosities, so I can try to dust
it off and plug it back in when I get home to see what it makes of the
series when it gets fixed.

James

James


2018-04-23 06:51:56

by Greentime Hu

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

2018-04-20 16:03 GMT+08:00 Christoph Hellwig <[email protected]>:
> Switch to the generic noncoherent direct mapping implementation.
>
> This makes sure kmap_atomic_pfn is consistently used for access to
> virtual addresses instead of either using the slower plain kmap
> or blindly expecting page_address() to work.
>
> This makes sure the cache_sync routines is called in the unmap_sg
> case, to match the unmap_single and sync_{single,sg}_to_cpu cases.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/nds32/Kconfig | 3 +
> arch/nds32/include/asm/Kbuild | 1 +
> arch/nds32/include/asm/dma-mapping.h | 14 ---
> arch/nds32/kernel/dma.c | 182 ++++++---------------------
> 4 files changed, 39 insertions(+), 161 deletions(-)
> delete mode 100644 arch/nds32/include/asm/dma-mapping.h
>
> diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
> index 249f38d3388f..67d0ac0a989c 100644
> --- a/arch/nds32/Kconfig
> +++ b/arch/nds32/Kconfig
> @@ -5,10 +5,13 @@
>
> config NDS32
> def_bool y
> + select ARCH_HAS_SYNC_DMA_FOR_CPU
> + select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> select ARCH_WANT_FRAME_POINTERS if FTRACE
> select CLKSRC_MMIO
> select CLONE_BACKWARDS
> select COMMON_CLK
> + select DMA_NONCOHERENT_OPS
> select GENERIC_ATOMIC64
> select GENERIC_CPU_DEVICES
> select GENERIC_CLOCKEVENTS
> diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
> index 06bdf8167f5a..b3e951f805f8 100644
> --- a/arch/nds32/include/asm/Kbuild
> +++ b/arch/nds32/include/asm/Kbuild
> @@ -13,6 +13,7 @@ generic-y += cputime.h
> generic-y += device.h
> generic-y += div64.h
> generic-y += dma.h
> +generic-y += dma-mapping.h
> generic-y += emergency-restart.h
> generic-y += errno.h
> generic-y += exec.h
> diff --git a/arch/nds32/include/asm/dma-mapping.h b/arch/nds32/include/asm/dma-mapping.h
> deleted file mode 100644
> index 2dd47d245c25..000000000000
> --- a/arch/nds32/include/asm/dma-mapping.h
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -// Copyright (C) 2005-2017 Andes Technology Corporation
> -
> -#ifndef ASMNDS32_DMA_MAPPING_H
> -#define ASMNDS32_DMA_MAPPING_H
> -
> -extern struct dma_map_ops nds32_dma_ops;
> -
> -static inline struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
> -{
> - return &nds32_dma_ops;
> -}
> -
> -#endif
> diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
> index d291800fc621..688f1a03dee6 100644
> --- a/arch/nds32/kernel/dma.c
> +++ b/arch/nds32/kernel/dma.c
> @@ -3,17 +3,14 @@
>
> #include <linux/types.h>
> #include <linux/mm.h>
> -#include <linux/export.h>
> #include <linux/string.h>
> -#include <linux/scatterlist.h>
> -#include <linux/dma-mapping.h>
> +#include <linux/dma-noncoherent.h>
> #include <linux/io.h>
> #include <linux/cache.h>
> #include <linux/highmem.h>
> #include <linux/slab.h>
> #include <asm/cacheflush.h>
> #include <asm/tlbflush.h>
> -#include <asm/dma-mapping.h>
> #include <asm/proc-fns.h>
>
> /*
> @@ -22,11 +19,6 @@
> static pte_t *consistent_pte;
> static DEFINE_RAW_SPINLOCK(consistent_lock);
>
> -enum master_type {
> - FOR_CPU = 0,
> - FOR_DEVICE = 1,
> -};
> -
> /*
> * VM region handling support.
> *
> @@ -124,10 +116,8 @@ static struct arch_vm_region *vm_region_find(struct arch_vm_region *head,
> return c;
> }
>
> -/* FIXME: attrs is not used. */
> -static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
> - dma_addr_t * handle, gfp_t gfp,
> - unsigned long attrs)
> +void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> + gfp_t gfp, unsigned long attrs)
> {
> struct page *page;
> struct arch_vm_region *c;
> @@ -232,8 +222,8 @@ static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
> return NULL;
> }
>
> -static void nds32_dma_free(struct device *dev, size_t size, void *cpu_addr,
> - dma_addr_t handle, unsigned long attrs)
> +void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
> + dma_addr_t handle, unsigned long attrs)
> {
> struct arch_vm_region *c;
> unsigned long flags, addr;
> @@ -333,145 +323,43 @@ static int __init consistent_init(void)
> }
>
> core_initcall(consistent_init);
> -static void consistent_sync(void *vaddr, size_t size, int direction, int master_type);
> -static dma_addr_t nds32_dma_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size,
> - enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - consistent_sync((void *)(page_address(page) + offset), size, dir, FOR_DEVICE);
> - return page_to_phys(page) + offset;
> -}
> -
> -static void nds32_dma_unmap_page(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - consistent_sync(phys_to_virt(handle), size, dir, FOR_CPU);
> -}
>
> -/*
> - * Make an area consistent for devices.
> - */
> -static void consistent_sync(void *vaddr, size_t size, int direction, int master_type)
> +void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - unsigned long start = (unsigned long)vaddr;
> - unsigned long end = start + size;
> -
> - if (master_type == FOR_CPU) {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cpu_dma_inval_range(start, end);
> - break;
> - default:
> - BUG();
> - }
> - } else {
> - /* FOR_DEVICE */
> - switch (direction) {
> - case DMA_FROM_DEVICE:
> - break;
> - case DMA_TO_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cpu_dma_wb_range(start, end);
> - break;
> - default:
> - BUG();
> - }
> + void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
> + unsigned long start = (unsigned long)addr;
> +
> + switch (direction) {
> + case DMA_FROM_DEVICE:
> + break;
> + case DMA_TO_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + cpu_dma_wb_range(start, start + size);
> + break;
> + default:
> + BUG();
> }
> -}
>
> -static int nds32_dma_map_sg(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - int i;
> -
> - for (i = 0; i < nents; i++, sg++) {
> - void *virt;
> - unsigned long pfn;
> - struct page *page = sg_page(sg);
> -
> - sg->dma_address = sg_phys(sg);
> - pfn = page_to_pfn(page) + sg->offset / PAGE_SIZE;
> - page = pfn_to_page(pfn);
> - if (PageHighMem(page)) {
> - virt = kmap_atomic(page);
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> - kunmap_atomic(virt);
> - } else {
> - if (sg->offset > PAGE_SIZE)
> - panic("sg->offset:%08x > PAGE_SIZE\n",
> - sg->offset);
> - virt = page_address(page) + sg->offset;
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> - }
> - }
> - return nents;
> -}
> -
> -static void nds32_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
> - int nhwentries, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> + kunmap_atomic(addr);
> }
>
> -static void
> -nds32_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir)
> +void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_CPU);
> -}
> -
> -static void
> -nds32_dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir)
> -{
> - consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_DEVICE);
> -}
> -
> -static void
> -nds32_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
> - enum dma_data_direction dir)
> -{
> - int i;
> -
> - for (i = 0; i < nents; i++, sg++) {
> - char *virt =
> - page_address((struct page *)sg->page_link) + sg->offset;
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> + void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
> + unsigned long start = (unsigned long)addr;
> +
> + switch (direction) {
> + case DMA_TO_DEVICE:
> + break;
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + cpu_dma_inval_range(start, end);
> + break;
> + default:
> + BUG();
> }

CC arch/nds32/kernel/dma.o
arch/nds32/kernel/dma.c: In function 'arch_sync_dma_for_device':
arch/nds32/kernel/dma.c:333:10: error: 'direction' undeclared (first
use in this function)
switch (direction) {
^~~~~~~~~
arch/nds32/kernel/dma.c:333:10: note: each undeclared identifier is
reported only once for each function it appears in
arch/nds32/kernel/dma.c: In function 'arch_sync_dma_for_cpu':
arch/nds32/kernel/dma.c:353:10: error: 'direction' undeclared (first
use in this function)
switch (direction) {
^~~~~~~~~
arch/nds32/kernel/dma.c:358:30: error: 'end' undeclared (first use in
this function)
cpu_dma_inval_range(start, end);
^~~
make[1]: *** [arch/nds32/kernel/dma.o] Error 1
make: *** [arch/nds32/kernel] Error 2

After this building error, the ftmac100.c driver is broken. Not sure
what happened.

2018-04-24 07:53:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 01/22] dma-debug: move initialization to common code

On Fri, Apr 20, 2018 at 11:23:43AM +0100, Robin Murphy wrote:
>> -void dma_debug_init(u32 num_entries)
>> +static int dma_debug_init(void)
>> {
>> + u32 num_entries;
>
> Maybe initialise it to PREALLOC_DMA_DEBUG_ENTRIES?

We initialize it down in an if/else clause which seems a little more clear
to me, at the cost of two extra lines of code. But I suspect I should
just go a little further and merge the global req_entries and the local
num_entries into a single variable with a better name.

>> +core_initcall(dma_debug_init);
>
> I think it's worth noting that for most users this now happens much earlier
> than before. In general that's probably good (e.g. on arm64 it should
> prevent false-positives from the Arm SMMU drivers under ACPI), and I can't
> imagine it's high-risk, but it is a behaviour change.

I'll mention this in the changelog, thanks!

2018-04-24 10:26:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On Sat, Apr 21, 2018 at 10:42:47PM +0100, James Bottomley wrote:
> Well, this is wrong: you just made every 32 bit parisc system
> unnecessarily use non-coherent. We actually only have a small small
> set of non-coherent systems. The pxcs and pcxt systems (which are
> about 99% of the user base) can use coherent dma ops. The problem
> seems to be in your new world you only have one dma_noncoherent_ops
> pointer ... we definitely need two on parisc, so whether
> arch_dma_cache_sync is present or not needs to be dynamic not config
> defined.

The changelog explicitly mentions merging the two noncoherent
implementations, they only differ in the alloc and free callsbacks,
and we now runtime switch between them. Before the pcxs and pcxt
cases used pcx_dma_ops, and pcxl and pxcl2 used pcxl_dma_ops, now
all four use dma_noncoherent_ops and arch_dma_alloc/arch_dma_free
branch out to different behavior.

Both pcx_dma_ops and pcxl_dma_ops do define the cache_sync
method in the existing code, so that isn't the issue.

I'll take a deeper look at what sort of behavior change might have
been introduced.

>
> James
---end quoted text---

2018-04-24 19:16:47

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

Hi Greentime,

thanks for testing the patch!

It looks like nds32 doesn't have a buildbot yet, so this code didn't
even get syntax checkin, sorry.

Below is the incremental fixes based on this thread.

Can you check if my tree works if you just revert the
"nds32: use generic dma_noncoherent_ops" commit?

diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
index 688f1a03dee6..48018275e7f4 100644
--- a/arch/nds32/kernel/dma.c
+++ b/arch/nds32/kernel/dma.c
@@ -330,7 +330,7 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
unsigned long start = (unsigned long)addr;

- switch (direction) {
+ switch (dir) {
case DMA_FROM_DEVICE:
break;
case DMA_TO_DEVICE:
@@ -350,12 +350,12 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
unsigned long start = (unsigned long)addr;

- switch (direction) {
+ switch (dir) {
case DMA_TO_DEVICE:
break;
case DMA_FROM_DEVICE:
case DMA_BIDIRECTIONAL:
- cpu_dma_inval_range(start, end);
+ cpu_dma_inval_range(start, start + size);
break;
default:
BUG();

2018-04-25 01:45:50

by Greentime Hu

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

2018-04-25 3:16 GMT+08:00 Christoph Hellwig <[email protected]>:
> Hi Greentime,
>
> thanks for testing the patch!
>
> It looks like nds32 doesn't have a buildbot yet, so this code didn't
> even get syntax checkin, sorry.
>
> Below is the incremental fixes based on this thread.
>
> Can you check if my tree works if you just revert the
> "nds32: use generic dma_noncoherent_ops" commit?
>
> diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
> index 688f1a03dee6..48018275e7f4 100644
> --- a/arch/nds32/kernel/dma.c
> +++ b/arch/nds32/kernel/dma.c
> @@ -330,7 +330,7 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
> unsigned long start = (unsigned long)addr;
>
> - switch (direction) {
> + switch (dir) {
> case DMA_FROM_DEVICE:
> break;
> case DMA_TO_DEVICE:
> @@ -350,12 +350,12 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> void *addr = kmap_atomic_pfn(PHYS_PFN(paddr));
> unsigned long start = (unsigned long)addr;
>
> - switch (direction) {
> + switch (dir) {
> case DMA_TO_DEVICE:
> break;
> case DMA_FROM_DEVICE:
> case DMA_BIDIRECTIONAL:
> - cpu_dma_inval_range(start, end);
> + cpu_dma_inval_range(start, start + size);
> break;
> default:
> BUG();

Hi Crhistoph,

The ftmac100 works if I revert this commit.

commit de46b9ba5298aafc47284735a4f21baa8e4ed4b7
Author: Greentime Hu <[email protected]>
Date: Wed Apr 25 09:33:51 2018 +0800

Revert "nds32: use generic dma_noncoherent_ops"

This reverts commit 0489ce952072e7542456e0d962437062916ce0df.

2018-04-25 06:40:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

On Wed, Apr 25, 2018 at 09:43:43AM +0800, Greentime Hu wrote:
> Hi Crhistoph,
>
> The ftmac100 works if I revert this commit.

Thanks. ftmac100 only use dma_map_page, which in the old nds32 code
is just doing a plain page_address and never kmaps. Can you apply
the patch below on the tree with the origin "nds32: use generic
dma_noncoherent_ops" reverted? This always just uses page_address,
although that, just like the original code is broken if you actually
have highmem that needs to be mapped:

---
From 1dc5d1cae4cd7b9ce03d0e2943364ed4cca938d7 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Mon, 16 Apr 2018 19:20:30 +0200
Subject: nds32: use generic dma_noncoherent_ops

Switch to the generic noncoherent direct mapping implementation.

This makes sure the cache_sync routines is called in the unmap_sg
case, to match the unmap_single and sync_{single,sg}_to_cpu cases.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/nds32/Kconfig | 3 +
arch/nds32/include/asm/Kbuild | 1 +
arch/nds32/include/asm/dma-mapping.h | 14 ---
arch/nds32/kernel/dma.c | 182 +++++----------------------
4 files changed, 37 insertions(+), 163 deletions(-)
delete mode 100644 arch/nds32/include/asm/dma-mapping.h

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 249f38d3388f..67d0ac0a989c 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -5,10 +5,13 @@

config NDS32
def_bool y
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_WANT_FRAME_POINTERS if FTRACE
select CLKSRC_MMIO
select CLONE_BACKWARDS
select COMMON_CLK
+ select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select GENERIC_CPU_DEVICES
select GENERIC_CLOCKEVENTS
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index 06bdf8167f5a..b3e951f805f8 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += cputime.h
generic-y += device.h
generic-y += div64.h
generic-y += dma.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += errno.h
generic-y += exec.h
diff --git a/arch/nds32/include/asm/dma-mapping.h b/arch/nds32/include/asm/dma-mapping.h
deleted file mode 100644
index 2dd47d245c25..000000000000
--- a/arch/nds32/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,14 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (C) 2005-2017 Andes Technology Corporation
-
-#ifndef ASMNDS32_DMA_MAPPING_H
-#define ASMNDS32_DMA_MAPPING_H
-
-extern struct dma_map_ops nds32_dma_ops;
-
-static inline struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &nds32_dma_ops;
-}
-
-#endif
diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
index d291800fc621..78311a1e6fd1 100644
--- a/arch/nds32/kernel/dma.c
+++ b/arch/nds32/kernel/dma.c
@@ -3,17 +3,14 @@

#include <linux/types.h>
#include <linux/mm.h>
-#include <linux/export.h>
#include <linux/string.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/io.h>
#include <linux/cache.h>
#include <linux/highmem.h>
#include <linux/slab.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
-#include <asm/dma-mapping.h>
#include <asm/proc-fns.h>

/*
@@ -22,11 +19,6 @@
static pte_t *consistent_pte;
static DEFINE_RAW_SPINLOCK(consistent_lock);

-enum master_type {
- FOR_CPU = 0,
- FOR_DEVICE = 1,
-};
-
/*
* VM region handling support.
*
@@ -124,10 +116,8 @@ static struct arch_vm_region *vm_region_find(struct arch_vm_region *head,
return c;
}

-/* FIXME: attrs is not used. */
-static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t * handle, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+ gfp_t gfp, unsigned long attrs)
{
struct page *page;
struct arch_vm_region *c;
@@ -232,8 +222,8 @@ static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
return NULL;
}

-static void nds32_dma_free(struct device *dev, size_t size, void *cpu_addr,
- dma_addr_t handle, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t handle, unsigned long attrs)
{
struct arch_vm_region *c;
unsigned long flags, addr;
@@ -333,145 +323,39 @@ static int __init consistent_init(void)
}

core_initcall(consistent_init);
-static void consistent_sync(void *vaddr, size_t size, int direction, int master_type);
-static dma_addr_t nds32_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- consistent_sync((void *)(page_address(page) + offset), size, dir, FOR_DEVICE);
- return page_to_phys(page) + offset;
-}

-static void nds32_dma_unmap_page(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- consistent_sync(phys_to_virt(handle), size, dir, FOR_CPU);
-}
-
-/*
- * Make an area consistent for devices.
- */
-static void consistent_sync(void *vaddr, size_t size, int direction, int master_type)
-{
- unsigned long start = (unsigned long)vaddr;
- unsigned long end = start + size;
-
- if (master_type == FOR_CPU) {
- switch (direction) {
- case DMA_TO_DEVICE:
- break;
- case DMA_FROM_DEVICE:
- case DMA_BIDIRECTIONAL:
- cpu_dma_inval_range(start, end);
- break;
- default:
- BUG();
- }
- } else {
- /* FOR_DEVICE */
- switch (direction) {
- case DMA_FROM_DEVICE:
- break;
- case DMA_TO_DEVICE:
- case DMA_BIDIRECTIONAL:
- cpu_dma_wb_range(start, end);
- break;
- default:
- BUG();
- }
+ void *addr = phys_to_virt(paddr);
+ unsigned long start = (unsigned long)addr;
+
+ switch (dir) {
+ case DMA_FROM_DEVICE:
+ break;
+ case DMA_TO_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ cpu_dma_wb_range(start, start + size);
+ break;
+ default:
+ BUG();
}
}

-static int nds32_dma_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
{
- int i;
-
- for (i = 0; i < nents; i++, sg++) {
- void *virt;
- unsigned long pfn;
- struct page *page = sg_page(sg);
-
- sg->dma_address = sg_phys(sg);
- pfn = page_to_pfn(page) + sg->offset / PAGE_SIZE;
- page = pfn_to_page(pfn);
- if (PageHighMem(page)) {
- virt = kmap_atomic(page);
- consistent_sync(virt, sg->length, dir, FOR_CPU);
- kunmap_atomic(virt);
- } else {
- if (sg->offset > PAGE_SIZE)
- panic("sg->offset:%08x > PAGE_SIZE\n",
- sg->offset);
- virt = page_address(page) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_CPU);
- }
+ void *addr = phys_to_virt(paddr);
+ unsigned long start = (unsigned long)addr;
+
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ break;
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ cpu_dma_inval_range(start, start + size);
+ break;
+ default:
+ BUG();
}
- return nents;
}
-
-static void nds32_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nhwentries, enum dma_data_direction dir,
- unsigned long attrs)
-{
-}
-
-static void
-nds32_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir)
-{
- consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_CPU);
-}
-
-static void
-nds32_dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir)
-{
- consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_DEVICE);
-}
-
-static void
-nds32_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
- enum dma_data_direction dir)
-{
- int i;
-
- for (i = 0; i < nents; i++, sg++) {
- char *virt =
- page_address((struct page *)sg->page_link) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_CPU);
- }
-}
-
-static void
-nds32_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir)
-{
- int i;
-
- for (i = 0; i < nents; i++, sg++) {
- char *virt =
- page_address((struct page *)sg->page_link) + sg->offset;
- consistent_sync(virt, sg->length, dir, FOR_DEVICE);
- }
-}
-
-struct dma_map_ops nds32_dma_ops = {
- .alloc = nds32_dma_alloc_coherent,
- .free = nds32_dma_free,
- .map_page = nds32_dma_map_page,
- .unmap_page = nds32_dma_unmap_page,
- .map_sg = nds32_dma_map_sg,
- .unmap_sg = nds32_dma_unmap_sg,
- .sync_single_for_device = nds32_dma_sync_single_for_device,
- .sync_single_for_cpu = nds32_dma_sync_single_for_cpu,
- .sync_sg_for_cpu = nds32_dma_sync_sg_for_cpu,
- .sync_sg_for_device = nds32_dma_sync_sg_for_device,
-};
-
-EXPORT_SYMBOL(nds32_dma_ops);
--
2.17.0


2018-04-25 07:21:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On Sat, Apr 21, 2018 at 07:43:46PM +0200, Helge Deller wrote:
> This patch breaks a 32bit kernel on a B160L machine (PA7300LC CPU, "pcxl2").
> After applying this patch series the lasi82956 network driver works unreliable.
> NIC gets IP, but ping doesn't work.
> See drivers/net/ethernet/i825xx/lasi_82596.c, it uses dma*sync() functions.

Just to confirm: Without the series it is known to actually work?

2018-04-25 11:28:26

by Alexey Brodkin

[permalink] [raw]
Subject: Re: [PATCH 06/22] arc: use generic dma_noncoherent_ops

Hi Christoph,

On Fri, 2018-04-20 at 10:02 +0200, Christoph Hellwig wrote:
> Switch to the generic noncoherent direct mapping implementation.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/arc/Kconfig | 4 +
> arch/arc/include/asm/Kbuild | 1 +
> arch/arc/include/asm/dma-mapping.h | 21 -----
> arch/arc/mm/dma.c | 141 +++--------------------------
> 4 files changed, 19 insertions(+), 148 deletions(-)
> delete mode 100644 arch/arc/include/asm/dma-mapping.h
>

[snip]

> @@ -135,7 +134,7 @@ static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
> * CPU accesses page via normal paddr, thus needs to explicitly made
> * consistent before each use
> */
> -static void _dma_cache_sync(phys_addr_t paddr, size_t size,
> +static void _dma_cache_sync(struct device *dev, phys_addr_t paddr size_t size,
> enum dma_data_direction dir)

Seems like there's a missing comma:
----------------------------------->8------------------------------------
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -134,7 +134,7 @@ int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
* CPU accesses page via normal paddr, thus needs to explicitly made
* consistent before each use
*/
-static void _dma_cache_sync(struct device *dev, phys_addr_t paddr size_t size,
+static void _dma_cache_sync(struct device *dev, phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
{
switch (dir) {
----------------------------------->8------------------------------------

Which is actually strange as I would expect ARC code to be built by bots.

Anyways with above fix I do see problems with both USB and Ethernet controllers
on ARC HSDK board.
----------------------------------->8------------------------------------
usb 1-1: new high-speed USB device number 2 using ehci-platform
usb 1-1: device descriptor read/64, error -32
usb 1-1: device descriptor read/64, error -32
usb 1-1: new high-speed USB device number 3 using ehci-platform
usb 1-1: device descriptor read/64, error -32
usb 1-1: device descriptor read/64, error -32
usb usb1-port1: attempt power cycle
usb 1-1: new high-speed USB device number 4 using ehci-platform
usb 1-1: device not accepting address 4, error -32
usb 1-1: new high-speed USB device number 5 using ehci-platform
usb 1-1: device not accepting address 5, error -32

...

# wget ftp://ftp.denx.de/pub/u-boot/u-boot-1.0.0.tar.bz2
Connecting to ftp.denx.de (81.169.202.6:21)
wget: can't connect to remote host (81.169.202.6): No route to host
----------------------------------->8------------------------------------

Will all patches from the series reverted (i.e. with
your base-line) all issues go away.

I'll need to spend more time on checking what's actually wrong.

-Alexey

P.S. Note to my ARC colleagues - it's required to disable IO Coherency
to get DMA ops really used, it could be obviously done with:
----------------------------------->8------------------------------------
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -27,7 +27,7 @@

static int l2_line_sz;
static int ioc_exists;
-int slc_enable = 1, ioc_enable = 1;
+int slc_enable = 1, ioc_enable = 0;
unsigned long perip_base = ARC_UNCACHED_ADDR_SPACE; /* legacy value for boot */
unsigned long perip_end = 0xFFFFFFFF; /* legacy value */
----------------------------------->8------------------------------------

2018-04-25 12:28:16

by Greentime Hu

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

2018-04-25 14:40 GMT+08:00 Christoph Hellwig <[email protected]>:
> On Wed, Apr 25, 2018 at 09:43:43AM +0800, Greentime Hu wrote:
>> Hi Crhistoph,
>>
>> The ftmac100 works if I revert this commit.
>
> Thanks. ftmac100 only use dma_map_page, which in the old nds32 code
> is just doing a plain page_address and never kmaps. Can you apply
> the patch below on the tree with the origin "nds32: use generic
> dma_noncoherent_ops" reverted? This always just uses page_address,
> although that, just like the original code is broken if you actually
> have highmem that needs to be mapped:
>

Hi, Christoph,

It still failed.

> ---
> From 1dc5d1cae4cd7b9ce03d0e2943364ed4cca938d7 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <[email protected]>
> Date: Mon, 16 Apr 2018 19:20:30 +0200
> Subject: nds32: use generic dma_noncoherent_ops
>
> Switch to the generic noncoherent direct mapping implementation.
>
> This makes sure the cache_sync routines is called in the unmap_sg
> case, to match the unmap_single and sync_{single,sg}_to_cpu cases.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/nds32/Kconfig | 3 +
> arch/nds32/include/asm/Kbuild | 1 +
> arch/nds32/include/asm/dma-mapping.h | 14 ---
> arch/nds32/kernel/dma.c | 182 +++++----------------------
> 4 files changed, 37 insertions(+), 163 deletions(-)
> delete mode 100644 arch/nds32/include/asm/dma-mapping.h
>
> diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
> index 249f38d3388f..67d0ac0a989c 100644
> --- a/arch/nds32/Kconfig
> +++ b/arch/nds32/Kconfig
> @@ -5,10 +5,13 @@
>
> config NDS32
> def_bool y
> + select ARCH_HAS_SYNC_DMA_FOR_CPU
> + select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> select ARCH_WANT_FRAME_POINTERS if FTRACE
> select CLKSRC_MMIO
> select CLONE_BACKWARDS
> select COMMON_CLK
> + select DMA_NONCOHERENT_OPS
> select GENERIC_ATOMIC64
> select GENERIC_CPU_DEVICES
> select GENERIC_CLOCKEVENTS
> diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
> index 06bdf8167f5a..b3e951f805f8 100644
> --- a/arch/nds32/include/asm/Kbuild
> +++ b/arch/nds32/include/asm/Kbuild
> @@ -13,6 +13,7 @@ generic-y += cputime.h
> generic-y += device.h
> generic-y += div64.h
> generic-y += dma.h
> +generic-y += dma-mapping.h
> generic-y += emergency-restart.h
> generic-y += errno.h
> generic-y += exec.h
> diff --git a/arch/nds32/include/asm/dma-mapping.h b/arch/nds32/include/asm/dma-mapping.h
> deleted file mode 100644
> index 2dd47d245c25..000000000000
> --- a/arch/nds32/include/asm/dma-mapping.h
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -// Copyright (C) 2005-2017 Andes Technology Corporation
> -
> -#ifndef ASMNDS32_DMA_MAPPING_H
> -#define ASMNDS32_DMA_MAPPING_H
> -
> -extern struct dma_map_ops nds32_dma_ops;
> -
> -static inline struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
> -{
> - return &nds32_dma_ops;
> -}
> -
> -#endif
> diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
> index d291800fc621..78311a1e6fd1 100644
> --- a/arch/nds32/kernel/dma.c
> +++ b/arch/nds32/kernel/dma.c
> @@ -3,17 +3,14 @@
>
> #include <linux/types.h>
> #include <linux/mm.h>
> -#include <linux/export.h>
> #include <linux/string.h>
> -#include <linux/scatterlist.h>
> -#include <linux/dma-mapping.h>
> +#include <linux/dma-noncoherent.h>
> #include <linux/io.h>
> #include <linux/cache.h>
> #include <linux/highmem.h>
> #include <linux/slab.h>
> #include <asm/cacheflush.h>
> #include <asm/tlbflush.h>
> -#include <asm/dma-mapping.h>
> #include <asm/proc-fns.h>
>
> /*
> @@ -22,11 +19,6 @@
> static pte_t *consistent_pte;
> static DEFINE_RAW_SPINLOCK(consistent_lock);
>
> -enum master_type {
> - FOR_CPU = 0,
> - FOR_DEVICE = 1,
> -};
> -
> /*
> * VM region handling support.
> *
> @@ -124,10 +116,8 @@ static struct arch_vm_region *vm_region_find(struct arch_vm_region *head,
> return c;
> }
>
> -/* FIXME: attrs is not used. */
> -static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
> - dma_addr_t * handle, gfp_t gfp,
> - unsigned long attrs)
> +void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> + gfp_t gfp, unsigned long attrs)
> {
> struct page *page;
> struct arch_vm_region *c;
> @@ -232,8 +222,8 @@ static void *nds32_dma_alloc_coherent(struct device *dev, size_t size,
> return NULL;
> }
>
> -static void nds32_dma_free(struct device *dev, size_t size, void *cpu_addr,
> - dma_addr_t handle, unsigned long attrs)
> +void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
> + dma_addr_t handle, unsigned long attrs)
> {
> struct arch_vm_region *c;
> unsigned long flags, addr;
> @@ -333,145 +323,39 @@ static int __init consistent_init(void)
> }
>
> core_initcall(consistent_init);
> -static void consistent_sync(void *vaddr, size_t size, int direction, int master_type);
> -static dma_addr_t nds32_dma_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size,
> - enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - consistent_sync((void *)(page_address(page) + offset), size, dir, FOR_DEVICE);
> - return page_to_phys(page) + offset;
> -}
>
> -static void nds32_dma_unmap_page(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir,
> - unsigned long attrs)
> +void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - consistent_sync(phys_to_virt(handle), size, dir, FOR_CPU);
> -}
> -
> -/*
> - * Make an area consistent for devices.
> - */
> -static void consistent_sync(void *vaddr, size_t size, int direction, int master_type)
> -{
> - unsigned long start = (unsigned long)vaddr;
> - unsigned long end = start + size;
> -
> - if (master_type == FOR_CPU) {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cpu_dma_inval_range(start, end);
> - break;
> - default:
> - BUG();
> - }
> - } else {
> - /* FOR_DEVICE */
> - switch (direction) {
> - case DMA_FROM_DEVICE:
> - break;
> - case DMA_TO_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cpu_dma_wb_range(start, end);
> - break;
> - default:
> - BUG();
> - }
> + void *addr = phys_to_virt(paddr);
> + unsigned long start = (unsigned long)addr;
> +
> + switch (dir) {
> + case DMA_FROM_DEVICE:
> + break;
> + case DMA_TO_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + cpu_dma_wb_range(start, start + size);
> + break;
> + default:
> + BUG();
> }
> }
>
> -static int nds32_dma_map_sg(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> +void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> + size_t size, enum dma_data_direction dir)
> {
> - int i;
> -
> - for (i = 0; i < nents; i++, sg++) {
> - void *virt;
> - unsigned long pfn;
> - struct page *page = sg_page(sg);
> -
> - sg->dma_address = sg_phys(sg);
> - pfn = page_to_pfn(page) + sg->offset / PAGE_SIZE;
> - page = pfn_to_page(pfn);
> - if (PageHighMem(page)) {
> - virt = kmap_atomic(page);
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> - kunmap_atomic(virt);
> - } else {
> - if (sg->offset > PAGE_SIZE)
> - panic("sg->offset:%08x > PAGE_SIZE\n",
> - sg->offset);
> - virt = page_address(page) + sg->offset;
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> - }
> + void *addr = phys_to_virt(paddr);
> + unsigned long start = (unsigned long)addr;
> +
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + break;
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + cpu_dma_inval_range(start, start + size);
> + break;
> + default:
> + BUG();
> }
> - return nents;
> }
> -
> -static void nds32_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
> - int nhwentries, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> -}
> -
> -static void
> -nds32_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir)
> -{
> - consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_CPU);
> -}
> -
> -static void
> -nds32_dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
> - size_t size, enum dma_data_direction dir)
> -{
> - consistent_sync((void *)phys_to_virt(handle), size, dir, FOR_DEVICE);
> -}
> -
> -static void
> -nds32_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
> - enum dma_data_direction dir)
> -{
> - int i;
> -
> - for (i = 0; i < nents; i++, sg++) {
> - char *virt =
> - page_address((struct page *)sg->page_link) + sg->offset;
> - consistent_sync(virt, sg->length, dir, FOR_CPU);
> - }
> -}
> -
> -static void
> -nds32_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir)
> -{
> - int i;
> -
> - for (i = 0; i < nents; i++, sg++) {
> - char *virt =
> - page_address((struct page *)sg->page_link) + sg->offset;
> - consistent_sync(virt, sg->length, dir, FOR_DEVICE);
> - }
> -}
> -
> -struct dma_map_ops nds32_dma_ops = {
> - .alloc = nds32_dma_alloc_coherent,
> - .free = nds32_dma_free,
> - .map_page = nds32_dma_map_page,
> - .unmap_page = nds32_dma_unmap_page,
> - .map_sg = nds32_dma_map_sg,
> - .unmap_sg = nds32_dma_unmap_sg,
> - .sync_single_for_device = nds32_dma_sync_single_for_device,
> - .sync_single_for_cpu = nds32_dma_sync_single_for_cpu,
> - .sync_sg_for_cpu = nds32_dma_sync_sg_for_cpu,
> - .sync_sg_for_device = nds32_dma_sync_sg_for_device,
> -};
> -
> -EXPORT_SYMBOL(nds32_dma_ops);
> --
> 2.17.0
>

2018-04-25 21:09:38

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH 22/22] parisc: use generic dma_noncoherent_ops

On 25.04.2018 09:21, Christoph Hellwig wrote:
> On Sat, Apr 21, 2018 at 07:43:46PM +0200, Helge Deller wrote:
>> This patch breaks a 32bit kernel on a B160L machine (PA7300LC CPU, "pcxl2").
>> After applying this patch series the lasi82956 network driver works unreliable.
>> NIC gets IP, but ping doesn't work.
>> See drivers/net/ethernet/i825xx/lasi_82596.c, it uses dma*sync() functions.
>
> Just to confirm: Without the series it is known to actually work?

Yes.
I reverted this series from my tree, rebuilt, and then the issue was gone.
I won't be able to test again earlier than next week.

Helge

2018-04-26 06:42:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

Can you try this patch ontop of either the new or original one?

---
diff --git a/lib/dma-noncoherent.c b/lib/dma-noncoherent.c
index f4b8532c20ac..a2c192b3508d 100644
--- a/lib/dma-noncoherent.c
+++ b/lib/dma-noncoherent.c
@@ -48,7 +48,7 @@ static int dma_noncoherent_map_sg(struct device *dev, struct scatterlist *sgl,
return nents;
}

-#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
static void dma_noncoherent_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
@@ -88,7 +88,7 @@ const struct dma_map_ops dma_noncoherent_ops = {
.sync_sg_for_device = dma_noncoherent_sync_sg_for_device,
.map_page = dma_noncoherent_map_page,
.map_sg = dma_noncoherent_map_sg,
-#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
.sync_single_for_cpu = dma_noncoherent_sync_single_for_cpu,
.sync_sg_for_cpu = dma_noncoherent_sync_sg_for_cpu,
.unmap_page = dma_noncoherent_unmap_page,

2018-04-26 06:44:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 06/22] arc: use generic dma_noncoherent_ops

On Wed, Apr 25, 2018 at 11:17:01AM +0000, Alexey Brodkin wrote:
> Which is actually strange as I would expect ARC code to be built by bots.

I don't think I got any notification. Thank for the fixes!

I think I found the bug, based on the fact that so far all tests for
architectures that also need a cache op for device to cpu transitions
failed. I did a stupid typo when changing kconfig symbols, so please
try the patch below.

>
> static int l2_line_sz;
> static int ioc_exists;
> -int slc_enable = 1, ioc_enable = 1;
> +int slc_enable = 1, ioc_enable = 0;

Hmm. It seems if ioc_enable is 0 we should simply be using
dma_direct_ops on arc, but that is a different discussion.

---
diff --git a/lib/dma-noncoherent.c b/lib/dma-noncoherent.c
index f4b8532c20ac..a2c192b3508d 100644
--- a/lib/dma-noncoherent.c
+++ b/lib/dma-noncoherent.c
@@ -48,7 +48,7 @@ static int dma_noncoherent_map_sg(struct device *dev, struct scatterlist *sgl,
return nents;
}

-#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
static void dma_noncoherent_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
@@ -88,7 +88,7 @@ const struct dma_map_ops dma_noncoherent_ops = {
.sync_sg_for_device = dma_noncoherent_sync_sg_for_device,
.map_page = dma_noncoherent_map_page,
.map_sg = dma_noncoherent_map_sg,
-#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
.sync_single_for_cpu = dma_noncoherent_sync_single_for_cpu,
.sync_sg_for_cpu = dma_noncoherent_sync_sg_for_cpu,
.unmap_page = dma_noncoherent_unmap_page,

2018-04-26 08:08:56

by Greentime Hu

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

2018-04-26 14:42 GMT+08:00 Christoph Hellwig <[email protected]>:
> Can you try this patch ontop of either the new or original one?
>
> ---
> diff --git a/lib/dma-noncoherent.c b/lib/dma-noncoherent.c
> index f4b8532c20ac..a2c192b3508d 100644
> --- a/lib/dma-noncoherent.c
> +++ b/lib/dma-noncoherent.c
> @@ -48,7 +48,7 @@ static int dma_noncoherent_map_sg(struct device *dev, struct scatterlist *sgl,
> return nents;
> }
>
> -#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> static void dma_noncoherent_sync_single_for_cpu(struct device *dev,
> dma_addr_t addr, size_t size, enum dma_data_direction dir)
> {
> @@ -88,7 +88,7 @@ const struct dma_map_ops dma_noncoherent_ops = {
> .sync_sg_for_device = dma_noncoherent_sync_sg_for_device,
> .map_page = dma_noncoherent_map_page,
> .map_sg = dma_noncoherent_map_sg,
> -#ifdef CONFIG_DMA_NONCOHERENT_SYNC_FOR_CPU
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> .sync_single_for_cpu = dma_noncoherent_sync_single_for_cpu,
> .sync_sg_for_cpu = dma_noncoherent_sync_sg_for_cpu,
> .unmap_page = dma_noncoherent_unmap_page,

It works!!!

2018-04-26 08:24:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

On Thu, Apr 26, 2018 at 04:06:34PM +0800, Greentime Hu wrote:
> It works!!!

Thanks!

Can you retest the updated tree here with all the fixes and give me
your Tested-by: for the generic and nds32 patches?

git://git.infradead.org/users/hch/misc.git generic-dma-noncoherent

2018-04-26 08:25:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 06/22] arc: use generic dma_noncoherent_ops

On Thu, Apr 26, 2018 at 08:45:00AM +0200, [email protected] wrote:
> On Wed, Apr 25, 2018 at 11:17:01AM +0000, Alexey Brodkin wrote:
> > Which is actually strange as I would expect ARC code to be built by bots.
>
> I don't think I got any notification. Thank for the fixes!
>
> I think I found the bug, based on the fact that so far all tests for
> architectures that also need a cache op for device to cpu transitions
> failed. I did a stupid typo when changing kconfig symbols, so please
> try the patch below.

Confirmed to work for nds32, so here is a git tree with the core, arc
and nds32 fixes folded in, feel free to test that one:

git://git.infradead.org/users/hch/misc.git generic-dma-noncoherent

2018-04-26 09:41:38

by Greentime Hu

[permalink] [raw]
Subject: Re: [PATCH 13/22] nds32: use generic dma_noncoherent_ops

2018-04-26 16:24 GMT+08:00 Christoph Hellwig <[email protected]>:
> On Thu, Apr 26, 2018 at 04:06:34PM +0800, Greentime Hu wrote:
>> It works!!!
>
> Thanks!
>
> Can you retest the updated tree here with all the fixes and give me
> your Tested-by: for the generic and nds32 patches?
>
> git://git.infradead.org/users/hch/misc.git generic-dma-noncoherent

Sorry Christoph. I found the previous mail I said it works was wrong
because I used a wrong vmlinux.
It still failed. >,<