2018-10-09 13:30:46

by Christoph Hellwig

Subject: use generic DMA mapping code in powerpc V3

Hi all,

this series switches the powerpc port to use the generic swiotlb and
noncoherent dma ops, and to use more generic code for the coherent
direct mapping, as well as removing a lot of dead code.

The changes since v1 are too big to list, and v2 was not posted in public.

As this series is very large and depends on the dma-mapping tree I've
also published a git tree:

git://git.infradead.org/users/hch/misc.git powerpc-dma.3

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.3


2018-10-09 13:26:10

by Christoph Hellwig

Subject: [PATCH 01/33] powerpc: use mm zones more sensibly

Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configurations, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):

- ZONE_DMA: optionally for memory < 31-bit
- ZONE_NORMAL: everything addressable by the kernel
- ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.
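
As an illustration (my sketch, not a hunk from this patch), the generic
dma-direct code can use the advertised zone width to decide when an
allocation must come from ZONE_DMA; the helper name below is made up:

/*
 * Sketch: choose a GFP zone for a coherent allocation based on the
 * device's mask.  ARCH_ZONE_DMA_BITS comes from <asm/page.h> (31 on
 * powerpc after this patch), DMA_BIT_MASK() from <linux/dma-mapping.h>.
 */
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

static gfp_t dma_zone_gfp(u64 coherent_mask, gfp_t gfp)
{
	/* masks of 31 bits or fewer need pages below 2 GiB: ZONE_DMA */
	if (coherent_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
		gfp |= GFP_DMA;
	return gfp;
}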

Contains various fixes from Benjamin Herrenschmidt.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/Kconfig | 6 +--
arch/powerpc/include/asm/page.h | 2 +
arch/powerpc/include/asm/pgtable.h | 1 -
arch/powerpc/kernel/dma-swiotlb.c | 6 +--
arch/powerpc/kernel/dma.c | 7 +--
arch/powerpc/mm/mem.c | 50 +++++++------------
arch/powerpc/platforms/85xx/corenet_generic.c | 10 ----
arch/powerpc/platforms/85xx/qemu_e500.c | 9 ----
include/linux/mmzone.h | 2 +-
9 files changed, 24 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a80669209155..06996df07cad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -380,7 +380,7 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y

-config ZONE_DMA32
+config ZONE_DMA
bool
default y if PPC64

@@ -879,10 +879,6 @@ config ISA
have an IBM RS/6000 or pSeries machine, say Y. If you have an
embedded board, consult your board documentation.

-config ZONE_DMA
- bool
- default y
-
config GENERIC_ISA_DMA
bool
depends on ISA_DMA_API
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f6a1265face2..fc8c9ac0c6be 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -354,4 +354,6 @@ typedef struct page *pgtable_t;
#endif /* __ASSEMBLY__ */
#include <asm/slice.h>

+#define ARCH_ZONE_DMA_BITS 31
+
#endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 14c79a7dc855..9bafb38e959e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -37,7 +37,6 @@ extern unsigned long empty_zero_page[];

extern pgd_t swapper_pg_dir[];

-void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
int dma_pfn_limit_to_zone(u64 pfn_limit);
extern void paging_init(void);

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 88f3963ca30f..93a4622563c6 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -108,12 +108,8 @@ int __init swiotlb_setup_bus_notifier(void)

void __init swiotlb_detect_4g(void)
{
- if ((memblock_end_of_DRAM() - 1) > 0xffffffff) {
+ if ((memblock_end_of_DRAM() - 1) > 0xffffffff)
ppc_swiotlb_enable = 1;
-#ifdef CONFIG_ZONE_DMA32
- limit_zone_pfn(ZONE_DMA32, (1ULL << 32) >> PAGE_SHIFT);
-#endif
- }
}

static int __init check_swiotlb_enabled(void)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index dbfc7056d7df..6551685a4ed0 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -50,7 +50,7 @@ static int dma_nommu_dma_supported(struct device *dev, u64 mask)
return 1;

#ifdef CONFIG_FSL_SOC
- /* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however
+ /* Freescale gets another chance via ZONE_DMA, however
* that will have to be refined if/when they support iommus
*/
return 1;
@@ -94,13 +94,10 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
}

switch (zone) {
+#ifdef CONFIG_ZONE_DMA
case ZONE_DMA:
flag |= GFP_DMA;
break;
-#ifdef CONFIG_ZONE_DMA32
- case ZONE_DMA32:
- flag |= GFP_DMA32;
- break;
#endif
};
#endif /* CONFIG_FSL_SOC */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 5c8530d0c611..8bff7e893bde 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -69,15 +69,12 @@ pte_t *kmap_pte;
EXPORT_SYMBOL(kmap_pte);
pgprot_t kmap_prot;
EXPORT_SYMBOL(kmap_prot);
-#define TOP_ZONE ZONE_HIGHMEM

static inline pte_t *virt_to_kpte(unsigned long vaddr)
{
return pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr),
vaddr), vaddr), vaddr);
}
-#else
-#define TOP_ZONE ZONE_NORMAL
#endif

int page_is_ram(unsigned long pfn)
@@ -246,35 +243,19 @@ static int __init mark_nonram_nosave(void)
}
#endif

-static bool zone_limits_final;
-
-/*
- * The memory zones past TOP_ZONE are managed by generic mm code.
- * These should be set to zero since that's what every other
- * architecture does.
- */
-static unsigned long max_zone_pfns[MAX_NR_ZONES] = {
- [0 ... TOP_ZONE ] = ~0UL,
- [TOP_ZONE + 1 ... MAX_NR_ZONES - 1] = 0
-};
-
/*
- * Restrict the specified zone and all more restrictive zones
- * to be below the specified pfn. May not be called after
- * paging_init().
+ * Zones usage:
+ *
+ * We setup ZONE_DMA to be 31-bits on all platforms and ZONE_NORMAL to be
+ * everything else. GFP_DMA32 page allocations automatically fall back to
+ * ZONE_DMA.
+ *
+ * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
+ * inform the generic DMA mapping code. 32-bit only devices (if not handled
+ * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
+ * otherwise served by ZONE_DMA.
*/
-void __init limit_zone_pfn(enum zone_type zone, unsigned long pfn_limit)
-{
- int i;
-
- if (WARN_ON(zone_limits_final))
- return;
-
- for (i = zone; i >= 0; i--) {
- if (max_zone_pfns[i] > pfn_limit)
- max_zone_pfns[i] = pfn_limit;
- }
-}
+static unsigned long max_zone_pfns[MAX_NR_ZONES];

/*
* Find the least restrictive zone that is entirely below the
@@ -324,11 +305,14 @@ void __init paging_init(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
(long int)((top_of_ram - total_ram) >> 20));

+#ifdef CONFIG_ZONE_DMA
+ max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL >> PAGE_SHIFT);
+#endif
+ max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
- limit_zone_pfn(ZONE_NORMAL, lowmem_end_addr >> PAGE_SHIFT);
+ max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#endif
- limit_zone_pfn(TOP_ZONE, top_of_ram >> PAGE_SHIFT);
- zone_limits_final = true;
+
free_area_init_nodes(max_zone_pfns);

mark_nonram_nosave();
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
index ac191a7a1337..b0dac307bebf 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -68,16 +68,6 @@ void __init corenet_gen_setup_arch(void)

swiotlb_detect_4g();

-#if defined(CONFIG_FSL_PCI) && defined(CONFIG_ZONE_DMA32)
- /*
- * Inbound windows don't cover the full lower 4 GiB
- * due to conflicts with PCICSRBAR and outbound windows,
- * so limit the DMA32 zone to 2 GiB, to allow consistent
- * allocations to succeed.
- */
- limit_zone_pfn(ZONE_DMA32, 1UL << (31 - PAGE_SHIFT));
-#endif
-
pr_info("%s board\n", ppc_md.name);

mpc85xx_qe_init();
diff --git a/arch/powerpc/platforms/85xx/qemu_e500.c b/arch/powerpc/platforms/85xx/qemu_e500.c
index b63a8548366f..27631c607f3d 100644
--- a/arch/powerpc/platforms/85xx/qemu_e500.c
+++ b/arch/powerpc/platforms/85xx/qemu_e500.c
@@ -45,15 +45,6 @@ static void __init qemu_e500_setup_arch(void)

fsl_pci_assign_primary();
swiotlb_detect_4g();
-#if defined(CONFIG_FSL_PCI) && defined(CONFIG_ZONE_DMA32)
- /*
- * Inbound windows don't cover the full lower 4 GiB
- * due to conflicts with PCICSRBAR and outbound windows,
- * so limit the DMA32 zone to 2 GiB, to allow consistent
- * allocations to succeed.
- */
- limit_zone_pfn(ZONE_DMA32, 1UL << (31 - PAGE_SHIFT));
-#endif
mpc85xx_smp_init();
}

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..68970340df1c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -312,7 +312,7 @@ enum zone_type {
* Architecture Limit
* ---------------------------
* parisc, ia64, sparc <4G
- * s390 <2G
+ * s390, powerpc <2G
* arm Various
* alpha Unlimited or 0-16MB.
*
--
2.19.0


2018-10-09 13:26:16

by Christoph Hellwig

Subject: [PATCH 03/33] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/kernel/setup_32.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 8c507be12c3c..3cc2e449594c 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -59,7 +59,6 @@ unsigned long ISA_DMA_THRESHOLD;
unsigned int DMA_MODE_READ;
unsigned int DMA_MODE_WRITE;

-EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
EXPORT_SYMBOL(DMA_MODE_READ);
EXPORT_SYMBOL(DMA_MODE_WRITE);

--
2.19.0


2018-10-09 13:26:22

by Christoph Hellwig

Subject: [PATCH 05/33] powerpc/dma: split the two __dma_alloc_coherent implementations

The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches. Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 5 -----
arch/powerpc/kernel/dma.c | 14 ++------------
arch/powerpc/mm/dma-noncoherent.c | 15 +++++++--------
arch/powerpc/platforms/44x/warp.c | 2 +-
4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index f2a4a7142b1e..dacd0f93f2b2 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
* to ensure it is consistent.
*/
struct device;
-extern void *__dma_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *handle, gfp_t gfp);
-extern void __dma_free_coherent(size_t size, void *vaddr);
extern void __dma_sync(void *vaddr, size_t size, int direction);
extern void __dma_sync_page(struct page *page, unsigned long offset,
size_t size, int direction);
@@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
* Cache coherent cores.
*/

-#define __dma_alloc_coherent(dev, gfp, size, handle) NULL
-#define __dma_free_coherent(size, addr) ((void)0)
#define __dma_sync(addr, size, rw) ((void)0)
#define __dma_sync_page(pg, off, sz, rw) ((void)0)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 6551685a4ed0..d6deb458bb91 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, u64 mask)
#endif
}

+#ifndef CONFIG_NOT_COHERENT_CACHE
void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag,
unsigned long attrs)
{
void *ret;
-#ifdef CONFIG_NOT_COHERENT_CACHE
- ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
- if (ret == NULL)
- return NULL;
- *dma_handle += get_dma_offset(dev);
- return ret;
-#else
struct page *page;
int node = dev_to_node(dev);
#ifdef CONFIG_FSL_SOC
@@ -110,19 +104,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
*dma_handle = __pa(ret) + get_dma_offset(dev);

return ret;
-#endif
}

void __dma_nommu_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
{
-#ifdef CONFIG_NOT_COHERENT_CACHE
- __dma_free_coherent(size, vaddr);
-#else
free_pages((unsigned long)vaddr, get_order(size));
-#endif
}
+#endif /* !CONFIG_NOT_COHERENT_CACHE */

static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag,
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 382528475433..965ce3d19f5a 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -29,7 +29,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/highmem.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
#include <linux/export.h>

#include <asm/tlbflush.h>
@@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct ppc_vm_region *head, unsi
* Allocate DMA-coherent memory space and return both the kernel remapped
* virtual and bus address for that space.
*/
-void *
-__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
struct page *page;
struct ppc_vm_region *c;
@@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
/*
* Set the "dma handle"
*/
- *handle = page_to_phys(page);
+ *dma_handle = phys_to_dma(dev, page_to_phys(page));

do {
SetPageReserved(page);
@@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
no_page:
return NULL;
}
-EXPORT_SYMBOL(__dma_alloc_coherent);

/*
* free a page as defined by the above mapping.
*/
-void __dma_free_coherent(size_t size, void *vaddr)
+void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
{
struct ppc_vm_region *c;
unsigned long flags, addr;
@@ -309,7 +309,6 @@ void __dma_free_coherent(size_t size, void *vaddr)
__func__, vaddr);
dump_stack();
}
-EXPORT_SYMBOL(__dma_free_coherent);

/*
* make an area consistent.
@@ -401,7 +400,7 @@ EXPORT_SYMBOL(__dma_sync_page);

/*
* Return the PFN for a given cpu virtual address returned by
- * __dma_alloc_coherent. This is used by dma_mmap_coherent()
+ * __dma_nommu_alloc_coherent. This is used by dma_mmap_coherent()
*/
unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr)
{
diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c
index a886c2c22097..7e4f8ca19ce8 100644
--- a/arch/powerpc/platforms/44x/warp.c
+++ b/arch/powerpc/platforms/44x/warp.c
@@ -47,7 +47,7 @@ static int __init warp_probe(void)
if (!of_machine_is_compatible("pika,warp"))
return 0;

- /* For __dma_alloc_coherent */
+ /* For __dma_nommu_alloc_coherent */
ISA_DMA_THRESHOLD = ~0L;

return 1;
--
2.19.0


2018-10-09 13:26:25

by Christoph Hellwig

Subject: [PATCH 06/33] powerpc/dma: remove the no-op dma_nommu_unmap_{page,sg} routines

These methods are optional, no need to implement no-op versions.
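
For context, an illustration of mine (simplified from the
include/linux/dma-mapping.h wrappers of that era): the dma-mapping core
only invokes these callbacks when they are set, so dropping the no-ops
changes nothing for callers:

static inline void dma_unmap_page_attrs(struct device *dev,
		dma_addr_t addr, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* a NULL ->unmap_page is treated as "nothing to do" */
	if (ops->unmap_page)
		ops->unmap_page(dev, addr, size, dir, attrs);
}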

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/kernel/dma.c | 16 ----------------
1 file changed, 16 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index d6deb458bb91..7078d72baec2 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -197,12 +197,6 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
return nents;
}

-static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
-}
-
static u64 dma_nommu_get_required_mask(struct device *dev)
{
u64 end, mask;
@@ -228,14 +222,6 @@ static inline dma_addr_t dma_nommu_map_page(struct device *dev,
return page_to_phys(page) + offset + get_dma_offset(dev);
}

-static inline void dma_nommu_unmap_page(struct device *dev,
- dma_addr_t dma_address,
- size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
-{
-}
-
#ifdef CONFIG_NOT_COHERENT_CACHE
static inline void dma_nommu_sync_sg(struct device *dev,
struct scatterlist *sgl, int nents,
@@ -261,10 +247,8 @@ const struct dma_map_ops dma_nommu_ops = {
.free = dma_nommu_free_coherent,
.mmap = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
- .unmap_sg = dma_nommu_unmap_sg,
.dma_supported = dma_nommu_dma_supported,
.map_page = dma_nommu_map_page,
- .unmap_page = dma_nommu_unmap_page,
.get_required_mask = dma_nommu_get_required_mask,
#ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu = dma_nommu_sync_single,
--
2.19.0


2018-10-09 13:26:27

by Christoph Hellwig

Subject: [PATCH 07/33] powerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_ops

vio_dma_mapping_ops currently does a lot of indirect calls through
dma_iommu_ops, which not only make the code harder to follow but are
also expensive in the post-spectre world. Unwind the indirect calls
by calling the ppc_iommu_* or iommu_* APIs directly where applicable, or
by using the dma_iommu_* methods directly where we can.
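
The shape of the change, condensed into one hypothetical helper (the
function name is illustrative; the real hunks are below):

static dma_addr_t vio_map_page_direct(struct device *dev, struct page *page,
		unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	struct iommu_table *tbl = get_iommu_table_base(dev);

	/* was: dma_iommu_ops.map_page(dev, page, offset, size, dir, attrs) */
	return iommu_map_page(dev, tbl, page, offset, size,
			      device_to_mask(dev), dir, attrs);
}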

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/kernel/dma-iommu.c | 2 +-
arch/powerpc/platforms/pseries/vio.c | 87 ++++++++++++----------------
3 files changed, 38 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index ab3a4fba38e3..26b7cc176a99 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -244,6 +244,7 @@ static inline int __init tce_iommu_bus_notifier_init(void)
}
#endif /* !CONFIG_IOMMU_API */

+u64 dma_iommu_get_required_mask(struct device *dev);
int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);

#else
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 2ca6cfaebf65..0613278abf9f 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -92,7 +92,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
return 1;
}

-static u64 dma_iommu_get_required_mask(struct device *dev)
+u64 dma_iommu_get_required_mask(struct device *dev)
{
struct iommu_table *tbl = get_iommu_table_base(dev);
u64 mask;
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 49e04ec19238..1dfff53ebd7f 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -492,7 +492,9 @@ static void *vio_dma_iommu_alloc_coherent(struct device *dev, size_t size,
return NULL;
}

- ret = dma_iommu_ops.alloc(dev, size, dma_handle, flag, attrs);
+ ret = iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
+ dma_handle, dev->coherent_dma_mask, flag,
+ dev_to_node(dev));
if (unlikely(ret == NULL)) {
vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
atomic_inc(&viodev->cmo.allocs_failed);
@@ -507,8 +509,7 @@ static void vio_dma_iommu_free_coherent(struct device *dev, size_t size,
{
struct vio_dev *viodev = to_vio_dev(dev);

- dma_iommu_ops.free(dev, size, vaddr, dma_handle, attrs);
-
+ iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
}

@@ -518,22 +519,22 @@ static dma_addr_t vio_dma_iommu_map_page(struct device *dev, struct page *page,
unsigned long attrs)
{
struct vio_dev *viodev = to_vio_dev(dev);
- struct iommu_table *tbl;
+ struct iommu_table *tbl = get_iommu_table_base(dev);
dma_addr_t ret = IOMMU_MAPPING_ERROR;

- tbl = get_iommu_table_base(dev);
- if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)))) {
- atomic_inc(&viodev->cmo.allocs_failed);
- return ret;
- }
-
- ret = dma_iommu_ops.map_page(dev, page, offset, size, direction, attrs);
- if (unlikely(dma_mapping_error(dev, ret))) {
- vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
- atomic_inc(&viodev->cmo.allocs_failed);
- }
-
+ if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl))))
+ goto out_fail;
+ ret = iommu_map_page(dev, tbl, page, offset, size, device_to_mask(dev),
+ direction, attrs);
+ if (unlikely(ret == IOMMU_MAPPING_ERROR))
+ goto out_deallocate;
return ret;
+
+out_deallocate:
+ vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
+out_fail:
+ atomic_inc(&viodev->cmo.allocs_failed);
+ return IOMMU_MAPPING_ERROR;
}

static void vio_dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
@@ -542,11 +543,9 @@ static void vio_dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
unsigned long attrs)
{
struct vio_dev *viodev = to_vio_dev(dev);
- struct iommu_table *tbl;
-
- tbl = get_iommu_table_base(dev);
- dma_iommu_ops.unmap_page(dev, dma_handle, size, direction, attrs);
+ struct iommu_table *tbl = get_iommu_table_base(dev);

+ iommu_unmap_page(tbl, dma_handle, size, direction, attrs);
vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
}

@@ -555,34 +554,32 @@ static int vio_dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
unsigned long attrs)
{
struct vio_dev *viodev = to_vio_dev(dev);
- struct iommu_table *tbl;
+ struct iommu_table *tbl = get_iommu_table_base(dev);
struct scatterlist *sgl;
int ret, count;
size_t alloc_size = 0;

- tbl = get_iommu_table_base(dev);
for_each_sg(sglist, sgl, nelems, count)
alloc_size += roundup(sgl->length, IOMMU_PAGE_SIZE(tbl));

- if (vio_cmo_alloc(viodev, alloc_size)) {
- atomic_inc(&viodev->cmo.allocs_failed);
- return 0;
- }
-
- ret = dma_iommu_ops.map_sg(dev, sglist, nelems, direction, attrs);
-
- if (unlikely(!ret)) {
- vio_cmo_dealloc(viodev, alloc_size);
- atomic_inc(&viodev->cmo.allocs_failed);
- return ret;
- }
+ if (vio_cmo_alloc(viodev, alloc_size))
+ goto out_fail;
+ ret = ppc_iommu_map_sg(dev, tbl, sglist, nelems, device_to_mask(dev),
+ direction, attrs);
+ if (unlikely(!ret))
+ goto out_deallocate;

for_each_sg(sglist, sgl, ret, count)
alloc_size -= roundup(sgl->dma_length, IOMMU_PAGE_SIZE(tbl));
if (alloc_size)
vio_cmo_dealloc(viodev, alloc_size);
-
return ret;
+
+out_deallocate:
+ vio_cmo_dealloc(viodev, alloc_size);
+out_fail:
+ atomic_inc(&viodev->cmo.allocs_failed);
+ return 0;
}

static void vio_dma_iommu_unmap_sg(struct device *dev,
@@ -591,30 +588,18 @@ static void vio_dma_iommu_unmap_sg(struct device *dev,
unsigned long attrs)
{
struct vio_dev *viodev = to_vio_dev(dev);
- struct iommu_table *tbl;
+ struct iommu_table *tbl = get_iommu_table_base(dev);
struct scatterlist *sgl;
size_t alloc_size = 0;
int count;

- tbl = get_iommu_table_base(dev);
for_each_sg(sglist, sgl, nelems, count)
alloc_size += roundup(sgl->dma_length, IOMMU_PAGE_SIZE(tbl));

- dma_iommu_ops.unmap_sg(dev, sglist, nelems, direction, attrs);
-
+ ppc_iommu_unmap_sg(tbl, sglist, nelems, direction, attrs);
vio_cmo_dealloc(viodev, alloc_size);
}

-static int vio_dma_iommu_dma_supported(struct device *dev, u64 mask)
-{
- return dma_iommu_ops.dma_supported(dev, mask);
-}
-
-static u64 vio_dma_get_required_mask(struct device *dev)
-{
- return dma_iommu_ops.get_required_mask(dev);
-}
-
static const struct dma_map_ops vio_dma_mapping_ops = {
.alloc = vio_dma_iommu_alloc_coherent,
.free = vio_dma_iommu_free_coherent,
@@ -623,8 +608,8 @@ static const struct dma_map_ops vio_dma_mapping_ops = {
.unmap_sg = vio_dma_iommu_unmap_sg,
.map_page = vio_dma_iommu_map_page,
.unmap_page = vio_dma_iommu_unmap_page,
- .dma_supported = vio_dma_iommu_dma_supported,
- .get_required_mask = vio_dma_get_required_mask,
+ .dma_supported = dma_iommu_dma_supported,
+ .get_required_mask = dma_iommu_get_required_mask,
.mapping_error = dma_iommu_mapping_error,
};

--
2.19.0


2018-10-09 13:26:50

by Christoph Hellwig

Subject: [PATCH 12/33] powerpc/cell: use the generic iommu bypass code

This gets rid of a lot of clumsy code and finally allows us to mark
dma_iommu_ops const.
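
For reference, a simplified sketch (mine) of how the generic bypass
path is consumed: the platform fills in the controller's
->iommu_bypass_supported() callback, and dma-iommu.c then
short-circuits to the direct-mapping path per operation:

static dma_addr_t dma_iommu_map_page_sketch(struct device *dev,
		struct page *page, unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	if (dma_iommu_map_bypass(dev, attrs))
		return dma_nommu_map_page(dev, page, offset, size, dir, attrs);
	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
			      size, device_to_mask(dev), dir, attrs);
}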

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 2 +-
arch/powerpc/include/asm/iommu.h | 6 ++
arch/powerpc/kernel/dma-iommu.c | 7 +-
arch/powerpc/platforms/cell/iommu.c | 143 ++-----------------------
4 files changed, 22 insertions(+), 136 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 27283eb68c50..59c090b7eaac 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -73,7 +73,7 @@ static inline unsigned long device_to_mask(struct device *dev)
* Available generic sets of operations
*/
#ifdef CONFIG_PPC64
-extern struct dma_map_ops dma_iommu_ops;
+extern const struct dma_map_ops dma_iommu_ops;
#endif
extern const struct dma_map_ops dma_nommu_ops;

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 26b7cc176a99..264d04f1dcd1 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -328,5 +328,11 @@ extern void iommu_release_ownership(struct iommu_table *tbl);
extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir);

+#ifdef CONFIG_PPC_CELL_NATIVE
+extern bool iommu_fixed_is_weak;
+#else
+#define iommu_fixed_is_weak false
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index c865e15ad024..7fa3636636fa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -20,14 +20,15 @@
*/
static inline bool dma_iommu_alloc_bypass(struct device *dev)
{
- return dev->archdata.iommu_bypass &&
+ return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
dma_nommu_dma_supported(dev, dev->coherent_dma_mask);
}

static inline bool dma_iommu_map_bypass(struct device *dev,
unsigned long attrs)
{
- return dev->archdata.iommu_bypass;
+ return dev->archdata.iommu_bypass &&
+ (!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
}

/* Allocates a contiguous real buffer and creates mappings over it.
@@ -168,7 +169,7 @@ int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
return dma_addr == IOMMU_MAPPING_ERROR;
}

-struct dma_map_ops dma_iommu_ops = {
+const struct dma_map_ops dma_iommu_ops = {
.alloc = dma_iommu_alloc_coherent,
.free = dma_iommu_free_coherent,
.mmap = dma_nommu_mmap_coherent,
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index cce5bf9515e5..fb51f78035ce 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -546,7 +546,7 @@ static unsigned long cell_dma_nommu_offset;
static unsigned long dma_iommu_fixed_base;

/* iommu_fixed_is_weak is set if booted with iommu_fixed=weak */
-static int iommu_fixed_is_weak;
+bool iommu_fixed_is_weak;

static struct iommu_table *cell_get_iommu_table(struct device *dev)
{
@@ -568,95 +568,6 @@ static struct iommu_table *cell_get_iommu_table(struct device *dev)
return &window->table;
}

-/* A coherent allocation implies strong ordering */
-
-static void *dma_fixed_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak)
- return iommu_alloc_coherent(dev, cell_get_iommu_table(dev),
- size, dma_handle,
- device_to_mask(dev), flag,
- dev_to_node(dev));
- else
- return dma_nommu_ops.alloc(dev, size, dma_handle, flag,
- attrs);
-}
-
-static void dma_fixed_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak)
- iommu_free_coherent(cell_get_iommu_table(dev), size, vaddr,
- dma_handle);
- else
- dma_nommu_ops.free(dev, size, vaddr, dma_handle, attrs);
-}
-
-static dma_addr_t dma_fixed_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak == (attrs & DMA_ATTR_WEAK_ORDERING))
- return dma_nommu_ops.map_page(dev, page, offset, size,
- direction, attrs);
- else
- return iommu_map_page(dev, cell_get_iommu_table(dev), page,
- offset, size, device_to_mask(dev),
- direction, attrs);
-}
-
-static void dma_fixed_unmap_page(struct device *dev, dma_addr_t dma_addr,
- size_t size, enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak == (attrs & DMA_ATTR_WEAK_ORDERING))
- dma_nommu_ops.unmap_page(dev, dma_addr, size, direction,
- attrs);
- else
- iommu_unmap_page(cell_get_iommu_table(dev), dma_addr, size,
- direction, attrs);
-}
-
-static int dma_fixed_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak == (attrs & DMA_ATTR_WEAK_ORDERING))
- return dma_nommu_ops.map_sg(dev, sg, nents, direction, attrs);
- else
- return ppc_iommu_map_sg(dev, cell_get_iommu_table(dev), sg,
- nents, device_to_mask(dev),
- direction, attrs);
-}
-
-static void dma_fixed_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- if (iommu_fixed_is_weak == (attrs & DMA_ATTR_WEAK_ORDERING))
- dma_nommu_ops.unmap_sg(dev, sg, nents, direction, attrs);
- else
- ppc_iommu_unmap_sg(cell_get_iommu_table(dev), sg, nents,
- direction, attrs);
-}
-
-static int dma_suported_and_switch(struct device *dev, u64 dma_mask);
-
-static const struct dma_map_ops dma_iommu_fixed_ops = {
- .alloc = dma_fixed_alloc_coherent,
- .free = dma_fixed_free_coherent,
- .map_sg = dma_fixed_map_sg,
- .unmap_sg = dma_fixed_unmap_sg,
- .dma_supported = dma_suported_and_switch,
- .map_page = dma_fixed_map_page,
- .unmap_page = dma_fixed_unmap_page,
- .mapping_error = dma_iommu_mapping_error,
-};
-
static u64 cell_iommu_get_fixed_address(struct device *dev);

static void cell_dma_dev_setup(struct device *dev)
@@ -953,22 +864,10 @@ static u64 cell_iommu_get_fixed_address(struct device *dev)
return dev_addr;
}

-static int dma_suported_and_switch(struct device *dev, u64 dma_mask)
+static bool cell_pci_iommu_bypass_supported(struct pci_dev *pdev, u64 mask)
{
- if (dma_mask == DMA_BIT_MASK(64) &&
- cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
- dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
- set_dma_ops(dev, &dma_iommu_fixed_ops);
- return 1;
- }
-
- if (dma_iommu_dma_supported(dev, dma_mask)) {
- dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
- set_dma_ops(dev, &dma_iommu_ops);
- return 1;
- }
-
- return 0;
+ return mask == DMA_BIT_MASK(64) &&
+ cell_iommu_get_fixed_address(&pdev->dev) != OF_BAD_ADDR;
}

static void insert_16M_pte(unsigned long addr, unsigned long *ptab,
@@ -1122,9 +1021,6 @@ static int __init cell_iommu_fixed_mapping_init(void)
cell_iommu_setup_window(iommu, np, dbase, dsize, 0);
}

- dma_iommu_ops.dma_supported = dma_suported_and_switch;
- set_pci_dma_ops(&dma_iommu_ops);
-
return 0;
}

@@ -1145,7 +1041,7 @@ static int __init setup_iommu_fixed(char *str)
pciep = of_find_node_by_type(NULL, "pcie-endpoint");

if (strcmp(str, "weak") == 0 || (pciep && strcmp(str, "strong") != 0))
- iommu_fixed_is_weak = DMA_ATTR_WEAK_ORDERING;
+ iommu_fixed_is_weak = true;

of_node_put(pciep);

@@ -1153,26 +1049,6 @@ static int __init setup_iommu_fixed(char *str)
}
__setup("iommu_fixed=", setup_iommu_fixed);

-static u64 cell_dma_get_required_mask(struct device *dev)
-{
- const struct dma_map_ops *dma_ops;
-
- if (!dev->dma_mask)
- return 0;
-
- if (!iommu_fixed_disabled &&
- cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR)
- return DMA_BIT_MASK(64);
-
- dma_ops = get_dma_ops(dev);
- if (dma_ops->get_required_mask)
- return dma_ops->get_required_mask(dev);
-
- WARN_ONCE(1, "no get_required_mask in %p ops", dma_ops);
-
- return DMA_BIT_MASK(64);
-}
-
static int __init cell_iommu_init(void)
{
struct device_node *np;
@@ -1189,10 +1065,9 @@ static int __init cell_iommu_init(void)

/* Setup various callbacks */
cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup;
- ppc_md.dma_get_required_mask = cell_dma_get_required_mask;

if (!iommu_fixed_disabled && cell_iommu_fixed_mapping_init() == 0)
- goto bail;
+ goto done;

/* Create an iommu for each /axon node. */
for_each_node_by_name(np, "axon") {
@@ -1209,8 +1084,12 @@ static int __init cell_iommu_init(void)
continue;
cell_iommu_init_one(np, SPIDER_DMA_OFFSET);
}
-
+ done:
/* Setup default PCI iommu ops */
+ if (!iommu_fixed_disabled) {
+ cell_pci_controller_ops.iommu_bypass_supported =
+ cell_pci_iommu_bypass_supported;
+ }
set_pci_dma_ops(&dma_iommu_ops);

bail:
--
2.19.0


2018-10-09 13:26:52

by Christoph Hellwig

Subject: [PATCH 11/33] powerpc/cell: move dma direct window setup out of dma_configure

Configure the dma settings at device setup time, and stop playing games
with get_pci_dma_ops. This prepares for using the common dma_configure
code later on.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/platforms/cell/iommu.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 12352a58072a..cce5bf9515e5 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -657,14 +657,21 @@ static const struct dma_map_ops dma_iommu_fixed_ops = {
.mapping_error = dma_iommu_mapping_error,
};

+static u64 cell_iommu_get_fixed_address(struct device *dev);
+
static void cell_dma_dev_setup(struct device *dev)
{
- if (get_pci_dma_ops() == &dma_iommu_ops)
+ if (get_pci_dma_ops() == &dma_iommu_ops) {
+ u64 addr = cell_iommu_get_fixed_address(dev);
+
+ if (addr != OF_BAD_ADDR)
+ set_dma_offset(dev, addr + dma_iommu_fixed_base);
set_iommu_table_base(dev, cell_get_iommu_table(dev));
- else if (get_pci_dma_ops() == &dma_nommu_ops)
+ } else if (get_pci_dma_ops() == &dma_nommu_ops) {
set_dma_offset(dev, cell_dma_nommu_offset);
- else
+ } else {
BUG();
+ }
}

static void cell_pci_dma_dev_setup(struct pci_dev *dev)
@@ -950,19 +957,14 @@ static int dma_suported_and_switch(struct device *dev, u64 dma_mask)
{
if (dma_mask == DMA_BIT_MASK(64) &&
cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
- u64 addr = cell_iommu_get_fixed_address(dev) +
- dma_iommu_fixed_base;
dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
- dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
set_dma_ops(dev, &dma_iommu_fixed_ops);
- set_dma_offset(dev, addr);
return 1;
}

if (dma_iommu_dma_supported(dev, dma_mask)) {
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
- set_dma_ops(dev, get_pci_dma_ops());
- cell_dma_dev_setup(dev);
+ set_dma_ops(dev, &dma_iommu_ops);
return 1;
}

--
2.19.0


2018-10-09 13:27:08

by Christoph Hellwig

Subject: [PATCH 04/33] powerpc/dma: remove the unused dma_iommu_ops export

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/kernel/dma-iommu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index f9fe2080ceb9..2ca6cfaebf65 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -6,7 +6,6 @@
* busses using the iommu infrastructure
*/

-#include <linux/export.h>
#include <asm/iommu.h>

/*
@@ -123,4 +122,3 @@ struct dma_map_ops dma_iommu_ops = {
.get_required_mask = dma_iommu_get_required_mask,
.mapping_error = dma_iommu_mapping_error,
};
-EXPORT_SYMBOL(dma_iommu_ops);
--
2.19.0


2018-10-09 13:27:16

by Christoph Hellwig

Subject: [PATCH 09/33] powerpc/pseries: unwind dma_get_required_mask_pSeriesLP a bit

In dma_get_required_mask_pSeriesLP, call dma_iommu_get_required_mask
directly instead of going through dma_iommu_ops to simplify the code a bit.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 06f02960b439..da5716de7f4c 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1273,7 +1273,7 @@ static u64 dma_get_required_mask_pSeriesLP(struct device *dev)
return DMA_BIT_MASK(64);
}

- return dma_iommu_ops.get_required_mask(dev);
+ return dma_iommu_get_required_mask(dev);
}

static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
--
2.19.0


2018-10-09 13:27:24

by Christoph Hellwig

Subject: [PATCH 15/33] powerpc/powernv: remove pnv_pci_ioda_pe_single_vendor

This function is completely bogus - the fact that two PCIe devices come
from the same vendor has absolutely nothing to say about the DMA
capabilities and characteristics.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 28 ++---------------------
1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cde710297a4e..913175ba1c10 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1746,31 +1746,6 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
*/
}

-static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
-{
- unsigned short vendor = 0;
- struct pci_dev *pdev;
-
- if (pe->device_count == 1)
- return true;
-
- /* pe->pdev should be set if it's a single device, pe->pbus if not */
- if (!pe->pbus)
- return true;
-
- list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
- if (!vendor) {
- vendor = pdev->vendor;
- continue;
- }
-
- if (pdev->vendor != vendor)
- return false;
- }
-
- return true;
-}
-
/*
* Reconfigure TVE#0 to be usable as 64-bit DMA space.
*
@@ -1871,7 +1846,8 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
*/
if (dma_mask >> 32 &&
dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
- pnv_pci_ioda_pe_single_vendor(pe) &&
+ /* pe->pdev should be set if it's a single device, pe->pbus if not */
+ (pe->device_count == 1 || !pe->pbus) &&
phb->model == PNV_PHB_MODEL_PHB3) {
/* Configure the bypass mode */
rc = pnv_pci_ioda_dma_64bit_bypass(pe);
--
2.19.0


2018-10-09 13:27:40

by Christoph Hellwig

Subject: [PATCH 29/33] powerpc/dma: remove get_dma_offset

Just fold the calculation into __phys_to_dma/__dma_to_phys as those are
the only places that should know about it.
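
A quick usage sketch (mine, not from the patch) of why those two
helpers are the only places that need the offset: the direct-mapping
path converts physical to bus addresses exclusively through them:

static dma_addr_t example_direct_map(struct device *dev, struct page *page,
		unsigned long offset)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	/* applies dev->archdata.dma_offset (or PCI_DRAM_OFFSET) */
	return __phys_to_dma(dev, phys);
}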

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/include/asm/dma-direct.h | 8 ++++++--
arch/powerpc/include/asm/dma-mapping.h | 16 ----------------
2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index 92d8aed86422..a2912b47102c 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -13,11 +13,15 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)

static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return paddr + get_dma_offset(dev);
+ if (!dev)
+ return paddr + PCI_DRAM_OFFSET;
+ return paddr + dev->archdata.dma_offset;
}

static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return daddr - get_dma_offset(dev);
+ if (!dev)
+ return daddr - PCI_DRAM_OFFSET;
+ return daddr - dev->archdata.dma_offset;
}
#endif /* ASM_POWERPC_DMA_DIRECT_H */
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 7694985f05ee..2d0879b0acf3 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -87,22 +87,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return NULL;
}

-/*
- * get_dma_offset()
- *
- * Get the dma offset on configurations where the dma address can be determined
- * from the physical address by looking at a simple offset. Direct dma and
- * swiotlb use this function, but it is typically not used by implementations
- * with an iommu.
- */
-static inline dma_addr_t get_dma_offset(struct device *dev)
-{
- if (dev)
- return dev->archdata.dma_offset;
-
- return PCI_DRAM_OFFSET;
-}
-
static inline void set_dma_offset(struct device *dev, dma_addr_t off)
{
if (dev)
--
2.19.0


2018-10-09 13:27:45

by Christoph Hellwig

Subject: [PATCH 21/33] powerpc/dma: remove get_pci_dma_ops

This function is only used by the Cell iommu code, which can keep track
of whether it is using the iommu internally just as well.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/pci.h | 2 --
arch/powerpc/kernel/pci-common.c | 6 ------
arch/powerpc/platforms/cell/iommu.c | 17 ++++++++---------
3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a01d2e3d6ff9..4f7cf0a7f89d 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -52,10 +52,8 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)

#ifdef CONFIG_PCI
extern void set_pci_dma_ops(const struct dma_map_ops *dma_ops);
-extern const struct dma_map_ops *get_pci_dma_ops(void);
#else /* CONFIG_PCI */
#define set_pci_dma_ops(d)
-#define get_pci_dma_ops() NULL
#endif

#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 88e4f69a09e5..a84707680525 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -69,12 +69,6 @@ void set_pci_dma_ops(const struct dma_map_ops *dma_ops)
pci_dma_ops = dma_ops;
}

-const struct dma_map_ops *get_pci_dma_ops(void)
-{
- return pci_dma_ops;
-}
-EXPORT_SYMBOL(get_pci_dma_ops);
-
/*
* This function should run under locking protection, specifically
* hose_spinlock.
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index fb51f78035ce..93c7e4aef571 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -544,6 +544,7 @@ static struct cbe_iommu *cell_iommu_for_node(int nid)
static unsigned long cell_dma_nommu_offset;

static unsigned long dma_iommu_fixed_base;
+static bool cell_iommu_enabled;

/* iommu_fixed_is_weak is set if booted with iommu_fixed=weak */
bool iommu_fixed_is_weak;
@@ -572,16 +573,14 @@ static u64 cell_iommu_get_fixed_address(struct device *dev);

static void cell_dma_dev_setup(struct device *dev)
{
- if (get_pci_dma_ops() == &dma_iommu_ops) {
+ if (cell_iommu_enabled) {
u64 addr = cell_iommu_get_fixed_address(dev);

if (addr != OF_BAD_ADDR)
set_dma_offset(dev, addr + dma_iommu_fixed_base);
set_iommu_table_base(dev, cell_get_iommu_table(dev));
- } else if (get_pci_dma_ops() == &dma_nommu_ops) {
- set_dma_offset(dev, cell_dma_nommu_offset);
} else {
- BUG();
+ set_dma_offset(dev, cell_dma_nommu_offset);
}
}

@@ -599,11 +598,11 @@ static int cell_of_bus_notify(struct notifier_block *nb, unsigned long action,
if (action != BUS_NOTIFY_ADD_DEVICE)
return 0;

- /* We use the PCI DMA ops */
- dev->dma_ops = get_pci_dma_ops();
-
+ if (cell_iommu_enabled)
+ dev->dma_ops = &dma_iommu_ops;
+ else
+ dev->dma_ops = &dma_nommu_ops;
cell_dma_dev_setup(dev);
-
return 0;
}

@@ -1091,7 +1090,7 @@ static int __init cell_iommu_init(void)
cell_pci_iommu_bypass_supported;
}
set_pci_dma_ops(&dma_iommu_ops);
-
+ cell_iommu_enabled = true;
bail:
/* Register callbacks on OF platform device addition/removal
* to handle linking them to the right DMA operations
--
2.19.0


2018-10-09 13:27:50

by Christoph Hellwig

Subject: [PATCH 33/33] powerpc/dma: trim the fat from <asm/dma-mapping.h>

There is no need to provide anything but get_arch_dma_ops to
<linux/dma-mapping.h>. Move the remaining declarations to <asm/iommu.h>
and drop all the includes.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 29 --------------------------
arch/powerpc/include/asm/iommu.h | 10 +++++++++
arch/powerpc/platforms/44x/ppc476.c | 1 +
3 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index a59c42879194..565d6f74b189 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -1,37 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2004 IBM
- *
- * Implements the generic device dma API for powerpc.
- * the pci and vio busses
*/
#ifndef _ASM_DMA_MAPPING_H
#define _ASM_DMA_MAPPING_H
-#ifdef __KERNEL__
-
-#include <linux/types.h>
-#include <linux/cache.h>
-/* need struct page definitions */
-#include <linux/mm.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-debug.h>
-#include <asm/io.h>
-#include <asm/swiotlb.h>
-
-static inline unsigned long device_to_mask(struct device *dev)
-{
- if (dev->dma_mask && *dev->dma_mask)
- return *dev->dma_mask;
- /* Assume devices without mask can take 32 bit addresses */
- return 0xfffffffful;
-}
-
-/*
- * Available generic sets of operations
- */
-#ifdef CONFIG_PPC64
-extern const struct dma_map_ops dma_iommu_ops;
-#endif

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
@@ -43,5 +15,4 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return NULL;
}

-#endif /* __KERNEL__ */
#endif /* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 264d04f1dcd1..db745575e6a4 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -334,5 +334,15 @@ extern bool iommu_fixed_is_weak;
#define iommu_fixed_is_weak false
#endif

+extern const struct dma_map_ops dma_iommu_ops;
+
+static inline unsigned long device_to_mask(struct device *dev)
+{
+ if (dev->dma_mask && *dev->dma_mask)
+ return *dev->dma_mask;
+ /* Assume devices without mask can take 32 bit addresses */
+ return 0xfffffffful;
+}
+
#endif /* __KERNEL__ */
#endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/platforms/44x/ppc476.c b/arch/powerpc/platforms/44x/ppc476.c
index e55933f9cd55..a5e61e5c16e2 100644
--- a/arch/powerpc/platforms/44x/ppc476.c
+++ b/arch/powerpc/platforms/44x/ppc476.c
@@ -34,6 +34,7 @@
#include <asm/ppc4xx.h>
#include <asm/mpic.h>
#include <asm/mmu.h>
+#include <asm/swiotlb.h>

#include <linux/pci.h>
#include <linux/i2c.h>
--
2.19.0


2018-10-09 13:27:51

by Christoph Hellwig

Subject: [PATCH 30/33] powerpc/dma: remove set_dma_offset

There is no good reason for this helper, just opencode it.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 6 ------
arch/powerpc/kernel/pci-common.c | 2 +-
arch/powerpc/platforms/cell/iommu.c | 4 ++--
arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++---
arch/powerpc/platforms/pseries/iommu.c | 7 ++-----
arch/powerpc/sysdev/dart_iommu.c | 2 +-
arch/powerpc/sysdev/fsl_pci.c | 2 +-
drivers/misc/cxl/vphb.c | 2 +-
8 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 2d0879b0acf3..e12439ae8211 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -87,11 +87,5 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return NULL;
}

-static inline void set_dma_offset(struct device *dev, dma_addr_t off)
-{
- if (dev)
- dev->archdata.dma_offset = off;
-}
-
#endif /* __KERNEL__ */
#endif /* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a84707680525..fca2b0bafd82 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -966,7 +966,7 @@ static void pcibios_setup_device(struct pci_dev *dev)

/* Hook up default DMA ops */
set_dma_ops(&dev->dev, pci_dma_ops);
- set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
+ dev->dev.archdata.dma_offset = PCI_DRAM_OFFSET;

/* Additional platform DMA/iommu setup */
phb = pci_bus_to_host(dev->bus);
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 93c7e4aef571..997ee0f5fc2f 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -577,10 +577,10 @@ static void cell_dma_dev_setup(struct device *dev)
u64 addr = cell_iommu_get_fixed_address(dev);

if (addr != OF_BAD_ADDR)
- set_dma_offset(dev, addr + dma_iommu_fixed_base);
+ dev->archdata.dma_offset = addr + dma_iommu_fixed_base;
set_iommu_table_base(dev, cell_get_iommu_table(dev));
} else {
- set_dma_offset(dev, cell_dma_nommu_offset);
+ dev->archdata.dma_offset = cell_dma_nommu_offset;
}
}

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5748b62e2e86..e2333e10598d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1660,7 +1660,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev

pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
- set_dma_offset(&pdev->dev, pe->tce_bypass_base);
+ pdev->dev.archdata.dma_offset = pe->tce_bypass_base;
set_iommu_table_base(&pdev->dev, pe->table_group.tables[0]);
/*
* Note: iommu_add_device() will fail here as
@@ -1773,7 +1773,7 @@ static bool pnv_pci_ioda_iommu_bypass_supported(struct pci_dev *pdev,
if (rc)
return rc;
/* 4GB offset bypasses 32-bit space */
- set_dma_offset(&pdev->dev, (1ULL << 32));
+ pdev->dev.archdata.dma_offset = (1ULL << 32);
return true;
}

@@ -1788,7 +1788,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,

list_for_each_entry(dev, &bus->devices, bus_list) {
set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
- set_dma_offset(&dev->dev, pe->tce_bypass_base);
+ dev->dev.archdata.dma_offset = pe->tce_bypass_base;
if (add_to_group)
iommu_add_device(&dev->dev);

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 8965d174c53b..a2ff20d154fe 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1197,7 +1197,6 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
{
struct device_node *dn = pci_device_to_OF_node(pdev), *pdn;
const __be32 *dma_window = NULL;
- u64 dma_offset;

/* only attempt to use a new window if 64-bit DMA is requested */
if (dma_mask < DMA_BIT_MASK(64))
@@ -1219,11 +1218,9 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
}

if (pdn && PCI_DN(pdn)) {
- dma_offset = enable_ddw(pdev, pdn);
- if (dma_offset != 0) {
- set_dma_offset(&pdev->dev, dma_offset);
+ pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
+ if (pdev->dev.archdata.dma_offset)
return true;
- }
}

return false;
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index e7d1645a2d2e..21d7e8a30aea 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -386,7 +386,7 @@ static bool dart_device_on_pcie(struct device *dev)
static void pci_dma_dev_setup_dart(struct pci_dev *dev)
{
if (dart_is_u4 && dart_device_on_pcie(&dev->dev))
- set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
+ dev->dev.archdata.dma_offset = DART_U4_BYPASS_BASE;
set_iommu_table_base(&dev->dev, &iommu_table_dart);
}

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index cb91a3d113d1..d6b7c91a9ce2 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -141,7 +141,7 @@ static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
*/
if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
dev->bus_dma_mask = 0;
- set_dma_offset(dev, pci64_dma_offset);
+ dev->archdata.dma_offset = pci64_dma_offset;
}

return 0;
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 49da2f744bbf..f0645ba1d728 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -44,7 +44,7 @@ static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
}

set_dma_ops(&dev->dev, &dma_nommu_ops);
- set_dma_offset(&dev->dev, PAGE_OFFSET);
+ dev->dev.archdata.dma_offset = PAGE_OFFSET;

/*
* Allocate a context to do cxl things too. If we eventually do real
--
2.19.0


2018-10-09 13:27:54

by Christoph Hellwig

Subject: [PATCH 10/33] powerpc/pseries: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 100 +++++++------------------
1 file changed, 27 insertions(+), 73 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index da5716de7f4c..8965d174c53b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -973,7 +973,7 @@ static LIST_HEAD(failed_ddw_pdn_list);
* pdn: the parent pe node with the ibm,dma_window property
* Future: also check if we can remap the base window for our base page size
*
- * returns the dma offset for use by dma_set_mask
+ * returns the dma offset for use by the direct mapped DMA code.
*/
static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
{
@@ -1193,87 +1193,40 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
iommu_add_device(&dev->dev);
}

-static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
+static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
{
- bool ddw_enabled = false;
- struct device_node *pdn, *dn;
- struct pci_dev *pdev;
+ struct device_node *dn = pci_device_to_OF_node(pdev), *pdn;
const __be32 *dma_window = NULL;
u64 dma_offset;

- if (!dev->dma_mask)
- return -EIO;
-
- if (!dev_is_pci(dev))
- goto check_mask;
-
- pdev = to_pci_dev(dev);
-
/* only attempt to use a new window if 64-bit DMA is requested */
- if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
- dn = pci_device_to_OF_node(pdev);
- dev_dbg(dev, "node is %pOF\n", dn);
+ if (dma_mask < DMA_BIT_MASK(64))
+ return false;

- /*
- * the device tree might contain the dma-window properties
- * per-device and not necessarily for the bus. So we need to
- * search upwards in the tree until we either hit a dma-window
- * property, OR find a parent with a table already allocated.
- */
- for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
- pdn = pdn->parent) {
- dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
- if (dma_window)
- break;
- }
- if (pdn && PCI_DN(pdn)) {
- dma_offset = enable_ddw(pdev, pdn);
- if (dma_offset != 0) {
- dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
- set_dma_offset(dev, dma_offset);
- set_dma_ops(dev, &dma_nommu_ops);
- ddw_enabled = true;
- }
- }
- }
+ dev_dbg(&pdev->dev, "node is %pOF\n", dn);

- /* fall back on iommu ops */
- if (!ddw_enabled && get_dma_ops(dev) != &dma_iommu_ops) {
- dev_info(dev, "Restoring 32-bit DMA via iommu\n");
- set_dma_ops(dev, &dma_iommu_ops);
+ /*
+ * the device tree might contain the dma-window properties
+ * per-device and not necessarily for the bus. So we need to
+ * search upwards in the tree until we either hit a dma-window
+ * property, OR find a parent with a table already allocated.
+ */
+ for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
+ pdn = pdn->parent) {
+ dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+ if (dma_window)
+ break;
}

-check_mask:
- if (!dma_supported(dev, dma_mask))
- return -EIO;
-
- *dev->dma_mask = dma_mask;
- return 0;
-}
-
-static u64 dma_get_required_mask_pSeriesLP(struct device *dev)
-{
- if (!dev->dma_mask)
- return 0;
-
- if (!disable_ddw && dev_is_pci(dev)) {
- struct pci_dev *pdev = to_pci_dev(dev);
- struct device_node *dn;
-
- dn = pci_device_to_OF_node(pdev);
-
- /* search upwards for ibm,dma-window */
- for (; dn && PCI_DN(dn) && !PCI_DN(dn)->table_group;
- dn = dn->parent)
- if (of_get_property(dn, "ibm,dma-window", NULL))
- break;
- /* if there is a ibm,ddw-applicable property require 64 bits */
- if (dn && PCI_DN(dn) &&
- of_get_property(dn, "ibm,ddw-applicable", NULL))
- return DMA_BIT_MASK(64);
+ if (pdn && PCI_DN(pdn)) {
+ dma_offset = enable_ddw(pdev, pdn);
+ if (dma_offset != 0) {
+ set_dma_offset(&pdev->dev, dma_offset);
+ return true;
+ }
}

- return dma_iommu_get_required_mask(dev);
+ return false;
}

static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
@@ -1368,8 +1321,9 @@ void iommu_init_early_pSeries(void)
if (firmware_has_feature(FW_FEATURE_LPAR)) {
pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
- ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
- ppc_md.dma_get_required_mask = dma_get_required_mask_pSeriesLP;
+ if (!disable_ddw)
+ pseries_pci_controller_ops.iommu_bypass_supported =
+ iommu_bypass_supported_pSeriesLP;
} else {
pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeries;
pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeries;
--
2.19.0


2018-10-09 13:27:59

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 32/33] powerpc/dma: use generic direct and swiotlb ops

- The ppc32 case of dma_nommu_dma_supported was already a no-op, and the
64-bit case came to the same conclusion as dma_direct_supported, so
replace it with the generic version.

- the generic direct allocator also supports CMA, unlike the
alloc_pages_node-based code being removed

- Note that the cache maintenance in the existing code is a bit odd
as it implements both the sync_to_device and sync_to_cpu callouts,
but never flushes caches when unmapping. This patch keeps both
directions around, which will lead to more flushing than the previous
implementation. Someone more familiar with the affected CPUs should
eventually take a look and optimize the cache flush handling if needed.
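
For illustration, here is a very rough sketch of what the generic direct
map_page path ends up doing on a cache-incoherent configuration once the
arch hooks below are in place. This is not the literal kernel/dma code
(which also handles bounce buffering, addressability checks and error
reporting), and sketch_direct_map_page is a made-up name:

static dma_addr_t sketch_direct_map_page(struct device *dev,
		struct page *page, unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	/* flush/invalidate caches before the device sees the buffer */
	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		arch_sync_dma_for_device(dev, phys, size, dir);

	/* apply the per-device offset, if any */
	return phys_to_dma(dev, phys);
}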

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/dma-mapping.h | 40 ------
arch/powerpc/include/asm/pgtable.h | 1 -
arch/powerpc/include/asm/swiotlb.h | 2 -
arch/powerpc/kernel/Makefile | 2 +-
arch/powerpc/kernel/dma-iommu.c | 11 +-
arch/powerpc/kernel/dma-swiotlb.c | 24 +---
arch/powerpc/kernel/dma.c | 190 -------------------------
arch/powerpc/kernel/pci-common.c | 2 +-
arch/powerpc/kernel/setup-common.c | 2 +-
arch/powerpc/mm/dma-noncoherent.c | 35 +++--
arch/powerpc/mm/mem.c | 19 ---
arch/powerpc/platforms/44x/warp.c | 2 +-
arch/powerpc/platforms/Kconfig.cputype | 2 +
arch/powerpc/platforms/cell/iommu.c | 4 +-
arch/powerpc/platforms/pasemi/iommu.c | 2 +-
arch/powerpc/platforms/pasemi/setup.c | 2 +-
arch/powerpc/platforms/pseries/vio.c | 7 +
arch/powerpc/sysdev/fsl_pci.c | 2 +-
drivers/misc/cxl/vphb.c | 2 +-
20 files changed, 49 insertions(+), 303 deletions(-)
delete mode 100644 arch/powerpc/kernel/dma.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7ea01687995d..7d29524533b4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -155,6 +155,7 @@ config PPC
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE if FUNCTION_TRACER
+ select DMA_DIRECT_OPS
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
select GENERIC_ATOMIC64 if PPC32
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index ababad4b07a7..a59c42879194 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -18,45 +18,6 @@
#include <asm/io.h>
#include <asm/swiotlb.h>

-/* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs);
-extern void __dma_nommu_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs);
-int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction direction,
- unsigned long attrs);
-dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs);
-int dma_nommu_dma_supported(struct device *dev, u64 mask);
-
-#ifdef CONFIG_NOT_COHERENT_CACHE
-/*
- * DMA-consistent mapping functions for PowerPCs that don't support
- * cache snooping. These allocate/free a region of uncached mapped
- * memory space for use with DMA devices. Alternatively, you could
- * allocate the space "normally" and use the cache management functions
- * to ensure it is consistent.
- */
-struct device;
-extern void __dma_sync(void *vaddr, size_t size, int direction);
-extern void __dma_sync_page(struct page *page, unsigned long offset,
- size_t size, int direction);
-extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
-
-#else /* ! CONFIG_NOT_COHERENT_CACHE */
-/*
- * Cache coherent cores.
- */
-
-#define __dma_sync(addr, size, rw) ((void)0)
-#define __dma_sync_page(pg, off, sz, rw) ((void)0)
-
-#endif /* ! CONFIG_NOT_COHERENT_CACHE */
-
static inline unsigned long device_to_mask(struct device *dev)
{
if (dev->dma_mask && *dev->dma_mask)
@@ -71,7 +32,6 @@ static inline unsigned long device_to_mask(struct device *dev)
#ifdef CONFIG_PPC64
extern const struct dma_map_ops dma_iommu_ops;
#endif
-extern const struct dma_map_ops dma_nommu_ops;

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9bafb38e959e..853372fe1001 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -37,7 +37,6 @@ extern unsigned long empty_zero_page[];

extern pgd_t swapper_pg_dir[];

-int dma_pfn_limit_to_zone(u64 pfn_limit);
extern void paging_init(void);

/*
diff --git a/arch/powerpc/include/asm/swiotlb.h b/arch/powerpc/include/asm/swiotlb.h
index 26a0f12b835b..b7ff218109b3 100644
--- a/arch/powerpc/include/asm/swiotlb.h
+++ b/arch/powerpc/include/asm/swiotlb.h
@@ -13,8 +13,6 @@

#include <linux/swiotlb.h>

-extern const struct dma_map_ops powerpc_swiotlb_dma_ops;
-
extern unsigned int ppc_swiotlb_enable;
int __init swiotlb_setup_bus_notifier(void);

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 3b66f2c19c84..4e9ff65b4eff 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -33,7 +33,7 @@ obj-y := cputable.o ptrace.o syscalls.o \
process.o systbl.o idle.o \
signal.o sysfs.o cacheinfo.o time.o \
prom.o traps.o setup-common.o \
- udbg.o misc.o io.o dma.o misc_$(BITS).o \
+ udbg.o misc.o io.o misc_$(BITS).o \
of_platform.o prom_parse.o
obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
signal_64.o ptrace32.o \
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 2e682004959f..0224925c5628 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -21,7 +21,7 @@
static inline bool dma_iommu_alloc_bypass(struct device *dev)
{
return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
- dma_nommu_dma_supported(dev, dev->coherent_dma_mask);
+ dma_direct_supported(dev, dev->coherent_dma_mask);
}

static inline bool dma_iommu_map_bypass(struct device *dev,
@@ -40,8 +40,7 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
unsigned long attrs)
{
if (dma_iommu_alloc_bypass(dev))
- return __dma_nommu_alloc_coherent(dev, size, dma_handle, flag,
- attrs);
+ return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -52,7 +51,7 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
unsigned long attrs)
{
if (dma_iommu_alloc_bypass(dev))
- __dma_nommu_free_coherent(dev, size, vaddr, dma_handle, attrs);
+ dma_direct_free(dev, size, vaddr, dma_handle, attrs);
else
iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
dma_handle);
@@ -69,7 +68,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
unsigned long attrs)
{
if (dma_iommu_map_bypass(dev, attrs))
- return dma_nommu_map_page(dev, page, offset, size, direction,
+ return dma_direct_map_page(dev, page, offset, size, direction,
attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
size, device_to_mask(dev), direction, attrs);
@@ -91,7 +90,7 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
unsigned long attrs)
{
if (dma_iommu_map_bypass(dev, attrs))
- return dma_nommu_map_sg(dev, sglist, nelems, direction, attrs);
+ return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
device_to_mask(dev), direction, attrs);
}
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index c6f8519f8d4e..bded4127791a 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -32,28 +32,6 @@ EXPORT_SYMBOL(arch_dma_set_mask);

unsigned int ppc_swiotlb_enable;

-/*
- * At the moment, all platforms that use this code only require
- * swiotlb to be used if we're operating on HIGHMEM. Since
- * we don't ever call anything other than map_sg, unmap_sg,
- * map_page, and unmap_page on highmem, use normal dma_ops
- * for everything else.
- */
-const struct dma_map_ops powerpc_swiotlb_dma_ops = {
- .alloc = __dma_nommu_alloc_coherent,
- .free = __dma_nommu_free_coherent,
- .map_sg = swiotlb_map_sg_attrs,
- .unmap_sg = swiotlb_unmap_sg_attrs,
- .dma_supported = swiotlb_dma_supported,
- .map_page = swiotlb_map_page,
- .unmap_page = swiotlb_unmap_page,
- .sync_single_for_cpu = swiotlb_sync_single_for_cpu,
- .sync_single_for_device = swiotlb_sync_single_for_device,
- .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
- .sync_sg_for_device = swiotlb_sync_sg_for_device,
- .mapping_error = swiotlb_dma_mapping_error,
-};
-
static int ppc_swiotlb_bus_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -65,7 +43,7 @@ static int ppc_swiotlb_bus_notify(struct notifier_block *nb,

/* May need to bounce if the device can't address all of DRAM */
if ((dma_get_mask(dev) + 1) < memblock_end_of_DRAM())
- set_dma_ops(dev, &powerpc_swiotlb_dma_ops);
+ set_dma_ops(dev, &swiotlb_dma_ops);

return NOTIFY_DONE;
}
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
deleted file mode 100644
index 92cc402d249f..000000000000
--- a/arch/powerpc/kernel/dma.c
+++ /dev/null
@@ -1,190 +0,0 @@
-/*
- * Copyright (C) 2006 Benjamin Herrenschmidt, IBM Corporation
- *
- * Provide default implementations of the DMA mapping callbacks for
- * directly mapped busses.
- */
-
-#include <linux/device.h>
-#include <linux/dma-direct.h>
-#include <linux/dma-debug.h>
-#include <linux/gfp.h>
-#include <linux/memblock.h>
-#include <linux/export.h>
-#include <linux/pci.h>
-#include <asm/vio.h>
-#include <asm/bug.h>
-#include <asm/machdep.h>
-#include <asm/swiotlb.h>
-#include <asm/iommu.h>
-
-/*
- * Generic direct DMA implementation
- *
- * This implementation supports a per-device offset that can be applied if
- * the address at which memory is visible to devices is not 0. Platform code
- * can set archdata.dma_data to an unsigned long holding the offset. By
- * default the offset is PCI_DRAM_OFFSET.
- */
-
-static u64 __maybe_unused get_pfn_limit(struct device *dev)
-{
- u64 pfn = (dev->coherent_dma_mask >> PAGE_SHIFT) + 1;
-
-#ifdef CONFIG_SWIOTLB
- if (dev->bus_dma_mask && dev->dma_ops == &powerpc_swiotlb_dma_ops)
- pfn = min_t(u64, pfn, dev->bus_dma_mask >> PAGE_SHIFT);
-#endif
-
- return pfn;
-}
-
-int dma_nommu_dma_supported(struct device *dev, u64 mask)
-{
-#ifdef CONFIG_PPC64
- u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1));
-
- /* Limit fits in the mask, we are good */
- if (mask >= limit)
- return 1;
-
-#ifdef CONFIG_FSL_SOC
- /* Freescale gets another chance via ZONE_DMA, however
- * that will have to be refined if/when they support iommus
- */
- return 1;
-#endif
- /* Sorry ... */
- return 0;
-#else
- return 1;
-#endif
-}
-
-#ifndef CONFIG_NOT_COHERENT_CACHE
-void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- void *ret;
- struct page *page;
- int node = dev_to_node(dev);
-#ifdef CONFIG_FSL_SOC
- u64 pfn = get_pfn_limit(dev);
- int zone;
-
- /*
- * This code should be OK on other platforms, but we have drivers that
- * don't set coherent_dma_mask. As a workaround we just ifdef it. This
- * whole routine needs some serious cleanup.
- */
-
- zone = dma_pfn_limit_to_zone(pfn);
- if (zone < 0) {
- dev_err(dev, "%s: No suitable zone for pfn %#llx\n",
- __func__, pfn);
- return NULL;
- }
-
- switch (zone) {
-#ifdef CONFIG_ZONE_DMA
- case ZONE_DMA:
- flag |= GFP_DMA;
- break;
-#endif
- };
-#endif /* CONFIG_FSL_SOC */
-
- page = alloc_pages_node(node, flag, get_order(size));
- if (page == NULL)
- return NULL;
- ret = page_address(page);
- memset(ret, 0, size);
- *dma_handle = phys_to_dma(dev,__pa(ret));
-
- return ret;
-}
-
-void __dma_nommu_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- free_pages((unsigned long)vaddr, get_order(size));
-}
-#endif /* !CONFIG_NOT_COHERENT_CACHE */
-
-int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sgl, sg, nents, i) {
- sg->dma_address = phys_to_dma(dev, sg_phys(sg));
- sg->dma_length = sg->length;
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
- }
-
- return nents;
-}
-
-dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
-{
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_sync_page(page, offset, size, dir);
-
- return phys_to_dma(dev, page_to_phys(page)) + offset;
-}
-
-#ifdef CONFIG_NOT_COHERENT_CACHE
-static inline void dma_nommu_sync_sg(struct device *dev,
- struct scatterlist *sgl, int nents,
- enum dma_data_direction direction)
-{
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sgl, sg, nents, i)
- __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
-}
-
-static inline void dma_nommu_sync_single(struct device *dev,
- dma_addr_t dma_handle, size_t size,
- enum dma_data_direction direction)
-{
- __dma_sync(bus_to_virt(dma_handle), size, direction);
-}
-#endif
-
-const struct dma_map_ops dma_nommu_ops = {
- .alloc = __dma_nommu_alloc_coherent,
- .free = __dma_nommu_free_coherent,
- .map_sg = dma_nommu_map_sg,
- .dma_supported = dma_nommu_dma_supported,
- .map_page = dma_nommu_map_page,
-#ifdef CONFIG_NOT_COHERENT_CACHE
- .sync_single_for_cpu = dma_nommu_sync_single,
- .sync_single_for_device = dma_nommu_sync_single,
- .sync_sg_for_cpu = dma_nommu_sync_sg,
- .sync_sg_for_device = dma_nommu_sync_sg,
-#endif
-};
-EXPORT_SYMBOL(dma_nommu_ops);
-
-static int __init dma_init(void)
-{
-#ifdef CONFIG_IBMVIO
- dma_debug_add_bus(&vio_bus_type);
-#endif
-
- return 0;
-}
-fs_initcall(dma_init);
-
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index fca2b0bafd82..b645b3882815 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -62,7 +62,7 @@ resource_size_t isa_mem_base;
EXPORT_SYMBOL(isa_mem_base);


-static const struct dma_map_ops *pci_dma_ops = &dma_nommu_ops;
+static const struct dma_map_ops *pci_dma_ops = &dma_direct_ops;

void set_pci_dma_ops(const struct dma_map_ops *dma_ops)
{
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 93fa0c99681e..a012e2060f02 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -792,7 +792,7 @@ void arch_setup_pdev_archdata(struct platform_device *pdev)
{
pdev->archdata.dma_mask = DMA_BIT_MASK(32);
pdev->dev.dma_mask = &pdev->archdata.dma_mask;
- set_dma_ops(&pdev->dev, &dma_nommu_ops);
+ set_dma_ops(&pdev->dev, &dma_direct_ops);
}

static __init void print_system_info(void)
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index a016c9c356d5..75971a0619a6 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -152,8 +152,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct ppc_vm_region *head, unsi
* Allocate DMA-coherent memory space and return both the kernel remapped
* virtual and bus address for that space.
*/
-void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
struct page *page;
struct ppc_vm_region *c;
@@ -254,7 +254,7 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
/*
* free a page as defined by the above mapping.
*/
-void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
struct ppc_vm_region *c;
@@ -314,7 +314,7 @@ void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
/*
* make an area consistent.
*/
-void __dma_sync(void *vaddr, size_t size, int direction)
+static void __dma_sync(void *vaddr, size_t size, int direction)
{
unsigned long start = (unsigned long)vaddr;
unsigned long end = start + size;
@@ -340,7 +340,6 @@ void __dma_sync(void *vaddr, size_t size, int direction)
break;
}
}
-EXPORT_SYMBOL(__dma_sync);

#ifdef CONFIG_HIGHMEM
/*
@@ -387,21 +386,33 @@ static inline void __dma_sync_page_highmem(struct page *page,
* __dma_sync_page makes memory consistent. identical to __dma_sync, but
* takes a struct page instead of a virtual address
*/
-void __dma_sync_page(struct page *page, unsigned long offset,
- size_t size, int direction)
+static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir)
{
+ struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+ unsigned offset = paddr & ~PAGE_MASK;
+
#ifdef CONFIG_HIGHMEM
- __dma_sync_page_highmem(page, offset, size, direction);
+ __dma_sync_page_highmem(page, offset, size, dir);
#else
unsigned long start = (unsigned long)page_address(page) + offset;
- __dma_sync((void *)start, size, direction);
+ __dma_sync((void *)start, size, dir);
#endif
}
-EXPORT_SYMBOL(__dma_sync_page);
+
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
+{
+ __dma_sync_page(paddr, size, dir);
+}
+
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir)
+{
+ __dma_sync_page(paddr, size, dir);
+}

/*
- * Return the PFN for a given cpu virtual address returned by
- * __dma_nommu_alloc_coherent.
+ * Return the PFN for a given cpu virtual address returned by arch_dma_alloc.
*/
long arch_dma_coherent_to_pfn(struct device *dev, void *vaddr,
dma_addr_t dma_addr)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8bff7e893bde..b76ff59ee94c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -257,25 +257,6 @@ static int __init mark_nonram_nosave(void)
*/
static unsigned long max_zone_pfns[MAX_NR_ZONES];

-/*
- * Find the least restrictive zone that is entirely below the
- * specified pfn limit. Returns < 0 if no suitable zone is found.
- *
- * pfn_limit must be u64 because it can exceed 32 bits even on 32-bit
- * systems -- the DMA limit can be higher than any possible real pfn.
- */
-int dma_pfn_limit_to_zone(u64 pfn_limit)
-{
- int i;
-
- for (i = TOP_ZONE; i >= 0; i--) {
- if (max_zone_pfns[i] <= pfn_limit)
- return i;
- }
-
- return -EPERM;
-}
-
/*
* paging_init() sets up the page tables - in fact we've already done this.
*/
diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c
index 7e4f8ca19ce8..c0e6fb270d59 100644
--- a/arch/powerpc/platforms/44x/warp.c
+++ b/arch/powerpc/platforms/44x/warp.c
@@ -47,7 +47,7 @@ static int __init warp_probe(void)
if (!of_machine_is_compatible("pika,warp"))
return 0;

- /* For __dma_nommu_alloc_coherent */
+ /* For arch_dma_alloc */
ISA_DMA_THRESHOLD = ~0L;

return 1;
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 0c4fc631cb33..b1bc3b92f54a 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -416,6 +416,8 @@ config NOT_COHERENT_CACHE
bool
depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
select ARCH_HAS_DMA_COHERENT_TO_PFN
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
default n if PPC_47x
default y

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 997ee0f5fc2f..348a815779c1 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -601,7 +601,7 @@ static int cell_of_bus_notify(struct notifier_block *nb, unsigned long action,
if (cell_iommu_enabled)
dev->dma_ops = &dma_iommu_ops;
else
- dev->dma_ops = &dma_nommu_ops;
+ dev->dma_ops = &dma_direct_ops;
cell_dma_dev_setup(dev);
return 0;
}
@@ -727,7 +727,7 @@ static int __init cell_iommu_init_disabled(void)
unsigned long base = 0, size;

/* When no iommu is present, we use direct DMA ops */
- set_pci_dma_ops(&dma_nommu_ops);
+ set_pci_dma_ops(&dma_direct_ops);

/* First make sure all IOC translation is turned off */
cell_disable_iommus();
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index f06c83f321e6..47bda2b7cbd7 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -186,7 +186,7 @@ static void pci_dma_dev_setup_pasemi(struct pci_dev *dev)
*/
if (dev->vendor == 0x1959 && dev->device == 0xa007 &&
!firmware_has_feature(FW_FEATURE_LPAR)) {
- dev->dev.dma_ops = &dma_nommu_ops;
+ dev->dev.dma_ops = &dma_direct_ops;
/*
* Set the coherent DMA mask to prevent the iommu
* being used unnecessarily
diff --git a/arch/powerpc/platforms/pasemi/setup.c b/arch/powerpc/platforms/pasemi/setup.c
index 9a6eb04cca83..afb44e3dd8d2 100644
--- a/arch/powerpc/platforms/pasemi/setup.c
+++ b/arch/powerpc/platforms/pasemi/setup.c
@@ -362,7 +362,7 @@ static int pcmcia_notify(struct notifier_block *nb, unsigned long action,
return 0;

/* We use the direct ops for localbus */
- dev->dma_ops = &dma_nommu_ops;
+ dev->dma_ops = &dma_direct_ops;

return 0;
}
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 3ad74efc83bf..54ad2ae81bc8 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -1704,3 +1704,10 @@ int vio_disable_interrupts(struct vio_dev *dev)
}
EXPORT_SYMBOL(vio_disable_interrupts);
#endif /* CONFIG_PPC_PSERIES */
+
+static int __init vio_init(void)
+{
+ dma_debug_add_bus(&vio_bus_type);
+ return 0;
+}
+fs_initcall(vio_init);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index d6b7c91a9ce2..964a4aede6b1 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -126,7 +126,7 @@ static void setup_swiotlb_ops(struct pci_controller *hose)
{
if (ppc_swiotlb_enable) {
hose->controller_ops.dma_dev_setup = pci_dma_dev_setup_swiotlb;
- set_pci_dma_ops(&powerpc_swiotlb_dma_ops);
+ set_pci_dma_ops(&swiotlb_dma_ops);
}
}
#else
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index f0645ba1d728..f4ca1a4ada66 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -43,7 +43,7 @@ static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
return false;
}

- set_dma_ops(&dev->dev, &dma_nommu_ops);
+ set_dma_ops(&dev->dev, &dma_direct_ops);
dev->dev.archdata.dma_offset = PAGE_OFFSET;

/*
--
2.19.0


2018-10-09 13:28:23

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 26/33] powerpc/fsl_pci: simplify fsl_pci_dma_set_mask

swiotlb will only bounce buffer when the effective DMA address range for
the device is smaller than the actual DMA range. Instead of flipping
between the swiotlb and nommu ops for FSL SoCs that have the second
outbound window, just don't set the bus dma_mask in this case.
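
This relies on the dma_capable() rework elsewhere in this series, where a
zero bus_dma_mask means "no extra bus restriction", so the swiotlb path
never has to bounce for these devices. A minimal standalone model of that
check (plain C, not kernel code; model_dma_capable is a made-up name):

#include <stdbool.h>
#include <stdint.h>

static uint64_t min_not_zero(uint64_t a, uint64_t b)
{
	if (!a || !b)
		return a ? a : b;
	return a < b ? a : b;
}

static bool model_dma_capable(uint64_t dma_mask, uint64_t bus_dma_mask,
			      uint64_t addr, uint64_t size)
{
	/* bus_dma_mask == 0 degenerates to the plain dma_mask test */
	return addr + size - 1 <= min_not_zero(dma_mask, bus_dma_mask);
}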

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/sysdev/fsl_pci.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index f136567a5ed5..296ffabc9386 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -143,7 +143,7 @@ static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
* mapping that allows addressing any RAM address from across PCI.
*/
if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
- set_dma_ops(dev, &dma_nommu_ops);
+ dev->bus_dma_mask = 0;
set_dma_offset(dev, pci64_dma_offset);
}

@@ -403,10 +403,6 @@ static void setup_pci_atmu(struct pci_controller *hose)
out_be32(&pci->piw[win_idx].piwar, piwar);
}

- /*
- * install our own dma_set_mask handler to fixup dma_ops
- * and dma_offset
- */
ppc_md.dma_set_mask = fsl_pci_dma_set_mask;

pr_info("%pOF: Setup 64-bit PCI DMA window\n", hose->dn);
--
2.19.0


2018-10-09 13:28:42

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 27/33] dma-mapping, powerpc: simplify the arch dma_set_mask override

Instead of letting the architecture supply all of dma_set_mask, just
give it an additional hook selected by Kconfig.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/dma-mapping.h | 3 ---
arch/powerpc/kernel/dma-swiotlb.c | 8 ++++++++
arch/powerpc/kernel/dma.c | 12 ------------
arch/powerpc/sysdev/fsl_pci.c | 4 ----
include/linux/dma-mapping.h | 11 ++++++++---
kernel/dma/Kconfig | 3 +++
7 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7097019d8907..7ea01687995d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -126,6 +126,7 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_HAS_DEVMEM_IS_ALLOWED
+ select ARCH_HAS_DMA_SET_MASK if SWIOTLB
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index b1999880fc61..7694985f05ee 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -109,8 +109,5 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
dev->archdata.dma_offset = off;
}

-#define HAVE_ARCH_DMA_SET_MASK 1
-extern int dma_set_mask(struct device *dev, u64 dma_mask);
-
#endif /* __KERNEL__ */
#endif /* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index e05d95ff50ad..dba216dd70fd 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -22,6 +22,14 @@
#include <asm/swiotlb.h>
#include <asm/dma.h>

+bool arch_dma_set_mask(struct device *dev, u64 dma_mask)
+{
+ if (!ppc_md.dma_set_mask)
+ return 0;
+ return ppc_md.dma_set_mask(dev, dma_mask);
+}
+EXPORT_SYMBOL(arch_dma_set_mask);
+
unsigned int ppc_swiotlb_enable;

/*
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index e3d2c15b209c..795afe387c91 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -197,18 +197,6 @@ const struct dma_map_ops dma_nommu_ops = {
};
EXPORT_SYMBOL(dma_nommu_ops);

-int dma_set_mask(struct device *dev, u64 dma_mask)
-{
- if (ppc_md.dma_set_mask)
- return ppc_md.dma_set_mask(dev, dma_mask);
-
- if (!dev->dma_mask || !dma_supported(dev, dma_mask))
- return -EIO;
- *dev->dma_mask = dma_mask;
- return 0;
-}
-EXPORT_SYMBOL(dma_set_mask);
-
static int __init dma_init(void)
{
#ifdef CONFIG_IBMVIO
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 296ffabc9386..cb91a3d113d1 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -135,9 +135,6 @@ static inline void setup_swiotlb_ops(struct pci_controller *hose) {}

static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
{
- if (!dev->dma_mask || !dma_supported(dev, dma_mask))
- return -EIO;
-
/*
* Fix up PCI devices that are able to DMA to the large inbound
* mapping that allows addressing any RAM address from across PCI.
@@ -147,7 +144,6 @@ static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
set_dma_offset(dev, pci64_dma_offset);
}

- *dev->dma_mask = dma_mask;
return 0;
}

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 15bd41447025..8dd19e66c0e5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -598,18 +598,23 @@ static inline int dma_supported(struct device *dev, u64 mask)
return ops->dma_supported(dev, mask);
}

-#ifndef HAVE_ARCH_DMA_SET_MASK
+#ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
+bool arch_dma_set_mask(struct device *dev, u64 mask);
+#else
+#define arch_dma_set_mask(dev, mask) true
+#endif
+
static inline int dma_set_mask(struct device *dev, u64 mask)
{
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
-
+ if (!arch_dma_set_mask(dev, mask))
+ return -EIO;
dma_check_mask(dev, mask);

*dev->dma_mask = mask;
return 0;
}
-#endif

static inline u64 dma_get_mask(struct device *dev)
{
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 645c7a2ecde8..951045c90c2c 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -16,6 +16,9 @@ config ARCH_DMA_ADDR_T_64BIT
config ARCH_HAS_DMA_COHERENCE_H
bool

+config ARCH_HAS_DMA_SET_MASK
+ bool
+
config HAVE_GENERIC_DMA_COHERENT
bool

--
2.19.0


2018-10-09 13:28:44

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 28/33] powerpc/dma: use phys_to_dma instead of get_dma_offset

Use the standard portable helper instead of the powerpc specific one,
which is about to go away.
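
For reference, on powerpc the portable helper applies the same per-device
offset that get_dma_offset() returns; roughly (a sketch of what
arch/powerpc/include/asm/dma-direct.h provides, not a verbatim copy):

static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	return paddr + get_dma_offset(dev);
}

static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
{
	return daddr - get_dma_offset(dev);
}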

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/kernel/dma-swiotlb.c | 2 +-
arch/powerpc/kernel/dma.c | 10 +++++-----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index dba216dd70fd..d33caff8c684 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -11,7 +11,7 @@
*
*/

-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
#include <linux/memblock.h>
#include <linux/pfn.h>
#include <linux/of_platform.h>
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 795afe387c91..7f7f3a069b63 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -6,7 +6,7 @@
*/

#include <linux/device.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
#include <linux/dma-debug.h>
#include <linux/gfp.h>
#include <linux/memblock.h>
@@ -42,7 +42,7 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev)
int dma_nommu_dma_supported(struct device *dev, u64 mask)
{
#ifdef CONFIG_PPC64
- u64 limit = get_dma_offset(dev) + (memblock_end_of_DRAM() - 1);
+ u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1));

/* Limit fits in the mask, we are good */
if (mask >= limit)
@@ -100,7 +100,7 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
return NULL;
ret = page_address(page);
memset(ret, 0, size);
- *dma_handle = __pa(ret) + get_dma_offset(dev);
+ *dma_handle = phys_to_dma(dev,__pa(ret));

return ret;
}
@@ -139,7 +139,7 @@ int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
int i;

for_each_sg(sgl, sg, nents, i) {
- sg->dma_address = sg_phys(sg) + get_dma_offset(dev);
+ sg->dma_address = phys_to_dma(dev, sg_phys(sg));
sg->dma_length = sg->length;

if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
@@ -158,7 +158,7 @@ dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync_page(page, offset, size, dir);

- return page_to_phys(page) + offset + get_dma_offset(dev);
+ return phys_to_dma(dev, page_to_phys(page)) + offset;
}

#ifdef CONFIG_NOT_COHERENT_CACHE
--
2.19.0


2018-10-09 13:28:54

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 31/33] powerpc/dma: remove dma_nommu_mmap_coherent

The coherent cache version of this function is already functionally
identical to the default version, and by defining the
arch_dma_coherent_to_pfn hook the same is true for the noncoherent
version as well.
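
With the hook in place, the generic mmap path can do the equivalent of the
helper removed below. A sketch of that equivalent for the noncoherent case
(sketch_mmap_coherent is a made-up name, not the literal kernel/dma code):

static int sketch_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
		void *cpu_addr, dma_addr_t handle, size_t size)
{
	unsigned long pfn = arch_dma_coherent_to_pfn(dev, cpu_addr, handle);

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
			       vma->vm_end - vma->vm_start, vma->vm_page_prot);
}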

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 4 ----
arch/powerpc/kernel/dma-iommu.c | 1 -
arch/powerpc/kernel/dma-swiotlb.c | 1 -
arch/powerpc/kernel/dma.c | 19 -------------------
arch/powerpc/mm/dma-noncoherent.c | 7 +++++--
arch/powerpc/platforms/Kconfig.cputype | 1 +
arch/powerpc/platforms/pseries/vio.c | 1 -
7 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index e12439ae8211..ababad4b07a7 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -25,10 +25,6 @@ extern void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
extern void __dma_nommu_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs);
-extern int dma_nommu_mmap_coherent(struct device *dev,
- struct vm_area_struct *vma,
- void *cpu_addr, dma_addr_t handle,
- size_t size, unsigned long attrs);
int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction direction,
unsigned long attrs);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 7fa3636636fa..2e682004959f 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -172,7 +172,6 @@ int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
const struct dma_map_ops dma_iommu_ops = {
.alloc = dma_iommu_alloc_coherent,
.free = dma_iommu_free_coherent,
- .mmap = dma_nommu_mmap_coherent,
.map_sg = dma_iommu_map_sg,
.unmap_sg = dma_iommu_unmap_sg,
.dma_supported = dma_iommu_dma_supported,
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index d33caff8c684..c6f8519f8d4e 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -42,7 +42,6 @@ unsigned int ppc_swiotlb_enable;
const struct dma_map_ops powerpc_swiotlb_dma_ops = {
.alloc = __dma_nommu_alloc_coherent,
.free = __dma_nommu_free_coherent,
- .mmap = dma_nommu_mmap_coherent,
.map_sg = swiotlb_map_sg_attrs,
.unmap_sg = swiotlb_unmap_sg_attrs,
.dma_supported = swiotlb_dma_supported,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 7f7f3a069b63..92cc402d249f 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -113,24 +113,6 @@ void __dma_nommu_free_coherent(struct device *dev, size_t size,
}
#endif /* !CONFIG_NOT_COHERENT_CACHE */

-int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
- void *cpu_addr, dma_addr_t handle, size_t size,
- unsigned long attrs)
-{
- unsigned long pfn;
-
-#ifdef CONFIG_NOT_COHERENT_CACHE
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- pfn = __dma_get_coherent_pfn((unsigned long)cpu_addr);
-#else
- pfn = page_to_pfn(virt_to_page(cpu_addr));
-#endif
- return remap_pfn_range(vma, vma->vm_start,
- pfn + vma->vm_pgoff,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-}
-
int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction direction,
unsigned long attrs)
@@ -184,7 +166,6 @@ static inline void dma_nommu_sync_single(struct device *dev,
const struct dma_map_ops dma_nommu_ops = {
.alloc = __dma_nommu_alloc_coherent,
.free = __dma_nommu_free_coherent,
- .mmap = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
.dma_supported = dma_nommu_dma_supported,
.map_page = dma_nommu_map_page,
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 965ce3d19f5a..a016c9c356d5 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -30,6 +30,7 @@
#include <linux/types.h>
#include <linux/highmem.h>
#include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
#include <linux/export.h>

#include <asm/tlbflush.h>
@@ -400,14 +401,16 @@ EXPORT_SYMBOL(__dma_sync_page);

/*
* Return the PFN for a given cpu virtual address returned by
- * __dma_nommu_alloc_coherent. This is used by dma_mmap_coherent()
+ * __dma_nommu_alloc_coherent.
*/
-unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr)
+long arch_dma_coherent_to_pfn(struct device *dev, void *vaddr,
+ dma_addr_t dma_addr)
{
/* This should always be populated, so we don't test every
* level. If that fails, we'll have a nice crash which
* will be as good as a BUG_ON()
*/
+ unsigned long cpu_addr = (unsigned long)vaddr;
pgd_t *pgd = pgd_offset_k(cpu_addr);
pud_t *pud = pud_offset(pgd, cpu_addr);
pmd_t *pmd = pmd_offset(pud, cpu_addr);
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 6c6a7c72cae4..0c4fc631cb33 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -415,6 +415,7 @@ config NR_CPUS
config NOT_COHERENT_CACHE
bool
depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
+ select ARCH_HAS_DMA_COHERENT_TO_PFN
default n if PPC_47x
default y

diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 1dfff53ebd7f..3ad74efc83bf 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -603,7 +603,6 @@ static void vio_dma_iommu_unmap_sg(struct device *dev,
static const struct dma_map_ops vio_dma_mapping_ops = {
.alloc = vio_dma_iommu_alloc_coherent,
.free = vio_dma_iommu_free_coherent,
- .mmap = dma_nommu_mmap_coherent,
.map_sg = vio_dma_iommu_map_sg,
.unmap_sg = vio_dma_iommu_unmap_sg,
.map_page = vio_dma_iommu_map_page,
--
2.19.0


2018-10-09 13:28:57

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 25/33] cxl: drop the dma_set_mask callback from vphb

The CXL code never even looks at the dma mask, so there is no good
reason for this sanity check. Remove it because it gets in the way
of the dma ops refactoring.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/misc/cxl/vphb.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 7908633d9204..49da2f744bbf 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -11,17 +11,6 @@
#include <misc/cxl.h>
#include "cxl.h"

-static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
-{
- if (dma_mask < DMA_BIT_MASK(64)) {
- pr_info("%s only 64bit DMA supported on CXL", __func__);
- return -EIO;
- }
-
- *(pdev->dev.dma_mask) = dma_mask;
- return 0;
-}
-
static int cxl_pci_probe_mode(struct pci_bus *bus)
{
return PCI_PROBE_NORMAL;
@@ -220,7 +209,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
.reset_secondary_bus = cxl_pci_reset_secondary_bus,
.setup_msi_irqs = cxl_setup_msi_irqs,
.teardown_msi_irqs = cxl_teardown_msi_irqs,
- .dma_set_mask = cxl_dma_set_mask,
};

int cxl_pci_vphb_add(struct cxl_afu *afu)
--
2.19.0


2018-10-09 13:29:03

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 24/33] powerpc/dma: fix an off-by-one in dma_capable

We need to compare the last byte in the dma range and not the one after it
for the bus_dma_mask, just like we do for the regular dma_mask. Fix this
cleanly by merging the two comparisons into one.
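
A concrete example of the off-by-one, using made-up numbers in a standalone
model (not kernel code): a transfer whose last byte is exactly at the
32-bit bus limit was rejected by the old check but passes the new one.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t bus_dma_mask = 0xffffffffULL;	/* 32-bit bus window */
	uint64_t addr = 0xfffff000ULL;
	uint64_t size = 0x1000;			/* last byte is 0xffffffff */

	/* old check: addr + size > bus_dma_mask -> wrongly rejected */
	assert(addr + size > bus_dma_mask);

	/* new check: compare the last byte against the mask -> accepted */
	assert(addr + size - 1 <= bus_dma_mask);
	return 0;
}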

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/dma-direct.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index e00ab5d0612d..92d8aed86422 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -4,15 +4,11 @@

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
{
-#ifdef CONFIG_SWIOTLB
- if (dev->bus_dma_mask && addr + size > dev->bus_dma_mask)
- return false;
-#endif
-
if (!dev->dma_mask)
return false;

- return addr + size - 1 <= *dev->dma_mask;
+ return addr + size - 1 <=
+ min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
}

static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
--
2.19.0


2018-10-09 13:29:05

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 23/33] powerpc/dma: remove max_direct_dma_addr

The max_direct_dma_addr duplicates the bus_dma_mask field in struct
device. Use the generic field instead.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/device.h | 3 ---
arch/powerpc/include/asm/dma-direct.h | 4 +---
arch/powerpc/kernel/dma-swiotlb.c | 20 --------------------
arch/powerpc/kernel/dma.c | 5 ++---
arch/powerpc/sysdev/fsl_pci.c | 3 +--
5 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 3814e1c2d4bc..a130be13ee83 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -38,9 +38,6 @@ struct dev_archdata {
#ifdef CONFIG_IOMMU_API
void *iommu_domain;
#endif
-#ifdef CONFIG_SWIOTLB
- dma_addr_t max_direct_dma_addr;
-#endif
#ifdef CONFIG_PPC64
struct pci_dn *pci_data;
#endif
diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index 7702875aabb7..e00ab5d0612d 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -5,9 +5,7 @@
static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
{
#ifdef CONFIG_SWIOTLB
- struct dev_archdata *sd = &dev->archdata;
-
- if (sd->max_direct_dma_addr && addr + size > sd->max_direct_dma_addr)
+ if (dev->bus_dma_mask && addr + size > dev->bus_dma_mask)
return false;
#endif

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 60cb86338233..e05d95ff50ad 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -24,21 +24,6 @@

unsigned int ppc_swiotlb_enable;

-static u64 swiotlb_powerpc_get_required(struct device *dev)
-{
- u64 end, mask, max_direct_dma_addr = dev->archdata.max_direct_dma_addr;
-
- end = memblock_end_of_DRAM();
- if (max_direct_dma_addr && end > max_direct_dma_addr)
- end = max_direct_dma_addr;
- end += get_dma_offset(dev);
-
- mask = 1ULL << (fls64(end) - 1);
- mask += mask - 1;
-
- return mask;
-}
-
/*
* At the moment, all platforms that use this code only require
* swiotlb to be used if we're operating on HIGHMEM. Since
@@ -60,22 +45,17 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
.sync_sg_for_device = swiotlb_sync_sg_for_device,
.mapping_error = swiotlb_dma_mapping_error,
- .get_required_mask = swiotlb_powerpc_get_required,
};

static int ppc_swiotlb_bus_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
struct device *dev = data;
- struct dev_archdata *sd;

/* We are only intereted in device addition */
if (action != BUS_NOTIFY_ADD_DEVICE)
return 0;

- sd = &dev->archdata;
- sd->max_direct_dma_addr = 0;
-
/* May need to bounce if the device can't address all of DRAM */
if ((dma_get_mask(dev) + 1) < memblock_end_of_DRAM())
set_dma_ops(dev, &powerpc_swiotlb_dma_ops);
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index bef91b8ad064..e3d2c15b209c 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -30,11 +30,10 @@
static u64 __maybe_unused get_pfn_limit(struct device *dev)
{
u64 pfn = (dev->coherent_dma_mask >> PAGE_SHIFT) + 1;
- struct dev_archdata __maybe_unused *sd = &dev->archdata;

#ifdef CONFIG_SWIOTLB
- if (sd->max_direct_dma_addr && dev->dma_ops == &powerpc_swiotlb_dma_ops)
- pfn = min_t(u64, pfn, sd->max_direct_dma_addr >> PAGE_SHIFT);
+ if (dev->bus_dma_mask && dev->dma_ops == &powerpc_swiotlb_dma_ops)
+ pfn = min_t(u64, pfn, dev->bus_dma_mask >> PAGE_SHIFT);
#endif

return pfn;
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 561f97d698cc..f136567a5ed5 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -117,9 +117,8 @@ static u64 pci64_dma_offset;
static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
{
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
- struct dev_archdata *sd = &pdev->dev.archdata;

- sd->max_direct_dma_addr =
+ pdev->dev.bus_dma_mask =
hose->dma_window_base_cur + hose->dma_window_size;
}

--
2.19.0


2018-10-09 13:29:24

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 22/33] powerpc/dma: move pci_dma_dev_setup_swiotlb to fsl_pci.c

pci_dma_dev_setup_swiotlb is only used by the fsl_pci code, and closely
related to it, so fsl_pci.c seems like a better place for it.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/swiotlb.h | 2 --
arch/powerpc/kernel/dma-swiotlb.c | 11 -----------
arch/powerpc/sysdev/fsl_pci.c | 9 +++++++++
3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/swiotlb.h b/arch/powerpc/include/asm/swiotlb.h
index f65ecf57b66c..26a0f12b835b 100644
--- a/arch/powerpc/include/asm/swiotlb.h
+++ b/arch/powerpc/include/asm/swiotlb.h
@@ -18,8 +18,6 @@ extern const struct dma_map_ops powerpc_swiotlb_dma_ops;
extern unsigned int ppc_swiotlb_enable;
int __init swiotlb_setup_bus_notifier(void);

-extern void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev);
-
#ifdef CONFIG_SWIOTLB
void swiotlb_detect_4g(void);
#else
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 93a4622563c6..60cb86338233 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -63,17 +63,6 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
.get_required_mask = swiotlb_powerpc_get_required,
};

-void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
-{
- struct pci_controller *hose;
- struct dev_archdata *sd;
-
- hose = pci_bus_to_host(pdev->bus);
- sd = &pdev->dev.archdata;
- sd->max_direct_dma_addr =
- hose->dma_window_base_cur + hose->dma_window_size;
-}
-
static int ppc_swiotlb_bus_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 918be816b097..561f97d698cc 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -114,6 +114,15 @@ static struct pci_ops fsl_indirect_pcie_ops =
static u64 pci64_dma_offset;

#ifdef CONFIG_SWIOTLB
+static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
+{
+ struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+ struct dev_archdata *sd = &pdev->dev.archdata;
+
+ sd->max_direct_dma_addr =
+ hose->dma_window_base_cur + hose->dma_window_size;
+}
+
static void setup_swiotlb_ops(struct pci_controller *hose)
{
if (ppc_swiotlb_enable) {
--
2.19.0


2018-10-09 13:29:43

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 14/33] powerpc/dart: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/sysdev/dart_iommu.c | 45 +++++++++++---------------------
1 file changed, 15 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index ce5dd2048f57..e7d1645a2d2e 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -360,13 +360,6 @@ static void iommu_table_dart_setup(void)
set_bit(iommu_table_dart.it_size - 1, iommu_table_dart.it_map);
}

-static void pci_dma_dev_setup_dart(struct pci_dev *dev)
-{
- if (dart_is_u4)
- set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
- set_iommu_table_base(&dev->dev, &iommu_table_dart);
-}
-
static void pci_dma_bus_setup_dart(struct pci_bus *bus)
{
if (!iommu_table_dart_inited) {
@@ -390,27 +383,16 @@ static bool dart_device_on_pcie(struct device *dev)
return false;
}

-static int dart_dma_set_mask(struct device *dev, u64 dma_mask)
+static void pci_dma_dev_setup_dart(struct pci_dev *dev)
{
- if (!dev->dma_mask || !dma_supported(dev, dma_mask))
- return -EIO;
-
- /* U4 supports a DART bypass, we use it for 64-bit capable
- * devices to improve performances. However, that only works
- * for devices connected to U4 own PCIe interface, not bridged
- * through hypertransport. We need the device to support at
- * least 40 bits of addresses.
- */
- if (dart_device_on_pcie(dev) && dma_mask >= DMA_BIT_MASK(40)) {
- dev_info(dev, "Using 64-bit DMA iommu bypass\n");
- set_dma_ops(dev, &dma_nommu_ops);
- } else {
- dev_info(dev, "Using 32-bit DMA via iommu\n");
- set_dma_ops(dev, &dma_iommu_ops);
- }
+ if (dart_is_u4 && dart_device_on_pcie(&dev->dev))
+ set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
+ set_iommu_table_base(&dev->dev, &iommu_table_dart);
+}

- *dev->dma_mask = dma_mask;
- return 0;
+static bool iommu_bypass_supported_dart(struct pci_dev *dev, u64 mask)
+{
+ return dart_is_u4 && dart_device_on_pcie(&dev->dev);
}

void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
@@ -430,12 +412,15 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
if (dart_init(dn) != 0)
return;

- /* Setup bypass if supported */
- if (dart_is_u4)
- ppc_md.dma_set_mask = dart_dma_set_mask;
-
+ /*
+ * U4 supports a DART bypass, we use it for 64-bit capable devices to
+ * improve performance. However, that only works for devices connected
+ * to the U4 own PCIe interface, not bridged through hypertransport.
+ * We need the device to support at least 40 bits of addresses.
+ */
controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
+ controller_ops->iommu_bypass_supported = iommu_bypass_supported_dart;

/* Setup pci_dma ops */
set_pci_dma_ops(&dma_iommu_ops);
--
2.19.0


2018-10-09 13:29:46

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 18/33] powerpc/dma: stop overriding dma_get_required_mask

The ppc_md and pci_controller_ops methods are unused now and can be
removed. The dma_nommu implementation is identical to the generic one,
except that the generic code uses max_pfn instead of calling into the
memblock API, and all other dma_map_ops instances implement a method of
their own.
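
Both variants compute the same thing: the smallest all-ones mask covering
the end of memory. A standalone model with a made-up memory size (not
kernel code; model_fls64 stands in for fls64()):

#include <assert.h>
#include <stdint.h>

static int model_fls64(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

int main(void)
{
	uint64_t end = 0x200000000ULL;		/* e.g. 8 GiB of RAM */
	uint64_t mask = 1ULL << (model_fls64(end) - 1);

	mask += mask - 1;
	assert(mask == 0x3ffffffffULL);		/* 34-bit DMA mask required */
	return 0;
}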

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/device.h | 2 --
arch/powerpc/include/asm/dma-mapping.h | 2 --
arch/powerpc/include/asm/machdep.h | 2 --
arch/powerpc/include/asm/pci-bridge.h | 1 -
arch/powerpc/kernel/dma.c | 42 --------------------------
drivers/base/platform.c | 2 --
6 files changed, 51 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 1aa53318b4bc..3814e1c2d4bc 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -59,6 +59,4 @@ struct pdev_archdata {
u64 dma_mask;
};

-#define ARCH_HAS_DMA_GET_REQUIRED_MASK
-
#endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 59c090b7eaac..b1999880fc61 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,5 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
#define HAVE_ARCH_DMA_SET_MASK 1
extern int dma_set_mask(struct device *dev, u64 dma_mask);

-extern u64 __dma_get_required_mask(struct device *dev);
-
#endif /* __KERNEL__ */
#endif /* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index a47de82fb8e2..99f06102474e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -47,9 +47,7 @@ struct machdep_calls {
#endif
#endif /* CONFIG_PPC64 */

- /* Platform set_dma_mask and dma_get_required_mask overrides */
int (*dma_set_mask)(struct device *dev, u64 dma_mask);
- u64 (*dma_get_required_mask)(struct device *dev);

int (*probe)(void);
void (*setup_arch)(void); /* Optional, may be NULL */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 5c7a1e7ffc8a..aace7033fa02 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -46,7 +46,6 @@ struct pci_controller_ops {
#endif

int (*dma_set_mask)(struct pci_dev *pdev, u64 dma_mask);
- u64 (*dma_get_required_mask)(struct pci_dev *pdev);

void (*shutdown)(struct pci_controller *hose);
};
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 716b7eab3ee7..9ca9d2cec4ed 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -197,18 +197,6 @@ int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
return nents;
}

-static u64 dma_nommu_get_required_mask(struct device *dev)
-{
- u64 end, mask;
-
- end = memblock_end_of_DRAM() + get_dma_offset(dev);
-
- mask = 1ULL << (fls64(end) - 1);
- mask += mask - 1;
-
- return mask;
-}
-
dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction dir, unsigned long attrs)
@@ -246,7 +234,6 @@ const struct dma_map_ops dma_nommu_ops = {
.map_sg = dma_nommu_map_sg,
.dma_supported = dma_nommu_dma_supported,
.map_page = dma_nommu_map_page,
- .get_required_mask = dma_nommu_get_required_mask,
#ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu = dma_nommu_sync_single,
.sync_single_for_device = dma_nommu_sync_single,
@@ -294,35 +281,6 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
}
EXPORT_SYMBOL(dma_set_mask);

-u64 __dma_get_required_mask(struct device *dev)
-{
- const struct dma_map_ops *dma_ops = get_dma_ops(dev);
-
- if (unlikely(dma_ops == NULL))
- return 0;
-
- if (dma_ops->get_required_mask)
- return dma_ops->get_required_mask(dev);
-
- return DMA_BIT_MASK(8 * sizeof(dma_addr_t));
-}
-
-u64 dma_get_required_mask(struct device *dev)
-{
- if (ppc_md.dma_get_required_mask)
- return ppc_md.dma_get_required_mask(dev);
-
- if (dev_is_pci(dev)) {
- struct pci_dev *pdev = to_pci_dev(dev);
- struct pci_controller *phb = pci_bus_to_host(pdev->bus);
- if (phb->controller_ops.dma_get_required_mask)
- return phb->controller_ops.dma_get_required_mask(pdev);
- }
-
- return __dma_get_required_mask(dev);
-}
-EXPORT_SYMBOL_GPL(dma_get_required_mask);
-
static int __init dma_init(void)
{
#ifdef CONFIG_IBMVIO
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 23cf4427f425..057357521561 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1179,7 +1179,6 @@ int __init platform_bus_init(void)
return error;
}

-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
static u64 dma_default_get_required_mask(struct device *dev)
{
u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
@@ -1208,7 +1207,6 @@ u64 dma_get_required_mask(struct device *dev)
return dma_default_get_required_mask(dev);
}
EXPORT_SYMBOL_GPL(dma_get_required_mask);
-#endif

static __initdata LIST_HEAD(early_platform_driver_list);
static __initdata LIST_HEAD(early_platform_device_list);
--
2.19.0


2018-10-09 13:29:47

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 13/33] powerpc/dart: remove dead cleanup code in iommu_init_early_dart

If dart_init failed we didn't have a chance to set up the dma or controller
ops yet, so there is no point in resetting them.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/sysdev/dart_iommu.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 5ca3e22d0512..ce5dd2048f57 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -428,7 +428,7 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)

/* Initialize the DART HW */
if (dart_init(dn) != 0)
- goto bail;
+ return;

/* Setup bypass if supported */
if (dart_is_u4)
@@ -439,15 +439,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)

/* Setup pci_dma ops */
set_pci_dma_ops(&dma_iommu_ops);
- return;
-
- bail:
- /* If init failed, use direct iommu and null setup functions */
- controller_ops->dma_dev_setup = NULL;
- controller_ops->dma_bus_setup = NULL;
-
- /* Setup pci_dma ops */
- set_pci_dma_ops(&dma_nommu_ops);
}

#ifdef CONFIG_PM
--
2.19.0


2018-10-09 13:30:12

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 19/33] powerpc/pci: remove the dma_set_mask pci_controller ops methods

Unused now.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/pci-bridge.h | 2 --
arch/powerpc/kernel/dma.c | 7 -------
2 files changed, 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index aace7033fa02..a50703af7db3 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -45,8 +45,6 @@ struct pci_controller_ops {
void (*teardown_msi_irqs)(struct pci_dev *pdev);
#endif

- int (*dma_set_mask)(struct pci_dev *pdev, u64 dma_mask);
-
void (*shutdown)(struct pci_controller *hose);
};

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 9ca9d2cec4ed..78ef4076e15c 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -267,13 +267,6 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
if (ppc_md.dma_set_mask)
return ppc_md.dma_set_mask(dev, dma_mask);

- if (dev_is_pci(dev)) {
- struct pci_dev *pdev = to_pci_dev(dev);
- struct pci_controller *phb = pci_bus_to_host(pdev->bus);
- if (phb->controller_ops.dma_set_mask)
- return phb->controller_ops.dma_set_mask(pdev, dma_mask);
- }
-
if (!dev->dma_mask || !dma_supported(dev, dma_mask))
return -EIO;
*dev->dma_mask = dma_mask;
--
2.19.0


2018-10-09 13:30:23

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 08/33] powerpc/dma: handle iommu bypass in dma_iommu_ops

Add a new iommu_bypass flag to struct dev_archdata so that the dma_iommu
implementation can handle the direct mapping transparently instead of
switching ops around. Setting of this flag is controlled by a new
pci_controller_ops method.
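
The resulting dispatch is deliberately small: each dma_iommu_ops entry point
checks the per-device flag first and only falls through to the dynamic iommu
mapping when bypass is off. A minimal sketch, condensed from the dma-iommu.c
hunks below rather than the verbatim code:

	/* sketch: condensed from the dma-iommu.c changes in this patch */
	static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
			unsigned long offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		if (dev->archdata.iommu_bypass)	/* direct window requested */
			return dma_nommu_map_page(dev, page, offset, size,
						  dir, attrs);
		return iommu_map_page(dev, get_iommu_table_base(dev), page,
				      offset, size, device_to_mask(dev),
				      dir, attrs);
	}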

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/device.h | 5 ++
arch/powerpc/include/asm/dma-mapping.h | 7 +++
arch/powerpc/include/asm/pci-bridge.h | 2 +
arch/powerpc/kernel/dma-iommu.c | 70 +++++++++++++++++++++++---
arch/powerpc/kernel/dma.c | 17 +++----
5 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 0245bfcaac32..1aa53318b4bc 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -19,6 +19,11 @@ struct iommu_table;
* drivers/macintosh/macio_asic.c
*/
struct dev_archdata {
+ /*
+ * Set to %true if the dma_iommu_ops are requested to use a direct
+ * window instead of dynamically mapping memory.
+ */
+ bool iommu_bypass : 1;
/*
* These two used to be a union. However, with the hybrid ops we need
* both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index dacd0f93f2b2..27283eb68c50 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -29,6 +29,13 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
size_t size, unsigned long attrs);
+int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction direction,
+ unsigned long attrs);
+dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+int dma_nommu_dma_supported(struct device *dev, u64 mask);

#ifdef CONFIG_NOT_COHERENT_CACHE
/*
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 94d449031b18..5c7a1e7ffc8a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -19,6 +19,8 @@ struct device_node;
struct pci_controller_ops {
void (*dma_dev_setup)(struct pci_dev *pdev);
void (*dma_bus_setup)(struct pci_bus *bus);
+ bool (*iommu_bypass_supported)(struct pci_dev *pdev,
+ u64 mask);

int (*probe_mode)(struct pci_bus *bus);

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 0613278abf9f..c865e15ad024 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -6,12 +6,30 @@
* busses using the iommu infrastructure
*/

+#include <linux/dma-direct.h>
+#include <linux/pci.h>
#include <asm/iommu.h>

/*
* Generic iommu implementation
*/

+/*
+ * The coherent mask may be smaller than the real mask, check if we can
+ * really use a direct window.
+ */
+static inline bool dma_iommu_alloc_bypass(struct device *dev)
+{
+ return dev->archdata.iommu_bypass &&
+ dma_nommu_dma_supported(dev, dev->coherent_dma_mask);
+}
+
+static inline bool dma_iommu_map_bypass(struct device *dev,
+ unsigned long attrs)
+{
+ return dev->archdata.iommu_bypass;
+}
+
/* Allocates a contiguous real buffer and creates mappings over it.
* Returns the virtual address of the buffer and sets dma_handle
* to the dma address (mapping) of the first page.
@@ -20,6 +38,9 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag,
unsigned long attrs)
{
+ if (dma_iommu_alloc_bypass(dev))
+ return __dma_nommu_alloc_coherent(dev, size, dma_handle, flag,
+ attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -29,7 +50,11 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
{
- iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
+ if (dma_iommu_alloc_bypass(dev))
+ __dma_nommu_free_coherent(dev, size, vaddr, dma_handle, attrs);
+ else
+ iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
+ dma_handle);
}

/* Creates TCEs for a user provided buffer. The user buffer must be
@@ -42,6 +67,9 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction direction,
unsigned long attrs)
{
+ if (dma_iommu_map_bypass(dev, attrs))
+ return dma_nommu_map_page(dev, page, offset, size, direction,
+ attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
size, device_to_mask(dev), direction, attrs);
}
@@ -51,8 +79,9 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction direction,
unsigned long attrs)
{
- iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
- attrs);
+ if (!dma_iommu_map_bypass(dev, attrs))
+ iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
+ direction, attrs);
}


@@ -60,6 +89,8 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction,
unsigned long attrs)
{
+ if (dma_iommu_map_bypass(dev, attrs))
+ return dma_nommu_map_sg(dev, sglist, nelems, direction, attrs);
return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
device_to_mask(dev), direction, attrs);
}
@@ -68,10 +99,20 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction,
unsigned long attrs)
{
- ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
+ if (!dma_iommu_map_bypass(dev, attrs))
+ ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
direction, attrs);
}

+static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct pci_controller *phb = pci_bus_to_host(pdev->bus);
+
+ return phb->controller_ops.iommu_bypass_supported &&
+ phb->controller_ops.iommu_bypass_supported(pdev, mask);
+}
+
/* We support DMA to/from any memory page via the iommu */
int dma_iommu_dma_supported(struct device *dev, u64 mask)
{
@@ -83,22 +124,39 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
return 0;
}

+ if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
+ dev->archdata.iommu_bypass = true;
+ dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+ return 1;
+ }
+
if (tbl->it_offset > (mask >> tbl->it_page_shift)) {
dev_info(dev, "Warning: IOMMU offset too big for device mask\n");
dev_info(dev, "mask: 0x%08llx, table offset: 0x%08lx\n",
mask, tbl->it_offset << tbl->it_page_shift);
return 0;
- } else
- return 1;
+ }
+
+ dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
+ dev->archdata.iommu_bypass = false;
+ return 1;
}

u64 dma_iommu_get_required_mask(struct device *dev)
{
struct iommu_table *tbl = get_iommu_table_base(dev);
u64 mask;
+
if (!tbl)
return 0;

+ if (dev_is_pci(dev)) {
+ u64 bypass_mask = dma_direct_get_required_mask(dev);
+
+ if (dma_iommu_bypass_supported(dev, bypass_mask))
+ return bypass_mask;
+ }
+
mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) - 1);
mask += mask - 1;

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 7078d72baec2..716b7eab3ee7 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -40,7 +40,7 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev)
return pfn;
}

-static int dma_nommu_dma_supported(struct device *dev, u64 mask)
+int dma_nommu_dma_supported(struct device *dev, u64 mask)
{
#ifdef CONFIG_PPC64
u64 limit = get_dma_offset(dev) + (memblock_end_of_DRAM() - 1);
@@ -177,9 +177,9 @@ int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
vma->vm_page_prot);
}

-static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction direction,
- unsigned long attrs)
+int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction direction,
+ unsigned long attrs)
{
struct scatterlist *sg;
int i;
@@ -209,12 +209,9 @@ static u64 dma_nommu_get_required_mask(struct device *dev)
return mask;
}

-static inline dma_addr_t dma_nommu_map_page(struct device *dev,
- struct page *page,
- unsigned long offset,
- size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t dma_nommu_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync_page(page, offset, size, dir);
--
2.19.0


2018-10-09 13:30:32

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 16/33] powerpc/powernv: remove dead npu-dma code

This code has been unused since it was merged and is in the way of
cleaning up the DMA code, thus remove it.

This effectively reverts commit 5d2aa710 ("powerpc/powernv: Add support
for Nvlink NPUs").

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/include/asm/pci.h | 3 -
arch/powerpc/include/asm/powernv.h | 23 -
arch/powerpc/platforms/powernv/Makefile | 2 +-
arch/powerpc/platforms/powernv/npu-dma.c | 999 ----------------------
arch/powerpc/platforms/powernv/pci-ioda.c | 243 ------
arch/powerpc/platforms/powernv/pci.h | 11 -
6 files changed, 1 insertion(+), 1280 deletions(-)
delete mode 100644 arch/powerpc/platforms/powernv/npu-dma.c

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 2af9ded80540..a01d2e3d6ff9 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -127,7 +127,4 @@ extern void pcibios_scan_phb(struct pci_controller *hose);

#endif /* __KERNEL__ */

-extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
-extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
-
#endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
index 2f3ff7a27881..4848a6b3c6b2 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -11,33 +11,10 @@
#define _ASM_POWERNV_H

#ifdef CONFIG_PPC_POWERNV
-#define NPU2_WRITE 1
extern void powernv_set_nmmu_ptcr(unsigned long ptcr);
-extern struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
- unsigned long flags,
- void (*cb)(struct npu_context *, void *),
- void *priv);
-extern void pnv_npu2_destroy_context(struct npu_context *context,
- struct pci_dev *gpdev);
-extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
- unsigned long *flags, unsigned long *status,
- int count);
-
void pnv_tm_init(void);
#else
static inline void powernv_set_nmmu_ptcr(unsigned long ptcr) { }
-static inline struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
- unsigned long flags,
- struct npu_context *(*cb)(struct npu_context *, void *),
- void *priv) { return ERR_PTR(-ENODEV); }
-static inline void pnv_npu2_destroy_context(struct npu_context *context,
- struct pci_dev *gpdev) { }
-
-static inline int pnv_npu2_handle_fault(struct npu_context *context,
- uintptr_t *ea, unsigned long *flags,
- unsigned long *status, int count) {
- return -ENODEV;
-}

static inline void pnv_tm_init(void) { }
static inline void pnv_power9_force_smt4(void) { }
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b540ce8eec55..2b13e9dd137c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,7 +6,7 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o

obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
+obj-$(CONFIG_PCI) += pci.o pci-ioda.o pci-ioda-tce.o
obj-$(CONFIG_CXL_BASE) += pci-cxl.o
obj-$(CONFIG_EEH) += eeh-powernv.o
obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
deleted file mode 100644
index 8006c54a91e3..000000000000
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ /dev/null
@@ -1,999 +0,0 @@
-/*
- * This file implements the DMA operations for NVLink devices. The NPU
- * devices all point to the same iommu table as the parent PCI device.
- *
- * Copyright Alistair Popple, IBM Corporation 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-
-#include <linux/slab.h>
-#include <linux/mmu_notifier.h>
-#include <linux/mmu_context.h>
-#include <linux/of.h>
-#include <linux/export.h>
-#include <linux/pci.h>
-#include <linux/memblock.h>
-#include <linux/iommu.h>
-#include <linux/debugfs.h>
-
-#include <asm/debugfs.h>
-#include <asm/tlb.h>
-#include <asm/powernv.h>
-#include <asm/reg.h>
-#include <asm/opal.h>
-#include <asm/io.h>
-#include <asm/iommu.h>
-#include <asm/pnv-pci.h>
-#include <asm/msi_bitmap.h>
-#include <asm/opal.h>
-
-#include "powernv.h"
-#include "pci.h"
-
-#define npu_to_phb(x) container_of(x, struct pnv_phb, npu)
-
-/*
- * spinlock to protect initialisation of an npu_context for a particular
- * mm_struct.
- */
-static DEFINE_SPINLOCK(npu_context_lock);
-
-/*
- * When an address shootdown range exceeds this threshold we invalidate the
- * entire TLB on the GPU for the given PID rather than each specific address in
- * the range.
- */
-static uint64_t atsd_threshold = 2 * 1024 * 1024;
-static struct dentry *atsd_threshold_dentry;
-
-/*
- * Other types of TCE cache invalidation are not functional in the
- * hardware.
- */
-static struct pci_dev *get_pci_dev(struct device_node *dn)
-{
- struct pci_dn *pdn = PCI_DN(dn);
-
- return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
- pdn->busno, pdn->devfn);
-}
-
-/* Given a NPU device get the associated PCI device. */
-struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev)
-{
- struct device_node *dn;
- struct pci_dev *gpdev;
-
- if (WARN_ON(!npdev))
- return NULL;
-
- if (WARN_ON(!npdev->dev.of_node))
- return NULL;
-
- /* Get assoicated PCI device */
- dn = of_parse_phandle(npdev->dev.of_node, "ibm,gpu", 0);
- if (!dn)
- return NULL;
-
- gpdev = get_pci_dev(dn);
- of_node_put(dn);
-
- return gpdev;
-}
-EXPORT_SYMBOL(pnv_pci_get_gpu_dev);
-
-/* Given the real PCI device get a linked NPU device. */
-struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index)
-{
- struct device_node *dn;
- struct pci_dev *npdev;
-
- if (WARN_ON(!gpdev))
- return NULL;
-
- /* Not all PCI devices have device-tree nodes */
- if (!gpdev->dev.of_node)
- return NULL;
-
- /* Get assoicated PCI device */
- dn = of_parse_phandle(gpdev->dev.of_node, "ibm,npu", index);
- if (!dn)
- return NULL;
-
- npdev = get_pci_dev(dn);
- of_node_put(dn);
-
- return npdev;
-}
-EXPORT_SYMBOL(pnv_pci_get_npu_dev);
-
-#define NPU_DMA_OP_UNSUPPORTED() \
- dev_err_once(dev, "%s operation unsupported for NVLink devices\n", \
- __func__)
-
-static void *dma_npu_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- NPU_DMA_OP_UNSUPPORTED();
- return NULL;
-}
-
-static void dma_npu_free(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- NPU_DMA_OP_UNSUPPORTED();
-}
-
-static dma_addr_t dma_npu_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction direction,
- unsigned long attrs)
-{
- NPU_DMA_OP_UNSUPPORTED();
- return 0;
-}
-
-static int dma_npu_map_sg(struct device *dev, struct scatterlist *sglist,
- int nelems, enum dma_data_direction direction,
- unsigned long attrs)
-{
- NPU_DMA_OP_UNSUPPORTED();
- return 0;
-}
-
-static int dma_npu_dma_supported(struct device *dev, u64 mask)
-{
- NPU_DMA_OP_UNSUPPORTED();
- return 0;
-}
-
-static u64 dma_npu_get_required_mask(struct device *dev)
-{
- NPU_DMA_OP_UNSUPPORTED();
- return 0;
-}
-
-static const struct dma_map_ops dma_npu_ops = {
- .map_page = dma_npu_map_page,
- .map_sg = dma_npu_map_sg,
- .alloc = dma_npu_alloc,
- .free = dma_npu_free,
- .dma_supported = dma_npu_dma_supported,
- .get_required_mask = dma_npu_get_required_mask,
-};
-
-/*
- * Returns the PE assoicated with the PCI device of the given
- * NPU. Returns the linked pci device if pci_dev != NULL.
- */
-static struct pnv_ioda_pe *get_gpu_pci_dev_and_pe(struct pnv_ioda_pe *npe,
- struct pci_dev **gpdev)
-{
- struct pnv_phb *phb;
- struct pci_controller *hose;
- struct pci_dev *pdev;
- struct pnv_ioda_pe *pe;
- struct pci_dn *pdn;
-
- pdev = pnv_pci_get_gpu_dev(npe->pdev);
- if (!pdev)
- return NULL;
-
- pdn = pci_get_pdn(pdev);
- if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
- return NULL;
-
- hose = pci_bus_to_host(pdev->bus);
- phb = hose->private_data;
- pe = &phb->ioda.pe_array[pdn->pe_number];
-
- if (gpdev)
- *gpdev = pdev;
-
- return pe;
-}
-
-long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
- struct iommu_table *tbl)
-{
- struct pnv_phb *phb = npe->phb;
- int64_t rc;
- const unsigned long size = tbl->it_indirect_levels ?
- tbl->it_level_size : tbl->it_size;
- const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
- const __u64 win_size = tbl->it_size << tbl->it_page_shift;
-
- pe_info(npe, "Setting up window %llx..%llx pg=%lx\n",
- start_addr, start_addr + win_size - 1,
- IOMMU_PAGE_SIZE(tbl));
-
- rc = opal_pci_map_pe_dma_window(phb->opal_id,
- npe->pe_number,
- npe->pe_number,
- tbl->it_indirect_levels + 1,
- __pa(tbl->it_base),
- size << 3,
- IOMMU_PAGE_SIZE(tbl));
- if (rc) {
- pe_err(npe, "Failed to configure TCE table, err %lld\n", rc);
- return rc;
- }
- pnv_pci_ioda2_tce_invalidate_entire(phb, false);
-
- /* Add the table to the list so its TCE cache will get invalidated */
- pnv_pci_link_table_and_group(phb->hose->node, num,
- tbl, &npe->table_group);
-
- return 0;
-}
-
-long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num)
-{
- struct pnv_phb *phb = npe->phb;
- int64_t rc;
-
- pe_info(npe, "Removing DMA window\n");
-
- rc = opal_pci_map_pe_dma_window(phb->opal_id, npe->pe_number,
- npe->pe_number,
- 0/* levels */, 0/* table address */,
- 0/* table size */, 0/* page size */);
- if (rc) {
- pe_err(npe, "Unmapping failed, ret = %lld\n", rc);
- return rc;
- }
- pnv_pci_ioda2_tce_invalidate_entire(phb, false);
-
- pnv_pci_unlink_table_and_group(npe->table_group.tables[num],
- &npe->table_group);
-
- return 0;
-}
-
-/*
- * Enables 32 bit DMA on NPU.
- */
-static void pnv_npu_dma_set_32(struct pnv_ioda_pe *npe)
-{
- struct pci_dev *gpdev;
- struct pnv_ioda_pe *gpe;
- int64_t rc;
-
- /*
- * Find the assoicated PCI devices and get the dma window
- * information from there.
- */
- if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
- return;
-
- gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
- if (!gpe)
- return;
-
- rc = pnv_npu_set_window(npe, 0, gpe->table_group.tables[0]);
-
- /*
- * We don't initialise npu_pe->tce32_table as we always use
- * dma_npu_ops which are nops.
- */
- set_dma_ops(&npe->pdev->dev, &dma_npu_ops);
-}
-
-/*
- * Enables bypass mode on the NPU. The NPU only supports one
- * window per link, so bypass needs to be explicitly enabled or
- * disabled. Unlike for a PHB3 bypass and non-bypass modes can't be
- * active at the same time.
- */
-static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
-{
- struct pnv_phb *phb = npe->phb;
- int64_t rc = 0;
- phys_addr_t top = memblock_end_of_DRAM();
-
- if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
- return -EINVAL;
-
- rc = pnv_npu_unset_window(npe, 0);
- if (rc != OPAL_SUCCESS)
- return rc;
-
- /* Enable the bypass window */
-
- top = roundup_pow_of_two(top);
- dev_info(&npe->pdev->dev, "Enabling bypass for PE %x\n",
- npe->pe_number);
- rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
- npe->pe_number, npe->pe_number,
- 0 /* bypass base */, top);
-
- if (rc == OPAL_SUCCESS)
- pnv_pci_ioda2_tce_invalidate_entire(phb, false);
-
- return rc;
-}
-
-void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass)
-{
- int i;
- struct pnv_phb *phb;
- struct pci_dn *pdn;
- struct pnv_ioda_pe *npe;
- struct pci_dev *npdev;
-
- for (i = 0; ; ++i) {
- npdev = pnv_pci_get_npu_dev(gpdev, i);
-
- if (!npdev)
- break;
-
- pdn = pci_get_pdn(npdev);
- if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
- return;
-
- phb = pci_bus_to_host(npdev->bus)->private_data;
-
- /* We only do bypass if it's enabled on the linked device */
- npe = &phb->ioda.pe_array[pdn->pe_number];
-
- if (bypass) {
- dev_info(&npdev->dev,
- "Using 64-bit DMA iommu bypass\n");
- pnv_npu_dma_set_bypass(npe);
- } else {
- dev_info(&npdev->dev, "Using 32-bit DMA via iommu\n");
- pnv_npu_dma_set_32(npe);
- }
- }
-}
-
-/* Switch ownership from platform code to external user (e.g. VFIO) */
-void pnv_npu_take_ownership(struct pnv_ioda_pe *npe)
-{
- struct pnv_phb *phb = npe->phb;
- int64_t rc;
-
- /*
- * Note: NPU has just a single TVE in the hardware which means that
- * while used by the kernel, it can have either 32bit window or
- * DMA bypass but never both. So we deconfigure 32bit window only
- * if it was enabled at the moment of ownership change.
- */
- if (npe->table_group.tables[0]) {
- pnv_npu_unset_window(npe, 0);
- return;
- }
-
- /* Disable bypass */
- rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
- npe->pe_number, npe->pe_number,
- 0 /* bypass base */, 0);
- if (rc) {
- pe_err(npe, "Failed to disable bypass, err %lld\n", rc);
- return;
- }
- pnv_pci_ioda2_tce_invalidate_entire(npe->phb, false);
-}
-
-struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
-{
- struct pnv_phb *phb = npe->phb;
- struct pci_bus *pbus = phb->hose->bus;
- struct pci_dev *npdev, *gpdev = NULL, *gptmp;
- struct pnv_ioda_pe *gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
-
- if (!gpe || !gpdev)
- return NULL;
-
- list_for_each_entry(npdev, &pbus->devices, bus_list) {
- gptmp = pnv_pci_get_gpu_dev(npdev);
-
- if (gptmp != gpdev)
- continue;
-
- pe_info(gpe, "Attached NPU %s\n", dev_name(&npdev->dev));
- iommu_group_add_device(gpe->table_group.group, &npdev->dev);
- }
-
- return gpe;
-}
-
-/* Maximum number of nvlinks per npu */
-#define NV_MAX_LINKS 6
-
-/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
-static int max_npu2_index;
-
-struct npu_context {
- struct mm_struct *mm;
- struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
- struct mmu_notifier mn;
- struct kref kref;
- bool nmmu_flush;
-
- /* Callback to stop translation requests on a given GPU */
- void (*release_cb)(struct npu_context *context, void *priv);
-
- /*
- * Private pointer passed to the above callback for usage by
- * device drivers.
- */
- void *priv;
-};
-
-struct mmio_atsd_reg {
- struct npu *npu;
- int reg;
-};
-
-/*
- * Find a free MMIO ATSD register and mark it in use. Return -ENOSPC
- * if none are available.
- */
-static int get_mmio_atsd_reg(struct npu *npu)
-{
- int i;
-
- for (i = 0; i < npu->mmio_atsd_count; i++) {
- if (!test_bit(i, &npu->mmio_atsd_usage))
- if (!test_and_set_bit_lock(i, &npu->mmio_atsd_usage))
- return i;
- }
-
- return -ENOSPC;
-}
-
-static void put_mmio_atsd_reg(struct npu *npu, int reg)
-{
- clear_bit_unlock(reg, &npu->mmio_atsd_usage);
-}
-
-/* MMIO ATSD register offsets */
-#define XTS_ATSD_AVA 1
-#define XTS_ATSD_STAT 2
-
-static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg,
- unsigned long launch, unsigned long va)
-{
- struct npu *npu = mmio_atsd_reg->npu;
- int reg = mmio_atsd_reg->reg;
-
- __raw_writeq_be(va, npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA);
- eieio();
- __raw_writeq_be(launch, npu->mmio_atsd_regs[reg]);
-}
-
-static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
- unsigned long pid, bool flush)
-{
- int i;
- unsigned long launch;
-
- for (i = 0; i <= max_npu2_index; i++) {
- if (mmio_atsd_reg[i].reg < 0)
- continue;
-
- /* IS set to invalidate matching PID */
- launch = PPC_BIT(12);
-
- /* PRS set to process-scoped */
- launch |= PPC_BIT(13);
-
- /* AP */
- launch |= (u64)
- mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
-
- /* PID */
- launch |= pid << PPC_BITLSHIFT(38);
-
- /* No flush */
- launch |= !flush << PPC_BITLSHIFT(39);
-
- /* Invalidating the entire process doesn't use a va */
- mmio_launch_invalidate(&mmio_atsd_reg[i], launch, 0);
- }
-}
-
-static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
- unsigned long va, unsigned long pid, bool flush)
-{
- int i;
- unsigned long launch;
-
- for (i = 0; i <= max_npu2_index; i++) {
- if (mmio_atsd_reg[i].reg < 0)
- continue;
-
- /* IS set to invalidate target VA */
- launch = 0;
-
- /* PRS set to process scoped */
- launch |= PPC_BIT(13);
-
- /* AP */
- launch |= (u64)
- mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
-
- /* PID */
- launch |= pid << PPC_BITLSHIFT(38);
-
- /* No flush */
- launch |= !flush << PPC_BITLSHIFT(39);
-
- mmio_launch_invalidate(&mmio_atsd_reg[i], launch, va);
- }
-}
-
-#define mn_to_npu_context(x) container_of(x, struct npu_context, mn)
-
-static void mmio_invalidate_wait(
- struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
-{
- struct npu *npu;
- int i, reg;
-
- /* Wait for all invalidations to complete */
- for (i = 0; i <= max_npu2_index; i++) {
- if (mmio_atsd_reg[i].reg < 0)
- continue;
-
- /* Wait for completion */
- npu = mmio_atsd_reg[i].npu;
- reg = mmio_atsd_reg[i].reg;
- while (__raw_readq(npu->mmio_atsd_regs[reg] + XTS_ATSD_STAT))
- cpu_relax();
- }
-}
-
-/*
- * Acquires all the address translation shootdown (ATSD) registers required to
- * launch an ATSD on all links this npu_context is active on.
- */
-static void acquire_atsd_reg(struct npu_context *npu_context,
- struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
-{
- int i, j;
- struct npu *npu;
- struct pci_dev *npdev;
- struct pnv_phb *nphb;
-
- for (i = 0; i <= max_npu2_index; i++) {
- mmio_atsd_reg[i].reg = -1;
- for (j = 0; j < NV_MAX_LINKS; j++) {
- /*
- * There are no ordering requirements with respect to
- * the setup of struct npu_context, but to ensure
- * consistent behaviour we need to ensure npdev[][] is
- * only read once.
- */
- npdev = READ_ONCE(npu_context->npdev[i][j]);
- if (!npdev)
- continue;
-
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
- mmio_atsd_reg[i].npu = npu;
- mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
- while (mmio_atsd_reg[i].reg < 0) {
- mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
- cpu_relax();
- }
- break;
- }
- }
-}
-
-/*
- * Release previously acquired ATSD registers. To avoid deadlocks the registers
- * must be released in the same order they were acquired above in
- * acquire_atsd_reg.
- */
-static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
-{
- int i;
-
- for (i = 0; i <= max_npu2_index; i++) {
- /*
- * We can't rely on npu_context->npdev[][] being the same here
- * as when acquire_atsd_reg() was called, hence we use the
- * values stored in mmio_atsd_reg during the acquire phase
- * rather than re-reading npdev[][].
- */
- if (mmio_atsd_reg[i].reg < 0)
- continue;
-
- put_mmio_atsd_reg(mmio_atsd_reg[i].npu, mmio_atsd_reg[i].reg);
- }
-}
-
-/*
- * Invalidate either a single address or an entire PID depending on
- * the value of va.
- */
-static void mmio_invalidate(struct npu_context *npu_context, int va,
- unsigned long address, bool flush)
-{
- struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS];
- unsigned long pid = npu_context->mm->context.id;
-
- if (npu_context->nmmu_flush)
- /*
- * Unfortunately the nest mmu does not support flushing specific
- * addresses so we have to flush the whole mm once before
- * shooting down the GPU translation.
- */
- flush_all_mm(npu_context->mm);
-
- /*
- * Loop over all the NPUs this process is active on and launch
- * an invalidate.
- */
- acquire_atsd_reg(npu_context, mmio_atsd_reg);
- if (va)
- mmio_invalidate_va(mmio_atsd_reg, address, pid, flush);
- else
- mmio_invalidate_pid(mmio_atsd_reg, pid, flush);
-
- mmio_invalidate_wait(mmio_atsd_reg);
- if (flush) {
- /*
- * The GPU requires two flush ATSDs to ensure all entries have
- * been flushed. We use PID 0 as it will never be used for a
- * process on the GPU.
- */
- mmio_invalidate_pid(mmio_atsd_reg, 0, true);
- mmio_invalidate_wait(mmio_atsd_reg);
- mmio_invalidate_pid(mmio_atsd_reg, 0, true);
- mmio_invalidate_wait(mmio_atsd_reg);
- }
- release_atsd_reg(mmio_atsd_reg);
-}
-
-static void pnv_npu2_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
-{
- struct npu_context *npu_context = mn_to_npu_context(mn);
-
- /* Call into device driver to stop requests to the NMMU */
- if (npu_context->release_cb)
- npu_context->release_cb(npu_context, npu_context->priv);
-
- /*
- * There should be no more translation requests for this PID, but we
- * need to ensure any entries for it are removed from the TLB.
- */
- mmio_invalidate(npu_context, 0, 0, true);
-}
-
-static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
- struct mm_struct *mm,
- unsigned long address,
- pte_t pte)
-{
- struct npu_context *npu_context = mn_to_npu_context(mn);
-
- mmio_invalidate(npu_context, 1, address, true);
-}
-
-static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
- struct mm_struct *mm,
- unsigned long start, unsigned long end)
-{
- struct npu_context *npu_context = mn_to_npu_context(mn);
- unsigned long address;
-
- if (end - start > atsd_threshold) {
- /*
- * Just invalidate the entire PID if the address range is too
- * large.
- */
- mmio_invalidate(npu_context, 0, 0, true);
- } else {
- for (address = start; address < end; address += PAGE_SIZE)
- mmio_invalidate(npu_context, 1, address, false);
-
- /* Do the flush only on the final addess == end */
- mmio_invalidate(npu_context, 1, address, true);
- }
-}
-
-static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
- .release = pnv_npu2_mn_release,
- .change_pte = pnv_npu2_mn_change_pte,
- .invalidate_range = pnv_npu2_mn_invalidate_range,
-};
-
-/*
- * Call into OPAL to setup the nmmu context for the current task in
- * the NPU. This must be called to setup the context tables before the
- * GPU issues ATRs. pdev should be a pointed to PCIe GPU device.
- *
- * A release callback should be registered to allow a device driver to
- * be notified that it should not launch any new translation requests
- * as the final TLB invalidate is about to occur.
- *
- * Returns an error if there no contexts are currently available or a
- * npu_context which should be passed to pnv_npu2_handle_fault().
- *
- * mmap_sem must be held in write mode and must not be called from interrupt
- * context.
- */
-struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
- unsigned long flags,
- void (*cb)(struct npu_context *, void *),
- void *priv)
-{
- int rc;
- u32 nvlink_index;
- struct device_node *nvlink_dn;
- struct mm_struct *mm = current->mm;
- struct pnv_phb *nphb;
- struct npu *npu;
- struct npu_context *npu_context;
-
- /*
- * At present we don't support GPUs connected to multiple NPUs and I'm
- * not sure the hardware does either.
- */
- struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
-
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return ERR_PTR(-ENODEV);
-
- if (!npdev)
- /* No nvlink associated with this GPU device */
- return ERR_PTR(-ENODEV);
-
- nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
- if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
- &nvlink_index)))
- return ERR_PTR(-ENODEV);
-
- if (!mm || mm->context.id == 0) {
- /*
- * Kernel thread contexts are not supported and context id 0 is
- * reserved on the GPU.
- */
- return ERR_PTR(-EINVAL);
- }
-
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
-
- /*
- * Setup the NPU context table for a particular GPU. These need to be
- * per-GPU as we need the tables to filter ATSDs when there are no
- * active contexts on a particular GPU. It is safe for these to be
- * called concurrently with destroy as the OPAL call takes appropriate
- * locks and refcounts on init/destroy.
- */
- rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn));
- if (rc < 0)
- return ERR_PTR(-ENOSPC);
-
- /*
- * We store the npu pci device so we can more easily get at the
- * associated npus.
- */
- spin_lock(&npu_context_lock);
- npu_context = mm->context.npu_context;
- if (npu_context) {
- if (npu_context->release_cb != cb ||
- npu_context->priv != priv) {
- spin_unlock(&npu_context_lock);
- opal_npu_destroy_context(nphb->opal_id, mm->context.id,
- PCI_DEVID(gpdev->bus->number,
- gpdev->devfn));
- return ERR_PTR(-EINVAL);
- }
-
- WARN_ON(!kref_get_unless_zero(&npu_context->kref));
- }
- spin_unlock(&npu_context_lock);
-
- if (!npu_context) {
- /*
- * We can set up these fields without holding the
- * npu_context_lock as the npu_context hasn't been returned to
- * the caller meaning it can't be destroyed. Parallel allocation
- * is protected against by mmap_sem.
- */
- rc = -ENOMEM;
- npu_context = kzalloc(sizeof(struct npu_context), GFP_KERNEL);
- if (npu_context) {
- kref_init(&npu_context->kref);
- npu_context->mm = mm;
- npu_context->mn.ops = &nv_nmmu_notifier_ops;
- rc = __mmu_notifier_register(&npu_context->mn, mm);
- }
-
- if (rc) {
- kfree(npu_context);
- opal_npu_destroy_context(nphb->opal_id, mm->context.id,
- PCI_DEVID(gpdev->bus->number,
- gpdev->devfn));
- return ERR_PTR(rc);
- }
-
- mm->context.npu_context = npu_context;
- }
-
- npu_context->release_cb = cb;
- npu_context->priv = priv;
-
- /*
- * npdev is a pci_dev pointer setup by the PCI code. We assign it to
- * npdev[][] to indicate to the mmu notifiers that an invalidation
- * should also be sent over this nvlink. The notifiers don't use any
- * other fields in npu_context, so we just need to ensure that when they
- * deference npu_context->npdev[][] it is either a valid pointer or
- * NULL.
- */
- WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
-
- if (!nphb->npu.nmmu_flush) {
- /*
- * If we're not explicitly flushing ourselves we need to mark
- * the thread for global flushes
- */
- npu_context->nmmu_flush = false;
- mm_context_add_copro(mm);
- } else
- npu_context->nmmu_flush = true;
-
- return npu_context;
-}
-EXPORT_SYMBOL(pnv_npu2_init_context);
-
-static void pnv_npu2_release_context(struct kref *kref)
-{
- struct npu_context *npu_context =
- container_of(kref, struct npu_context, kref);
-
- if (!npu_context->nmmu_flush)
- mm_context_remove_copro(npu_context->mm);
-
- npu_context->mm->context.npu_context = NULL;
-}
-
-/*
- * Destroy a context on the given GPU. May free the npu_context if it is no
- * longer active on any GPUs. Must not be called from interrupt context.
- */
-void pnv_npu2_destroy_context(struct npu_context *npu_context,
- struct pci_dev *gpdev)
-{
- int removed;
- struct pnv_phb *nphb;
- struct npu *npu;
- struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
- struct device_node *nvlink_dn;
- u32 nvlink_index;
-
- if (WARN_ON(!npdev))
- return;
-
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return;
-
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
- nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
- if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
- &nvlink_index)))
- return;
- WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
- opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn));
- spin_lock(&npu_context_lock);
- removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
- spin_unlock(&npu_context_lock);
-
- /*
- * We need to do this outside of pnv_npu2_release_context so that it is
- * outside the spinlock as mmu_notifier_destroy uses SRCU.
- */
- if (removed) {
- mmu_notifier_unregister(&npu_context->mn,
- npu_context->mm);
-
- kfree(npu_context);
- }
-
-}
-EXPORT_SYMBOL(pnv_npu2_destroy_context);
-
-/*
- * Assumes mmap_sem is held for the contexts associated mm.
- */
-int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
- unsigned long *flags, unsigned long *status, int count)
-{
- u64 rc = 0, result = 0;
- int i, is_write;
- struct page *page[1];
-
- /* mmap_sem should be held so the struct_mm must be present */
- struct mm_struct *mm = context->mm;
-
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return -ENODEV;
-
- WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
-
- for (i = 0; i < count; i++) {
- is_write = flags[i] & NPU2_WRITE;
- rc = get_user_pages_remote(NULL, mm, ea[i], 1,
- is_write ? FOLL_WRITE : 0,
- page, NULL, NULL);
-
- /*
- * To support virtualised environments we will have to do an
- * access to the page to ensure it gets faulted into the
- * hypervisor. For the moment virtualisation is not supported in
- * other areas so leave the access out.
- */
- if (rc != 1) {
- status[i] = rc;
- result = -EFAULT;
- continue;
- }
-
- status[i] = 0;
- put_page(page[0]);
- }
-
- return result;
-}
-EXPORT_SYMBOL(pnv_npu2_handle_fault);
-
-int pnv_npu2_init(struct pnv_phb *phb)
-{
- unsigned int i;
- u64 mmio_atsd;
- struct device_node *dn;
- struct pci_dev *gpdev;
- static int npu_index;
- uint64_t rc = 0;
-
- if (!atsd_threshold_dentry) {
- atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
- 0600, powerpc_debugfs_root, &atsd_threshold);
- }
-
- phb->npu.nmmu_flush =
- of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
- for_each_child_of_node(phb->hose->dn, dn) {
- gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
- if (gpdev) {
- rc = opal_npu_map_lpar(phb->opal_id,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn),
- 0, 0);
- if (rc)
- dev_err(&gpdev->dev,
- "Error %lld mapping device to LPAR\n",
- rc);
- }
- }
-
- for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
- i, &mmio_atsd); i++)
- phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
-
- pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
- phb->npu.mmio_atsd_count = i;
- phb->npu.mmio_atsd_usage = 0;
- npu_index++;
- if (WARN_ON(npu_index >= NV_MAX_NPUS))
- return -ENOSPC;
- max_npu2_index = npu_index;
- phb->npu.index = npu_index;
-
- return 0;
-}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 913175ba1c10..b6db65917bb4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1203,75 +1203,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
return pe;
}

-static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
-{
- int pe_num, found_pe = false, rc;
- long rid;
- struct pnv_ioda_pe *pe;
- struct pci_dev *gpu_pdev;
- struct pci_dn *npu_pdn;
- struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
- struct pnv_phb *phb = hose->private_data;
-
- /*
- * Due to a hardware errata PE#0 on the NPU is reserved for
- * error handling. This means we only have three PEs remaining
- * which need to be assigned to four links, implying some
- * links must share PEs.
- *
- * To achieve this we assign PEs such that NPUs linking the
- * same GPU get assigned the same PE.
- */
- gpu_pdev = pnv_pci_get_gpu_dev(npu_pdev);
- for (pe_num = 0; pe_num < phb->ioda.total_pe_num; pe_num++) {
- pe = &phb->ioda.pe_array[pe_num];
- if (!pe->pdev)
- continue;
-
- if (pnv_pci_get_gpu_dev(pe->pdev) == gpu_pdev) {
- /*
- * This device has the same peer GPU so should
- * be assigned the same PE as the existing
- * peer NPU.
- */
- dev_info(&npu_pdev->dev,
- "Associating to existing PE %x\n", pe_num);
- pci_dev_get(npu_pdev);
- npu_pdn = pci_get_pdn(npu_pdev);
- rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
- npu_pdn->pe_number = pe_num;
- phb->ioda.pe_rmap[rid] = pe->pe_number;
-
- /* Map the PE to this link */
- rc = opal_pci_set_pe(phb->opal_id, pe_num, rid,
- OpalPciBusAll,
- OPAL_COMPARE_RID_DEVICE_NUMBER,
- OPAL_COMPARE_RID_FUNCTION_NUMBER,
- OPAL_MAP_PE);
- WARN_ON(rc != OPAL_SUCCESS);
- found_pe = true;
- break;
- }
- }
-
- if (!found_pe)
- /*
- * Could not find an existing PE so allocate a new
- * one.
- */
- return pnv_ioda_setup_dev_PE(npu_pdev);
- else
- return pe;
-}
-
-static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
-{
- struct pci_dev *pdev;
-
- list_for_each_entry(pdev, &bus->devices, bus_list)
- pnv_ioda_setup_npu_PE(pdev);
-}
-
static void pnv_pci_ioda_setup_PEs(void)
{
struct pci_controller *hose, *tmp;
@@ -1281,13 +1212,6 @@ static void pnv_pci_ioda_setup_PEs(void)

list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;
- if (phb->type == PNV_PHB_NPU_NVLINK) {
- /* PE#0 is needed for error reporting */
- pnv_ioda_reserve_pe(phb, 0);
- pnv_ioda_setup_npu_PEs(hose->bus);
- if (phb->model == PNV_PHB_MODEL_NPU2)
- pnv_npu2_init(phb);
- }
if (phb->type == PNV_PHB_NPU_OCAPI) {
bus = hose->bus;
list_for_each_entry(pdev, &bus->devices, bus_list)
@@ -1871,9 +1795,6 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
}
*pdev->dev.dma_mask = dma_mask;

- /* Update peer npu devices */
- pnv_npu_try_dma_set_bypass(pdev, bypass);
-
return 0;
}

@@ -2119,14 +2040,6 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
}
}

-void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
-{
- if (phb->model == PNV_PHB_MODEL_NPU || phb->model == PNV_PHB_MODEL_PHB3)
- pnv_pci_phb3_tce_invalidate_entire(phb, rm);
- else
- opal_pci_tce_kill(phb->opal_id, OPAL_PCI_TCE_KILL, 0, 0, 0, 0);
-}
-
static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
long npages, unsigned long uaddr,
enum dma_data_direction direction,
@@ -2615,137 +2528,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
.take_ownership = pnv_ioda2_take_ownership,
.release_ownership = pnv_ioda2_release_ownership,
};
-
-static int gpe_table_group_to_npe_cb(struct device *dev, void *opaque)
-{
- struct pci_controller *hose;
- struct pnv_phb *phb;
- struct pnv_ioda_pe **ptmppe = opaque;
- struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
- struct pci_dn *pdn = pci_get_pdn(pdev);
-
- if (!pdn || pdn->pe_number == IODA_INVALID_PE)
- return 0;
-
- hose = pci_bus_to_host(pdev->bus);
- phb = hose->private_data;
- if (phb->type != PNV_PHB_NPU_NVLINK)
- return 0;
-
- *ptmppe = &phb->ioda.pe_array[pdn->pe_number];
-
- return 1;
-}
-
-/*
- * This returns PE of associated NPU.
- * This assumes that NPU is in the same IOMMU group with GPU and there is
- * no other PEs.
- */
-static struct pnv_ioda_pe *gpe_table_group_to_npe(
- struct iommu_table_group *table_group)
-{
- struct pnv_ioda_pe *npe = NULL;
- int ret = iommu_group_for_each_dev(table_group->group, &npe,
- gpe_table_group_to_npe_cb);
-
- BUG_ON(!ret || !npe);
-
- return npe;
-}
-
-static long pnv_pci_ioda2_npu_set_window(struct iommu_table_group *table_group,
- int num, struct iommu_table *tbl)
-{
- struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
- int num2 = (num == 0) ? 1 : 0;
- long ret = pnv_pci_ioda2_set_window(table_group, num, tbl);
-
- if (ret)
- return ret;
-
- if (table_group->tables[num2])
- pnv_npu_unset_window(npe, num2);
-
- ret = pnv_npu_set_window(npe, num, tbl);
- if (ret) {
- pnv_pci_ioda2_unset_window(table_group, num);
- if (table_group->tables[num2])
- pnv_npu_set_window(npe, num2,
- table_group->tables[num2]);
- }
-
- return ret;
-}
-
-static long pnv_pci_ioda2_npu_unset_window(
- struct iommu_table_group *table_group,
- int num)
-{
- struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
- int num2 = (num == 0) ? 1 : 0;
- long ret = pnv_pci_ioda2_unset_window(table_group, num);
-
- if (ret)
- return ret;
-
- if (!npe->table_group.tables[num])
- return 0;
-
- ret = pnv_npu_unset_window(npe, num);
- if (ret)
- return ret;
-
- if (table_group->tables[num2])
- ret = pnv_npu_set_window(npe, num2, table_group->tables[num2]);
-
- return ret;
-}
-
-static void pnv_ioda2_npu_take_ownership(struct iommu_table_group *table_group)
-{
- /*
- * Detach NPU first as pnv_ioda2_take_ownership() will destroy
- * the iommu_table if 32bit DMA is enabled.
- */
- pnv_npu_take_ownership(gpe_table_group_to_npe(table_group));
- pnv_ioda2_take_ownership(table_group);
-}
-
-static struct iommu_table_group_ops pnv_pci_ioda2_npu_ops = {
- .get_table_size = pnv_pci_ioda2_get_table_size,
- .create_table = pnv_pci_ioda2_create_table_userspace,
- .set_window = pnv_pci_ioda2_npu_set_window,
- .unset_window = pnv_pci_ioda2_npu_unset_window,
- .take_ownership = pnv_ioda2_npu_take_ownership,
- .release_ownership = pnv_ioda2_release_ownership,
-};
-
-static void pnv_pci_ioda_setup_iommu_api(void)
-{
- struct pci_controller *hose, *tmp;
- struct pnv_phb *phb;
- struct pnv_ioda_pe *pe, *gpe;
-
- /*
- * Now we have all PHBs discovered, time to add NPU devices to
- * the corresponding IOMMU groups.
- */
- list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
- phb = hose->private_data;
-
- if (phb->type != PNV_PHB_NPU_NVLINK)
- continue;
-
- list_for_each_entry(pe, &phb->ioda.pe_list, list) {
- gpe = pnv_pci_npu_setup_iommu(pe);
- if (gpe)
- gpe->table_group.ops = &pnv_pci_ioda2_npu_ops;
- }
- }
-}
-#else /* !CONFIG_IOMMU_API */
-static void pnv_pci_ioda_setup_iommu_api(void) { };
#endif

static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
@@ -3242,7 +3024,6 @@ static void pnv_pci_enable_bridges(void)
static void pnv_pci_ioda_fixup(void)
{
pnv_pci_ioda_setup_PEs();
- pnv_pci_ioda_setup_iommu_api();
pnv_pci_ioda_create_dbgfs();

pnv_pci_enable_bridges();
@@ -3689,27 +3470,6 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
.shutdown = pnv_pci_ioda_shutdown,
};

-static int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask)
-{
- dev_err_once(&npdev->dev,
- "%s operation unsupported for NVLink devices\n",
- __func__);
- return -EPERM;
-}
-
-static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
- .dma_dev_setup = pnv_pci_dma_dev_setup,
-#ifdef CONFIG_PCI_MSI
- .setup_msi_irqs = pnv_setup_msi_irqs,
- .teardown_msi_irqs = pnv_teardown_msi_irqs,
-#endif
- .enable_device_hook = pnv_pci_enable_device_hook,
- .window_alignment = pnv_pci_window_alignment,
- .reset_secondary_bus = pnv_pci_reset_secondary_bus,
- .dma_set_mask = pnv_npu_dma_set_mask,
- .shutdown = pnv_pci_ioda_shutdown,
-};
-
static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
.enable_device_hook = pnv_pci_enable_device_hook,
.window_alignment = pnv_pci_window_alignment,
@@ -3931,9 +3691,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;

switch (phb->type) {
- case PNV_PHB_NPU_NVLINK:
- hose->controller_ops = pnv_npu_ioda_controller_ops;
- break;
case PNV_PHB_NPU_OCAPI:
hose->controller_ops = pnv_npu_ocapi_ioda_controller_ops;
break;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8b37b28e3831..54f2935b7ac5 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -231,17 +231,6 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
#define pe_info(pe, fmt, ...) \
pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)

-/* Nvlink functions */
-extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
-extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
-extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
-extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
- struct iommu_table *tbl);
-extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
-extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
-extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
-extern int pnv_npu2_init(struct pnv_phb *phb);
-
/* pci-ioda-tce.c */
#define POWERNV_IOMMU_DEFAULT_LEVELS 1
#define POWERNV_IOMMU_MAX_LEVELS 5
--
2.19.0


2018-10-09 13:30:55

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 02/33] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/include/asm/dma-mapping.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..f2a4a7142b1e 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);

extern u64 __dma_get_required_mask(struct device *dev);

-#define ARCH_HAS_DMA_MMAP_COHERENT
-
#endif /* __KERNEL__ */
#endif /* _ASM_DMA_MAPPING_H */
--
2.19.0


2018-10-09 13:30:56

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 20/33] powerpc/dma: remove the iommu fallback for coherent allocations

All iommu-capable platforms now always use the iommu code with the
internal bypass, so there is no need for this magic anymore.
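
With the bypass handled inside dma_iommu_ops, the direct ops no longer need
the "try direct, else fall back to the iommu" dance for coherent memory. A
minimal sketch of the resulting wiring, taken from the dma.c hunk below with
the remaining callbacks elided:

	/* sketch: dma_nommu_ops allocate directly, no iommu fallback */
	const struct dma_map_ops dma_nommu_ops = {
		.alloc	= __dma_nommu_alloc_coherent,
		.free	= __dma_nommu_free_coherent,
		.mmap	= dma_nommu_mmap_coherent,
		/* ... other callbacks unchanged ... */
	};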

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/Kconfig | 4 ---
arch/powerpc/kernel/dma.c | 68 ++-------------------------------------
2 files changed, 2 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 06996df07cad..7097019d8907 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -119,9 +119,6 @@ config GENERIC_HWEIGHT
bool
default y

-config ARCH_HAS_DMA_SET_COHERENT_MASK
- bool
-
config PPC
bool
default y
@@ -129,7 +126,6 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_HAS_DEVMEM_IS_ALLOWED
- select ARCH_HAS_DMA_SET_COHERENT_MASK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 78ef4076e15c..bef91b8ad064 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -114,51 +114,6 @@ void __dma_nommu_free_coherent(struct device *dev, size_t size,
}
#endif /* !CONFIG_NOT_COHERENT_CACHE */

-static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- struct iommu_table *iommu;
-
- /* The coherent mask may be smaller than the real mask, check if
- * we can really use the direct ops
- */
- if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
- return __dma_nommu_alloc_coherent(dev, size, dma_handle,
- flag, attrs);
-
- /* Ok we can't ... do we have an iommu ? If not, fail */
- iommu = get_iommu_table_base(dev);
- if (!iommu)
- return NULL;
-
- /* Try to use the iommu */
- return iommu_alloc_coherent(dev, iommu, size, dma_handle,
- dev->coherent_dma_mask, flag,
- dev_to_node(dev));
-}
-
-static void dma_nommu_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- struct iommu_table *iommu;
-
- /* See comments in dma_nommu_alloc_coherent() */
- if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
- return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle,
- attrs);
- /* Maybe we used an iommu ... */
- iommu = get_iommu_table_base(dev);
-
- /* If we hit that we should have never allocated in the first
- * place so how come we are freeing ?
- */
- if (WARN_ON(!iommu))
- return;
- iommu_free_coherent(iommu, size, vaddr, dma_handle);
-}
-
int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle, size_t size,
unsigned long attrs)
@@ -228,8 +183,8 @@ static inline void dma_nommu_sync_single(struct device *dev,
#endif

const struct dma_map_ops dma_nommu_ops = {
- .alloc = dma_nommu_alloc_coherent,
- .free = dma_nommu_free_coherent,
+ .alloc = __dma_nommu_alloc_coherent,
+ .free = __dma_nommu_free_coherent,
.mmap = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
.dma_supported = dma_nommu_dma_supported,
@@ -243,25 +198,6 @@ const struct dma_map_ops dma_nommu_ops = {
};
EXPORT_SYMBOL(dma_nommu_ops);

-int dma_set_coherent_mask(struct device *dev, u64 mask)
-{
- if (!dma_supported(dev, mask)) {
- /*
- * We need to special case the direct DMA ops which can
- * support a fallback for coherent allocations. There
- * is no dma_op->set_coherent_mask() so we have to do
- * things the hard way:
- */
- if (get_dma_ops(dev) != &dma_nommu_ops ||
- get_iommu_table_base(dev) == NULL ||
- !dma_iommu_dma_supported(dev, mask))
- return -EIO;
- }
- dev->coherent_dma_mask = mask;
- return 0;
-}
-EXPORT_SYMBOL(dma_set_coherent_mask);
-
int dma_set_mask(struct device *dev, u64 dma_mask)
{
if (ppc_md.dma_set_mask)
--
2.19.0


2018-10-09 13:31:09

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 17/33] powerpc/powernv: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.
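
The controller op now only has to answer "can this mask use the bypass
window?"; the generic dma_iommu code does the actual switching. A minimal
sketch of the check, condensed from the pci-ioda.c hunk below (the PHB3
TVE#0 reconfiguration path is elided for brevity):

	/* sketch: condensed from pnv_pci_ioda_iommu_bypass_supported() below */
	static bool pnv_pci_ioda_iommu_bypass_supported(struct pci_dev *pdev,
			u64 dma_mask)
	{
		struct pci_controller *hose = pci_bus_to_host(pdev->bus);
		struct pnv_phb *phb = hose->private_data;
		struct pci_dn *pdn = pci_get_pdn(pdev);
		struct pnv_ioda_pe *pe = &phb->ioda.pe_array[pdn->pe_number];

		if (pe->tce_bypass_enabled &&
		    dma_mask >= pe->tce_bypass_base + memblock_end_of_DRAM() - 1)
			return true;
		return false;	/* fall back to 32-bit DMA via the iommu */
	}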

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 92 ++++++-----------------
1 file changed, 25 insertions(+), 67 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b6db65917bb4..5748b62e2e86 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1739,86 +1739,45 @@ static int pnv_pci_ioda_dma_64bit_bypass(struct pnv_ioda_pe *pe)
return -EIO;
}

-static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
+static bool pnv_pci_ioda_iommu_bypass_supported(struct pci_dev *pdev,
+ u64 dma_mask)
{
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *pe;
- uint64_t top;
- bool bypass = false;
- s64 rc;

if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
return -ENODEV;

pe = &phb->ioda.pe_array[pdn->pe_number];
if (pe->tce_bypass_enabled) {
- top = pe->tce_bypass_base + memblock_end_of_DRAM() - 1;
- bypass = (dma_mask >= top);
+ u64 top = pe->tce_bypass_base + memblock_end_of_DRAM() - 1;
+ if (dma_mask >= top)
+ return true;
}

- if (bypass) {
- dev_info(&pdev->dev, "Using 64-bit DMA iommu bypass\n");
- set_dma_ops(&pdev->dev, &dma_nommu_ops);
- } else {
- /*
- * If the device can't set the TCE bypass bit but still wants
- * to access 4GB or more, on PHB3 we can reconfigure TVE#0 to
- * bypass the 32-bit region and be usable for 64-bit DMAs.
- * The device needs to be able to address all of this space.
- */
- if (dma_mask >> 32 &&
- dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
- /* pe->pdev should be set if it's a single device, pe->pbus if not */
- (pe->device_count == 1 || !pe->pbus) &&
- phb->model == PNV_PHB_MODEL_PHB3) {
- /* Configure the bypass mode */
- rc = pnv_pci_ioda_dma_64bit_bypass(pe);
- if (rc)
- return rc;
- /* 4GB offset bypasses 32-bit space */
- set_dma_offset(&pdev->dev, (1ULL << 32));
- set_dma_ops(&pdev->dev, &dma_nommu_ops);
- } else if (dma_mask >> 32 && dma_mask != DMA_BIT_MASK(64)) {
- /*
- * Fail the request if a DMA mask between 32 and 64 bits
- * was requested but couldn't be fulfilled. Ideally we
- * would do this for 64-bits but historically we have
- * always fallen back to 32-bits.
- */
- return -ENOMEM;
- } else {
- dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
- set_dma_ops(&pdev->dev, &dma_iommu_ops);
- }
+ /*
+ * If the device can't set the TCE bypass bit but still wants
+ * to access 4GB or more, on PHB3 we can reconfigure TVE#0 to
+ * bypass the 32-bit region and be usable for 64-bit DMAs.
+ * The device needs to be able to address all of this space.
+ */
+ if (dma_mask >> 32 &&
+ dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
+ /* pe->pdev should be set if it's a single device, pe->pbus if not */
+ (pe->device_count == 1 || !pe->pbus) &&
+ phb->model == PNV_PHB_MODEL_PHB3) {
+ /* Configure the bypass mode */
+ s64 rc = pnv_pci_ioda_dma_64bit_bypass(pe);
+ if (rc)
+ return rc;
+ /* 4GB offset bypasses 32-bit space */
+ set_dma_offset(&pdev->dev, (1ULL << 32));
+ return true;
}
- *pdev->dev.dma_mask = dma_mask;

- return 0;
-}
-
-static u64 pnv_pci_ioda_dma_get_required_mask(struct pci_dev *pdev)
-{
- struct pci_controller *hose = pci_bus_to_host(pdev->bus);
- struct pnv_phb *phb = hose->private_data;
- struct pci_dn *pdn = pci_get_pdn(pdev);
- struct pnv_ioda_pe *pe;
- u64 end, mask;
-
- if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
- return 0;
-
- pe = &phb->ioda.pe_array[pdn->pe_number];
- if (!pe->tce_bypass_enabled)
- return __dma_get_required_mask(&pdev->dev);
-
-
- end = pe->tce_bypass_base + memblock_end_of_DRAM();
- mask = 1ULL << (fls64(end) - 1);
- mask += mask - 1;
-
- return mask;
+ return false;
}

static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
@@ -3456,6 +3415,7 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
.dma_dev_setup = pnv_pci_dma_dev_setup,
.dma_bus_setup = pnv_pci_dma_bus_setup,
+ .iommu_bypass_supported = pnv_pci_ioda_iommu_bypass_supported,
#ifdef CONFIG_PCI_MSI
.setup_msi_irqs = pnv_setup_msi_irqs,
.teardown_msi_irqs = pnv_teardown_msi_irqs,
@@ -3465,8 +3425,6 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
.window_alignment = pnv_pci_window_alignment,
.setup_bridge = pnv_pci_setup_bridge,
.reset_secondary_bus = pnv_pci_reset_secondary_bus,
- .dma_set_mask = pnv_pci_ioda_dma_set_mask,
- .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
.shutdown = pnv_pci_ioda_shutdown,
};

--
2.19.0

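For context, the point of the new hook is that mask handling no longer lives in platform code: the generic powerpc DMA path can simply ask the PHB whether bypass (direct mapping) is possible for a given mask and pick dma ops accordingly. Below is a minimal sketch of such a caller, assuming the iommu_bypass_supported member added to pci_controller_ops above; the helper and function names are illustrative only and not the actual code in this series.

#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <asm/pci-bridge.h>	/* struct pci_controller / controller_ops */

static bool pdev_iommu_bypass_supported(struct pci_dev *pdev, u64 mask)
{
	struct pci_controller *hose = pci_bus_to_host(pdev->bus);

	/* Ask the PHB whether this mask can use the direct (bypass) window */
	return hose->controller_ops.iommu_bypass_supported &&
	       hose->controller_ops.iommu_bypass_supported(pdev, mask);
}

static int sketch_dma_set_mask(struct device *dev, u64 dma_mask)
{
	struct pci_dev *pdev = to_pci_dev(dev);

	if (pdev_iommu_bypass_supported(pdev, dma_mask))
		set_dma_ops(dev, &dma_nommu_ops);	/* direct mapping */
	else
		set_dma_ops(dev, &dma_iommu_ops);	/* translated via TCE table */

	*dev->dma_mask = dma_mask;
	return 0;
}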

2018-10-15 01:09:47

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 01/33] powerpc: use mm zones more sensibly

On Tue, 2018-10-09 at 15:24 +0200, Christoph Hellwig wrote:
> * Find the least restrictive zone that is entirely below the
> @@ -324,11 +305,14 @@ void __init paging_init(void)
> printk(KERN_DEBUG "Memory hole size: %ldMB\n",
> (long int)((top_of_ram - total_ram) >> 20));
>
> +#ifdef CONFIG_ZONE_DMA
> + max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL >> PAGE_SHIFT);
> +#endif
> + max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
> #ifdef CONFIG_HIGHMEM
> - limit_zone_pfn(ZONE_NORMAL, lowmem_end_addr >> PAGE_SHIFT);
> + max_zone_pfns[ZONE_HIGHMEM] = max_pfn
^
Missing a ";" here --------------------------|

Sorry ... works with that fix on an old laptop with highmem.

> #endif
> - limit_zone_pfn(TOP_ZONE, top_of_ram >> PAGE_SHIFT);
> - zone_limits_final = true;
> +
> free_area_init_nodes(max_zone_pfns);
>

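For reference, the hunk Ben is pointing at with the missing semicolon added — a minimal sketch of just the fixed lines:

#ifdef CONFIG_ZONE_DMA
	max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL >> PAGE_SHIFT);
#endif
	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
	max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#endif

	free_area_init_nodes(max_zone_pfns);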

2018-10-15 01:34:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 16/33] powerpc/powernv: remove dead npu-dma code


On 10/10/2018 00:24, Christoph Hellwig wrote:
> This code has been unused since it was merged and is in the way of
> cleaning up the DMA code, thus remove it.
>
> This effectively reverts commit 5d2aa710 ("powerpc/powernv: Add support
> for Nvlink NPUs").


This code is heavily used by the NVIDIA GPU driver.



>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/powerpc/include/asm/pci.h | 3 -
> arch/powerpc/include/asm/powernv.h | 23 -
> arch/powerpc/platforms/powernv/Makefile | 2 +-
> arch/powerpc/platforms/powernv/npu-dma.c | 999 ----------------------
> arch/powerpc/platforms/powernv/pci-ioda.c | 243 ------
> arch/powerpc/platforms/powernv/pci.h | 11 -
> 6 files changed, 1 insertion(+), 1280 deletions(-)
> delete mode 100644 arch/powerpc/platforms/powernv/npu-dma.c
>
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 2af9ded80540..a01d2e3d6ff9 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -127,7 +127,4 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>
> #endif /* __KERNEL__ */
>
> -extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
> -extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
> -
> #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
> index 2f3ff7a27881..4848a6b3c6b2 100644
> --- a/arch/powerpc/include/asm/powernv.h
> +++ b/arch/powerpc/include/asm/powernv.h
> @@ -11,33 +11,10 @@
> #define _ASM_POWERNV_H
>
> #ifdef CONFIG_PPC_POWERNV
> -#define NPU2_WRITE 1
> extern void powernv_set_nmmu_ptcr(unsigned long ptcr);
> -extern struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
> - unsigned long flags,
> - void (*cb)(struct npu_context *, void *),
> - void *priv);
> -extern void pnv_npu2_destroy_context(struct npu_context *context,
> - struct pci_dev *gpdev);
> -extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
> - unsigned long *flags, unsigned long *status,
> - int count);
> -
> void pnv_tm_init(void);
> #else
> static inline void powernv_set_nmmu_ptcr(unsigned long ptcr) { }
> -static inline struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
> - unsigned long flags,
> - struct npu_context *(*cb)(struct npu_context *, void *),
> - void *priv) { return ERR_PTR(-ENODEV); }
> -static inline void pnv_npu2_destroy_context(struct npu_context *context,
> - struct pci_dev *gpdev) { }
> -
> -static inline int pnv_npu2_handle_fault(struct npu_context *context,
> - uintptr_t *ea, unsigned long *flags,
> - unsigned long *status, int count) {
> - return -ENODEV;
> -}
>
> static inline void pnv_tm_init(void) { }
> static inline void pnv_power9_force_smt4(void) { }
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index b540ce8eec55..2b13e9dd137c 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -6,7 +6,7 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
>
> obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
> -obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
> +obj-$(CONFIG_PCI) += pci.o pci-ioda.o pci-ioda-tce.o
> obj-$(CONFIG_CXL_BASE) += pci-cxl.o
> obj-$(CONFIG_EEH) += eeh-powernv.o
> obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> deleted file mode 100644
> index 8006c54a91e3..000000000000
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ /dev/null
> @@ -1,999 +0,0 @@
> -/*
> - * This file implements the DMA operations for NVLink devices. The NPU
> - * devices all point to the same iommu table as the parent PCI device.
> - *
> - * Copyright Alistair Popple, IBM Corporation 2015.
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of version 2 of the GNU General Public
> - * License as published by the Free Software Foundation.
> - */
> -
> -#include <linux/slab.h>
> -#include <linux/mmu_notifier.h>
> -#include <linux/mmu_context.h>
> -#include <linux/of.h>
> -#include <linux/export.h>
> -#include <linux/pci.h>
> -#include <linux/memblock.h>
> -#include <linux/iommu.h>
> -#include <linux/debugfs.h>
> -
> -#include <asm/debugfs.h>
> -#include <asm/tlb.h>
> -#include <asm/powernv.h>
> -#include <asm/reg.h>
> -#include <asm/opal.h>
> -#include <asm/io.h>
> -#include <asm/iommu.h>
> -#include <asm/pnv-pci.h>
> -#include <asm/msi_bitmap.h>
> -#include <asm/opal.h>
> -
> -#include "powernv.h"
> -#include "pci.h"
> -
> -#define npu_to_phb(x) container_of(x, struct pnv_phb, npu)
> -
> -/*
> - * spinlock to protect initialisation of an npu_context for a particular
> - * mm_struct.
> - */
> -static DEFINE_SPINLOCK(npu_context_lock);
> -
> -/*
> - * When an address shootdown range exceeds this threshold we invalidate the
> - * entire TLB on the GPU for the given PID rather than each specific address in
> - * the range.
> - */
> -static uint64_t atsd_threshold = 2 * 1024 * 1024;
> -static struct dentry *atsd_threshold_dentry;
> -
> -/*
> - * Other types of TCE cache invalidation are not functional in the
> - * hardware.
> - */
> -static struct pci_dev *get_pci_dev(struct device_node *dn)
> -{
> - struct pci_dn *pdn = PCI_DN(dn);
> -
> - return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
> - pdn->busno, pdn->devfn);
> -}
> -
> -/* Given a NPU device get the associated PCI device. */
> -struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev)
> -{
> - struct device_node *dn;
> - struct pci_dev *gpdev;
> -
> - if (WARN_ON(!npdev))
> - return NULL;
> -
> - if (WARN_ON(!npdev->dev.of_node))
> - return NULL;
> -
> - /* Get assoicated PCI device */
> - dn = of_parse_phandle(npdev->dev.of_node, "ibm,gpu", 0);
> - if (!dn)
> - return NULL;
> -
> - gpdev = get_pci_dev(dn);
> - of_node_put(dn);
> -
> - return gpdev;
> -}
> -EXPORT_SYMBOL(pnv_pci_get_gpu_dev);
> -
> -/* Given the real PCI device get a linked NPU device. */
> -struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index)
> -{
> - struct device_node *dn;
> - struct pci_dev *npdev;
> -
> - if (WARN_ON(!gpdev))
> - return NULL;
> -
> - /* Not all PCI devices have device-tree nodes */
> - if (!gpdev->dev.of_node)
> - return NULL;
> -
> - /* Get assoicated PCI device */
> - dn = of_parse_phandle(gpdev->dev.of_node, "ibm,npu", index);
> - if (!dn)
> - return NULL;
> -
> - npdev = get_pci_dev(dn);
> - of_node_put(dn);
> -
> - return npdev;
> -}
> -EXPORT_SYMBOL(pnv_pci_get_npu_dev);
> -
> -#define NPU_DMA_OP_UNSUPPORTED() \
> - dev_err_once(dev, "%s operation unsupported for NVLink devices\n", \
> - __func__)
> -
> -static void *dma_npu_alloc(struct device *dev, size_t size,
> - dma_addr_t *dma_handle, gfp_t flag,
> - unsigned long attrs)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> - return NULL;
> -}
> -
> -static void dma_npu_free(struct device *dev, size_t size,
> - void *vaddr, dma_addr_t dma_handle,
> - unsigned long attrs)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> -}
> -
> -static dma_addr_t dma_npu_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size,
> - enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> - return 0;
> -}
> -
> -static int dma_npu_map_sg(struct device *dev, struct scatterlist *sglist,
> - int nelems, enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> - return 0;
> -}
> -
> -static int dma_npu_dma_supported(struct device *dev, u64 mask)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> - return 0;
> -}
> -
> -static u64 dma_npu_get_required_mask(struct device *dev)
> -{
> - NPU_DMA_OP_UNSUPPORTED();
> - return 0;
> -}
> -
> -static const struct dma_map_ops dma_npu_ops = {
> - .map_page = dma_npu_map_page,
> - .map_sg = dma_npu_map_sg,
> - .alloc = dma_npu_alloc,
> - .free = dma_npu_free,
> - .dma_supported = dma_npu_dma_supported,
> - .get_required_mask = dma_npu_get_required_mask,
> -};
> -
> -/*
> - * Returns the PE assoicated with the PCI device of the given
> - * NPU. Returns the linked pci device if pci_dev != NULL.
> - */
> -static struct pnv_ioda_pe *get_gpu_pci_dev_and_pe(struct pnv_ioda_pe *npe,
> - struct pci_dev **gpdev)
> -{
> - struct pnv_phb *phb;
> - struct pci_controller *hose;
> - struct pci_dev *pdev;
> - struct pnv_ioda_pe *pe;
> - struct pci_dn *pdn;
> -
> - pdev = pnv_pci_get_gpu_dev(npe->pdev);
> - if (!pdev)
> - return NULL;
> -
> - pdn = pci_get_pdn(pdev);
> - if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> - return NULL;
> -
> - hose = pci_bus_to_host(pdev->bus);
> - phb = hose->private_data;
> - pe = &phb->ioda.pe_array[pdn->pe_number];
> -
> - if (gpdev)
> - *gpdev = pdev;
> -
> - return pe;
> -}
> -
> -long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
> - struct iommu_table *tbl)
> -{
> - struct pnv_phb *phb = npe->phb;
> - int64_t rc;
> - const unsigned long size = tbl->it_indirect_levels ?
> - tbl->it_level_size : tbl->it_size;
> - const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> - const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> -
> - pe_info(npe, "Setting up window %llx..%llx pg=%lx\n",
> - start_addr, start_addr + win_size - 1,
> - IOMMU_PAGE_SIZE(tbl));
> -
> - rc = opal_pci_map_pe_dma_window(phb->opal_id,
> - npe->pe_number,
> - npe->pe_number,
> - tbl->it_indirect_levels + 1,
> - __pa(tbl->it_base),
> - size << 3,
> - IOMMU_PAGE_SIZE(tbl));
> - if (rc) {
> - pe_err(npe, "Failed to configure TCE table, err %lld\n", rc);
> - return rc;
> - }
> - pnv_pci_ioda2_tce_invalidate_entire(phb, false);
> -
> - /* Add the table to the list so its TCE cache will get invalidated */
> - pnv_pci_link_table_and_group(phb->hose->node, num,
> - tbl, &npe->table_group);
> -
> - return 0;
> -}
> -
> -long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num)
> -{
> - struct pnv_phb *phb = npe->phb;
> - int64_t rc;
> -
> - pe_info(npe, "Removing DMA window\n");
> -
> - rc = opal_pci_map_pe_dma_window(phb->opal_id, npe->pe_number,
> - npe->pe_number,
> - 0/* levels */, 0/* table address */,
> - 0/* table size */, 0/* page size */);
> - if (rc) {
> - pe_err(npe, "Unmapping failed, ret = %lld\n", rc);
> - return rc;
> - }
> - pnv_pci_ioda2_tce_invalidate_entire(phb, false);
> -
> - pnv_pci_unlink_table_and_group(npe->table_group.tables[num],
> - &npe->table_group);
> -
> - return 0;
> -}
> -
> -/*
> - * Enables 32 bit DMA on NPU.
> - */
> -static void pnv_npu_dma_set_32(struct pnv_ioda_pe *npe)
> -{
> - struct pci_dev *gpdev;
> - struct pnv_ioda_pe *gpe;
> - int64_t rc;
> -
> - /*
> - * Find the assoicated PCI devices and get the dma window
> - * information from there.
> - */
> - if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
> - return;
> -
> - gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> - if (!gpe)
> - return;
> -
> - rc = pnv_npu_set_window(npe, 0, gpe->table_group.tables[0]);
> -
> - /*
> - * We don't initialise npu_pe->tce32_table as we always use
> - * dma_npu_ops which are nops.
> - */
> - set_dma_ops(&npe->pdev->dev, &dma_npu_ops);
> -}
> -
> -/*
> - * Enables bypass mode on the NPU. The NPU only supports one
> - * window per link, so bypass needs to be explicitly enabled or
> - * disabled. Unlike for a PHB3 bypass and non-bypass modes can't be
> - * active at the same time.
> - */
> -static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
> -{
> - struct pnv_phb *phb = npe->phb;
> - int64_t rc = 0;
> - phys_addr_t top = memblock_end_of_DRAM();
> -
> - if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
> - return -EINVAL;
> -
> - rc = pnv_npu_unset_window(npe, 0);
> - if (rc != OPAL_SUCCESS)
> - return rc;
> -
> - /* Enable the bypass window */
> -
> - top = roundup_pow_of_two(top);
> - dev_info(&npe->pdev->dev, "Enabling bypass for PE %x\n",
> - npe->pe_number);
> - rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
> - npe->pe_number, npe->pe_number,
> - 0 /* bypass base */, top);
> -
> - if (rc == OPAL_SUCCESS)
> - pnv_pci_ioda2_tce_invalidate_entire(phb, false);
> -
> - return rc;
> -}
> -
> -void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass)
> -{
> - int i;
> - struct pnv_phb *phb;
> - struct pci_dn *pdn;
> - struct pnv_ioda_pe *npe;
> - struct pci_dev *npdev;
> -
> - for (i = 0; ; ++i) {
> - npdev = pnv_pci_get_npu_dev(gpdev, i);
> -
> - if (!npdev)
> - break;
> -
> - pdn = pci_get_pdn(npdev);
> - if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> - return;
> -
> - phb = pci_bus_to_host(npdev->bus)->private_data;
> -
> - /* We only do bypass if it's enabled on the linked device */
> - npe = &phb->ioda.pe_array[pdn->pe_number];
> -
> - if (bypass) {
> - dev_info(&npdev->dev,
> - "Using 64-bit DMA iommu bypass\n");
> - pnv_npu_dma_set_bypass(npe);
> - } else {
> - dev_info(&npdev->dev, "Using 32-bit DMA via iommu\n");
> - pnv_npu_dma_set_32(npe);
> - }
> - }
> -}
> -
> -/* Switch ownership from platform code to external user (e.g. VFIO) */
> -void pnv_npu_take_ownership(struct pnv_ioda_pe *npe)
> -{
> - struct pnv_phb *phb = npe->phb;
> - int64_t rc;
> -
> - /*
> - * Note: NPU has just a single TVE in the hardware which means that
> - * while used by the kernel, it can have either 32bit window or
> - * DMA bypass but never both. So we deconfigure 32bit window only
> - * if it was enabled at the moment of ownership change.
> - */
> - if (npe->table_group.tables[0]) {
> - pnv_npu_unset_window(npe, 0);
> - return;
> - }
> -
> - /* Disable bypass */
> - rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
> - npe->pe_number, npe->pe_number,
> - 0 /* bypass base */, 0);
> - if (rc) {
> - pe_err(npe, "Failed to disable bypass, err %lld\n", rc);
> - return;
> - }
> - pnv_pci_ioda2_tce_invalidate_entire(npe->phb, false);
> -}
> -
> -struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
> -{
> - struct pnv_phb *phb = npe->phb;
> - struct pci_bus *pbus = phb->hose->bus;
> - struct pci_dev *npdev, *gpdev = NULL, *gptmp;
> - struct pnv_ioda_pe *gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> -
> - if (!gpe || !gpdev)
> - return NULL;
> -
> - list_for_each_entry(npdev, &pbus->devices, bus_list) {
> - gptmp = pnv_pci_get_gpu_dev(npdev);
> -
> - if (gptmp != gpdev)
> - continue;
> -
> - pe_info(gpe, "Attached NPU %s\n", dev_name(&npdev->dev));
> - iommu_group_add_device(gpe->table_group.group, &npdev->dev);
> - }
> -
> - return gpe;
> -}
> -
> -/* Maximum number of nvlinks per npu */
> -#define NV_MAX_LINKS 6
> -
> -/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> -static int max_npu2_index;
> -
> -struct npu_context {
> - struct mm_struct *mm;
> - struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
> - struct mmu_notifier mn;
> - struct kref kref;
> - bool nmmu_flush;
> -
> - /* Callback to stop translation requests on a given GPU */
> - void (*release_cb)(struct npu_context *context, void *priv);
> -
> - /*
> - * Private pointer passed to the above callback for usage by
> - * device drivers.
> - */
> - void *priv;
> -};
> -
> -struct mmio_atsd_reg {
> - struct npu *npu;
> - int reg;
> -};
> -
> -/*
> - * Find a free MMIO ATSD register and mark it in use. Return -ENOSPC
> - * if none are available.
> - */
> -static int get_mmio_atsd_reg(struct npu *npu)
> -{
> - int i;
> -
> - for (i = 0; i < npu->mmio_atsd_count; i++) {
> - if (!test_bit(i, &npu->mmio_atsd_usage))
> - if (!test_and_set_bit_lock(i, &npu->mmio_atsd_usage))
> - return i;
> - }
> -
> - return -ENOSPC;
> -}
> -
> -static void put_mmio_atsd_reg(struct npu *npu, int reg)
> -{
> - clear_bit_unlock(reg, &npu->mmio_atsd_usage);
> -}
> -
> -/* MMIO ATSD register offsets */
> -#define XTS_ATSD_AVA 1
> -#define XTS_ATSD_STAT 2
> -
> -static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg,
> - unsigned long launch, unsigned long va)
> -{
> - struct npu *npu = mmio_atsd_reg->npu;
> - int reg = mmio_atsd_reg->reg;
> -
> - __raw_writeq_be(va, npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA);
> - eieio();
> - __raw_writeq_be(launch, npu->mmio_atsd_regs[reg]);
> -}
> -
> -static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
> - unsigned long pid, bool flush)
> -{
> - int i;
> - unsigned long launch;
> -
> - for (i = 0; i <= max_npu2_index; i++) {
> - if (mmio_atsd_reg[i].reg < 0)
> - continue;
> -
> - /* IS set to invalidate matching PID */
> - launch = PPC_BIT(12);
> -
> - /* PRS set to process-scoped */
> - launch |= PPC_BIT(13);
> -
> - /* AP */
> - launch |= (u64)
> - mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
> -
> - /* PID */
> - launch |= pid << PPC_BITLSHIFT(38);
> -
> - /* No flush */
> - launch |= !flush << PPC_BITLSHIFT(39);
> -
> - /* Invalidating the entire process doesn't use a va */
> - mmio_launch_invalidate(&mmio_atsd_reg[i], launch, 0);
> - }
> -}
> -
> -static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
> - unsigned long va, unsigned long pid, bool flush)
> -{
> - int i;
> - unsigned long launch;
> -
> - for (i = 0; i <= max_npu2_index; i++) {
> - if (mmio_atsd_reg[i].reg < 0)
> - continue;
> -
> - /* IS set to invalidate target VA */
> - launch = 0;
> -
> - /* PRS set to process scoped */
> - launch |= PPC_BIT(13);
> -
> - /* AP */
> - launch |= (u64)
> - mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
> -
> - /* PID */
> - launch |= pid << PPC_BITLSHIFT(38);
> -
> - /* No flush */
> - launch |= !flush << PPC_BITLSHIFT(39);
> -
> - mmio_launch_invalidate(&mmio_atsd_reg[i], launch, va);
> - }
> -}
> -
> -#define mn_to_npu_context(x) container_of(x, struct npu_context, mn)
> -
> -static void mmio_invalidate_wait(
> - struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
> -{
> - struct npu *npu;
> - int i, reg;
> -
> - /* Wait for all invalidations to complete */
> - for (i = 0; i <= max_npu2_index; i++) {
> - if (mmio_atsd_reg[i].reg < 0)
> - continue;
> -
> - /* Wait for completion */
> - npu = mmio_atsd_reg[i].npu;
> - reg = mmio_atsd_reg[i].reg;
> - while (__raw_readq(npu->mmio_atsd_regs[reg] + XTS_ATSD_STAT))
> - cpu_relax();
> - }
> -}
> -
> -/*
> - * Acquires all the address translation shootdown (ATSD) registers required to
> - * launch an ATSD on all links this npu_context is active on.
> - */
> -static void acquire_atsd_reg(struct npu_context *npu_context,
> - struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
> -{
> - int i, j;
> - struct npu *npu;
> - struct pci_dev *npdev;
> - struct pnv_phb *nphb;
> -
> - for (i = 0; i <= max_npu2_index; i++) {
> - mmio_atsd_reg[i].reg = -1;
> - for (j = 0; j < NV_MAX_LINKS; j++) {
> - /*
> - * There are no ordering requirements with respect to
> - * the setup of struct npu_context, but to ensure
> - * consistent behaviour we need to ensure npdev[][] is
> - * only read once.
> - */
> - npdev = READ_ONCE(npu_context->npdev[i][j]);
> - if (!npdev)
> - continue;
> -
> - nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> - mmio_atsd_reg[i].npu = npu;
> - mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
> - while (mmio_atsd_reg[i].reg < 0) {
> - mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
> - cpu_relax();
> - }
> - break;
> - }
> - }
> -}
> -
> -/*
> - * Release previously acquired ATSD registers. To avoid deadlocks the registers
> - * must be released in the same order they were acquired above in
> - * acquire_atsd_reg.
> - */
> -static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
> -{
> - int i;
> -
> - for (i = 0; i <= max_npu2_index; i++) {
> - /*
> - * We can't rely on npu_context->npdev[][] being the same here
> - * as when acquire_atsd_reg() was called, hence we use the
> - * values stored in mmio_atsd_reg during the acquire phase
> - * rather than re-reading npdev[][].
> - */
> - if (mmio_atsd_reg[i].reg < 0)
> - continue;
> -
> - put_mmio_atsd_reg(mmio_atsd_reg[i].npu, mmio_atsd_reg[i].reg);
> - }
> -}
> -
> -/*
> - * Invalidate either a single address or an entire PID depending on
> - * the value of va.
> - */
> -static void mmio_invalidate(struct npu_context *npu_context, int va,
> - unsigned long address, bool flush)
> -{
> - struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS];
> - unsigned long pid = npu_context->mm->context.id;
> -
> - if (npu_context->nmmu_flush)
> - /*
> - * Unfortunately the nest mmu does not support flushing specific
> - * addresses so we have to flush the whole mm once before
> - * shooting down the GPU translation.
> - */
> - flush_all_mm(npu_context->mm);
> -
> - /*
> - * Loop over all the NPUs this process is active on and launch
> - * an invalidate.
> - */
> - acquire_atsd_reg(npu_context, mmio_atsd_reg);
> - if (va)
> - mmio_invalidate_va(mmio_atsd_reg, address, pid, flush);
> - else
> - mmio_invalidate_pid(mmio_atsd_reg, pid, flush);
> -
> - mmio_invalidate_wait(mmio_atsd_reg);
> - if (flush) {
> - /*
> - * The GPU requires two flush ATSDs to ensure all entries have
> - * been flushed. We use PID 0 as it will never be used for a
> - * process on the GPU.
> - */
> - mmio_invalidate_pid(mmio_atsd_reg, 0, true);
> - mmio_invalidate_wait(mmio_atsd_reg);
> - mmio_invalidate_pid(mmio_atsd_reg, 0, true);
> - mmio_invalidate_wait(mmio_atsd_reg);
> - }
> - release_atsd_reg(mmio_atsd_reg);
> -}
> -
> -static void pnv_npu2_mn_release(struct mmu_notifier *mn,
> - struct mm_struct *mm)
> -{
> - struct npu_context *npu_context = mn_to_npu_context(mn);
> -
> - /* Call into device driver to stop requests to the NMMU */
> - if (npu_context->release_cb)
> - npu_context->release_cb(npu_context, npu_context->priv);
> -
> - /*
> - * There should be no more translation requests for this PID, but we
> - * need to ensure any entries for it are removed from the TLB.
> - */
> - mmio_invalidate(npu_context, 0, 0, true);
> -}
> -
> -static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
> - struct mm_struct *mm,
> - unsigned long address,
> - pte_t pte)
> -{
> - struct npu_context *npu_context = mn_to_npu_context(mn);
> -
> - mmio_invalidate(npu_context, 1, address, true);
> -}
> -
> -static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
> - struct mm_struct *mm,
> - unsigned long start, unsigned long end)
> -{
> - struct npu_context *npu_context = mn_to_npu_context(mn);
> - unsigned long address;
> -
> - if (end - start > atsd_threshold) {
> - /*
> - * Just invalidate the entire PID if the address range is too
> - * large.
> - */
> - mmio_invalidate(npu_context, 0, 0, true);
> - } else {
> - for (address = start; address < end; address += PAGE_SIZE)
> - mmio_invalidate(npu_context, 1, address, false);
> -
> - /* Do the flush only on the final addess == end */
> - mmio_invalidate(npu_context, 1, address, true);
> - }
> -}
> -
> -static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
> - .release = pnv_npu2_mn_release,
> - .change_pte = pnv_npu2_mn_change_pte,
> - .invalidate_range = pnv_npu2_mn_invalidate_range,
> -};
> -
> -/*
> - * Call into OPAL to setup the nmmu context for the current task in
> - * the NPU. This must be called to setup the context tables before the
> - * GPU issues ATRs. pdev should be a pointed to PCIe GPU device.
> - *
> - * A release callback should be registered to allow a device driver to
> - * be notified that it should not launch any new translation requests
> - * as the final TLB invalidate is about to occur.
> - *
> - * Returns an error if there no contexts are currently available or a
> - * npu_context which should be passed to pnv_npu2_handle_fault().
> - *
> - * mmap_sem must be held in write mode and must not be called from interrupt
> - * context.
> - */
> -struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
> - unsigned long flags,
> - void (*cb)(struct npu_context *, void *),
> - void *priv)
> -{
> - int rc;
> - u32 nvlink_index;
> - struct device_node *nvlink_dn;
> - struct mm_struct *mm = current->mm;
> - struct pnv_phb *nphb;
> - struct npu *npu;
> - struct npu_context *npu_context;
> -
> - /*
> - * At present we don't support GPUs connected to multiple NPUs and I'm
> - * not sure the hardware does either.
> - */
> - struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
> -
> - if (!firmware_has_feature(FW_FEATURE_OPAL))
> - return ERR_PTR(-ENODEV);
> -
> - if (!npdev)
> - /* No nvlink associated with this GPU device */
> - return ERR_PTR(-ENODEV);
> -
> - nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
> - if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
> - &nvlink_index)))
> - return ERR_PTR(-ENODEV);
> -
> - if (!mm || mm->context.id == 0) {
> - /*
> - * Kernel thread contexts are not supported and context id 0 is
> - * reserved on the GPU.
> - */
> - return ERR_PTR(-EINVAL);
> - }
> -
> - nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> -
> - /*
> - * Setup the NPU context table for a particular GPU. These need to be
> - * per-GPU as we need the tables to filter ATSDs when there are no
> - * active contexts on a particular GPU. It is safe for these to be
> - * called concurrently with destroy as the OPAL call takes appropriate
> - * locks and refcounts on init/destroy.
> - */
> - rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> - if (rc < 0)
> - return ERR_PTR(-ENOSPC);
> -
> - /*
> - * We store the npu pci device so we can more easily get at the
> - * associated npus.
> - */
> - spin_lock(&npu_context_lock);
> - npu_context = mm->context.npu_context;
> - if (npu_context) {
> - if (npu_context->release_cb != cb ||
> - npu_context->priv != priv) {
> - spin_unlock(&npu_context_lock);
> - opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> - PCI_DEVID(gpdev->bus->number,
> - gpdev->devfn));
> - return ERR_PTR(-EINVAL);
> - }
> -
> - WARN_ON(!kref_get_unless_zero(&npu_context->kref));
> - }
> - spin_unlock(&npu_context_lock);
> -
> - if (!npu_context) {
> - /*
> - * We can set up these fields without holding the
> - * npu_context_lock as the npu_context hasn't been returned to
> - * the caller meaning it can't be destroyed. Parallel allocation
> - * is protected against by mmap_sem.
> - */
> - rc = -ENOMEM;
> - npu_context = kzalloc(sizeof(struct npu_context), GFP_KERNEL);
> - if (npu_context) {
> - kref_init(&npu_context->kref);
> - npu_context->mm = mm;
> - npu_context->mn.ops = &nv_nmmu_notifier_ops;
> - rc = __mmu_notifier_register(&npu_context->mn, mm);
> - }
> -
> - if (rc) {
> - kfree(npu_context);
> - opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> - PCI_DEVID(gpdev->bus->number,
> - gpdev->devfn));
> - return ERR_PTR(rc);
> - }
> -
> - mm->context.npu_context = npu_context;
> - }
> -
> - npu_context->release_cb = cb;
> - npu_context->priv = priv;
> -
> - /*
> - * npdev is a pci_dev pointer setup by the PCI code. We assign it to
> - * npdev[][] to indicate to the mmu notifiers that an invalidation
> - * should also be sent over this nvlink. The notifiers don't use any
> - * other fields in npu_context, so we just need to ensure that when they
> - * deference npu_context->npdev[][] it is either a valid pointer or
> - * NULL.
> - */
> - WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
> -
> - if (!nphb->npu.nmmu_flush) {
> - /*
> - * If we're not explicitly flushing ourselves we need to mark
> - * the thread for global flushes
> - */
> - npu_context->nmmu_flush = false;
> - mm_context_add_copro(mm);
> - } else
> - npu_context->nmmu_flush = true;
> -
> - return npu_context;
> -}
> -EXPORT_SYMBOL(pnv_npu2_init_context);
> -
> -static void pnv_npu2_release_context(struct kref *kref)
> -{
> - struct npu_context *npu_context =
> - container_of(kref, struct npu_context, kref);
> -
> - if (!npu_context->nmmu_flush)
> - mm_context_remove_copro(npu_context->mm);
> -
> - npu_context->mm->context.npu_context = NULL;
> -}
> -
> -/*
> - * Destroy a context on the given GPU. May free the npu_context if it is no
> - * longer active on any GPUs. Must not be called from interrupt context.
> - */
> -void pnv_npu2_destroy_context(struct npu_context *npu_context,
> - struct pci_dev *gpdev)
> -{
> - int removed;
> - struct pnv_phb *nphb;
> - struct npu *npu;
> - struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
> - struct device_node *nvlink_dn;
> - u32 nvlink_index;
> -
> - if (WARN_ON(!npdev))
> - return;
> -
> - if (!firmware_has_feature(FW_FEATURE_OPAL))
> - return;
> -
> - nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> - nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
> - if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
> - &nvlink_index)))
> - return;
> - WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
> - opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> - spin_lock(&npu_context_lock);
> - removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
> - spin_unlock(&npu_context_lock);
> -
> - /*
> - * We need to do this outside of pnv_npu2_release_context so that it is
> - * outside the spinlock as mmu_notifier_destroy uses SRCU.
> - */
> - if (removed) {
> - mmu_notifier_unregister(&npu_context->mn,
> - npu_context->mm);
> -
> - kfree(npu_context);
> - }
> -
> -}
> -EXPORT_SYMBOL(pnv_npu2_destroy_context);
> -
> -/*
> - * Assumes mmap_sem is held for the contexts associated mm.
> - */
> -int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
> - unsigned long *flags, unsigned long *status, int count)
> -{
> - u64 rc = 0, result = 0;
> - int i, is_write;
> - struct page *page[1];
> -
> - /* mmap_sem should be held so the struct_mm must be present */
> - struct mm_struct *mm = context->mm;
> -
> - if (!firmware_has_feature(FW_FEATURE_OPAL))
> - return -ENODEV;
> -
> - WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
> -
> - for (i = 0; i < count; i++) {
> - is_write = flags[i] & NPU2_WRITE;
> - rc = get_user_pages_remote(NULL, mm, ea[i], 1,
> - is_write ? FOLL_WRITE : 0,
> - page, NULL, NULL);
> -
> - /*
> - * To support virtualised environments we will have to do an
> - * access to the page to ensure it gets faulted into the
> - * hypervisor. For the moment virtualisation is not supported in
> - * other areas so leave the access out.
> - */
> - if (rc != 1) {
> - status[i] = rc;
> - result = -EFAULT;
> - continue;
> - }
> -
> - status[i] = 0;
> - put_page(page[0]);
> - }
> -
> - return result;
> -}
> -EXPORT_SYMBOL(pnv_npu2_handle_fault);
> -
> -int pnv_npu2_init(struct pnv_phb *phb)
> -{
> - unsigned int i;
> - u64 mmio_atsd;
> - struct device_node *dn;
> - struct pci_dev *gpdev;
> - static int npu_index;
> - uint64_t rc = 0;
> -
> - if (!atsd_threshold_dentry) {
> - atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
> - 0600, powerpc_debugfs_root, &atsd_threshold);
> - }
> -
> - phb->npu.nmmu_flush =
> - of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
> - for_each_child_of_node(phb->hose->dn, dn) {
> - gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> - if (gpdev) {
> - rc = opal_npu_map_lpar(phb->opal_id,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn),
> - 0, 0);
> - if (rc)
> - dev_err(&gpdev->dev,
> - "Error %lld mapping device to LPAR\n",
> - rc);
> - }
> - }
> -
> - for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
> - i, &mmio_atsd); i++)
> - phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
> -
> - pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
> - phb->npu.mmio_atsd_count = i;
> - phb->npu.mmio_atsd_usage = 0;
> - npu_index++;
> - if (WARN_ON(npu_index >= NV_MAX_NPUS))
> - return -ENOSPC;
> - max_npu2_index = npu_index;
> - phb->npu.index = npu_index;
> -
> - return 0;
> -}
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 913175ba1c10..b6db65917bb4 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1203,75 +1203,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
> return pe;
> }
>
> -static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> -{
> - int pe_num, found_pe = false, rc;
> - long rid;
> - struct pnv_ioda_pe *pe;
> - struct pci_dev *gpu_pdev;
> - struct pci_dn *npu_pdn;
> - struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
> - struct pnv_phb *phb = hose->private_data;
> -
> - /*
> - * Due to a hardware errata PE#0 on the NPU is reserved for
> - * error handling. This means we only have three PEs remaining
> - * which need to be assigned to four links, implying some
> - * links must share PEs.
> - *
> - * To achieve this we assign PEs such that NPUs linking the
> - * same GPU get assigned the same PE.
> - */
> - gpu_pdev = pnv_pci_get_gpu_dev(npu_pdev);
> - for (pe_num = 0; pe_num < phb->ioda.total_pe_num; pe_num++) {
> - pe = &phb->ioda.pe_array[pe_num];
> - if (!pe->pdev)
> - continue;
> -
> - if (pnv_pci_get_gpu_dev(pe->pdev) == gpu_pdev) {
> - /*
> - * This device has the same peer GPU so should
> - * be assigned the same PE as the existing
> - * peer NPU.
> - */
> - dev_info(&npu_pdev->dev,
> - "Associating to existing PE %x\n", pe_num);
> - pci_dev_get(npu_pdev);
> - npu_pdn = pci_get_pdn(npu_pdev);
> - rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
> - npu_pdn->pe_number = pe_num;
> - phb->ioda.pe_rmap[rid] = pe->pe_number;
> -
> - /* Map the PE to this link */
> - rc = opal_pci_set_pe(phb->opal_id, pe_num, rid,
> - OpalPciBusAll,
> - OPAL_COMPARE_RID_DEVICE_NUMBER,
> - OPAL_COMPARE_RID_FUNCTION_NUMBER,
> - OPAL_MAP_PE);
> - WARN_ON(rc != OPAL_SUCCESS);
> - found_pe = true;
> - break;
> - }
> - }
> -
> - if (!found_pe)
> - /*
> - * Could not find an existing PE so allocate a new
> - * one.
> - */
> - return pnv_ioda_setup_dev_PE(npu_pdev);
> - else
> - return pe;
> -}
> -
> -static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
> -{
> - struct pci_dev *pdev;
> -
> - list_for_each_entry(pdev, &bus->devices, bus_list)
> - pnv_ioda_setup_npu_PE(pdev);
> -}
> -
> static void pnv_pci_ioda_setup_PEs(void)
> {
> struct pci_controller *hose, *tmp;
> @@ -1281,13 +1212,6 @@ static void pnv_pci_ioda_setup_PEs(void)
>
> list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> phb = hose->private_data;
> - if (phb->type == PNV_PHB_NPU_NVLINK) {
> - /* PE#0 is needed for error reporting */
> - pnv_ioda_reserve_pe(phb, 0);
> - pnv_ioda_setup_npu_PEs(hose->bus);
> - if (phb->model == PNV_PHB_MODEL_NPU2)
> - pnv_npu2_init(phb);
> - }
> if (phb->type == PNV_PHB_NPU_OCAPI) {
> bus = hose->bus;
> list_for_each_entry(pdev, &bus->devices, bus_list)
> @@ -1871,9 +1795,6 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> }
> *pdev->dev.dma_mask = dma_mask;
>
> - /* Update peer npu devices */
> - pnv_npu_try_dma_set_bypass(pdev, bypass);
> -
> return 0;
> }
>
> @@ -2119,14 +2040,6 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> }
> }
>
> -void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
> -{
> - if (phb->model == PNV_PHB_MODEL_NPU || phb->model == PNV_PHB_MODEL_PHB3)
> - pnv_pci_phb3_tce_invalidate_entire(phb, rm);
> - else
> - opal_pci_tce_kill(phb->opal_id, OPAL_PCI_TCE_KILL, 0, 0, 0, 0);
> -}
> -
> static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
> long npages, unsigned long uaddr,
> enum dma_data_direction direction,
> @@ -2615,137 +2528,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> .take_ownership = pnv_ioda2_take_ownership,
> .release_ownership = pnv_ioda2_release_ownership,
> };
> -
> -static int gpe_table_group_to_npe_cb(struct device *dev, void *opaque)
> -{
> - struct pci_controller *hose;
> - struct pnv_phb *phb;
> - struct pnv_ioda_pe **ptmppe = opaque;
> - struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
> - struct pci_dn *pdn = pci_get_pdn(pdev);
> -
> - if (!pdn || pdn->pe_number == IODA_INVALID_PE)
> - return 0;
> -
> - hose = pci_bus_to_host(pdev->bus);
> - phb = hose->private_data;
> - if (phb->type != PNV_PHB_NPU_NVLINK)
> - return 0;
> -
> - *ptmppe = &phb->ioda.pe_array[pdn->pe_number];
> -
> - return 1;
> -}
> -
> -/*
> - * This returns PE of associated NPU.
> - * This assumes that NPU is in the same IOMMU group with GPU and there is
> - * no other PEs.
> - */
> -static struct pnv_ioda_pe *gpe_table_group_to_npe(
> - struct iommu_table_group *table_group)
> -{
> - struct pnv_ioda_pe *npe = NULL;
> - int ret = iommu_group_for_each_dev(table_group->group, &npe,
> - gpe_table_group_to_npe_cb);
> -
> - BUG_ON(!ret || !npe);
> -
> - return npe;
> -}
> -
> -static long pnv_pci_ioda2_npu_set_window(struct iommu_table_group *table_group,
> - int num, struct iommu_table *tbl)
> -{
> - struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
> - int num2 = (num == 0) ? 1 : 0;
> - long ret = pnv_pci_ioda2_set_window(table_group, num, tbl);
> -
> - if (ret)
> - return ret;
> -
> - if (table_group->tables[num2])
> - pnv_npu_unset_window(npe, num2);
> -
> - ret = pnv_npu_set_window(npe, num, tbl);
> - if (ret) {
> - pnv_pci_ioda2_unset_window(table_group, num);
> - if (table_group->tables[num2])
> - pnv_npu_set_window(npe, num2,
> - table_group->tables[num2]);
> - }
> -
> - return ret;
> -}
> -
> -static long pnv_pci_ioda2_npu_unset_window(
> - struct iommu_table_group *table_group,
> - int num)
> -{
> - struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
> - int num2 = (num == 0) ? 1 : 0;
> - long ret = pnv_pci_ioda2_unset_window(table_group, num);
> -
> - if (ret)
> - return ret;
> -
> - if (!npe->table_group.tables[num])
> - return 0;
> -
> - ret = pnv_npu_unset_window(npe, num);
> - if (ret)
> - return ret;
> -
> - if (table_group->tables[num2])
> - ret = pnv_npu_set_window(npe, num2, table_group->tables[num2]);
> -
> - return ret;
> -}
> -
> -static void pnv_ioda2_npu_take_ownership(struct iommu_table_group *table_group)
> -{
> - /*
> - * Detach NPU first as pnv_ioda2_take_ownership() will destroy
> - * the iommu_table if 32bit DMA is enabled.
> - */
> - pnv_npu_take_ownership(gpe_table_group_to_npe(table_group));
> - pnv_ioda2_take_ownership(table_group);
> -}
> -
> -static struct iommu_table_group_ops pnv_pci_ioda2_npu_ops = {
> - .get_table_size = pnv_pci_ioda2_get_table_size,
> - .create_table = pnv_pci_ioda2_create_table_userspace,
> - .set_window = pnv_pci_ioda2_npu_set_window,
> - .unset_window = pnv_pci_ioda2_npu_unset_window,
> - .take_ownership = pnv_ioda2_npu_take_ownership,
> - .release_ownership = pnv_ioda2_release_ownership,
> -};
> -
> -static void pnv_pci_ioda_setup_iommu_api(void)
> -{
> - struct pci_controller *hose, *tmp;
> - struct pnv_phb *phb;
> - struct pnv_ioda_pe *pe, *gpe;
> -
> - /*
> - * Now we have all PHBs discovered, time to add NPU devices to
> - * the corresponding IOMMU groups.
> - */
> - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> - phb = hose->private_data;
> -
> - if (phb->type != PNV_PHB_NPU_NVLINK)
> - continue;
> -
> - list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> - gpe = pnv_pci_npu_setup_iommu(pe);
> - if (gpe)
> - gpe->table_group.ops = &pnv_pci_ioda2_npu_ops;
> - }
> - }
> -}
> -#else /* !CONFIG_IOMMU_API */
> -static void pnv_pci_ioda_setup_iommu_api(void) { };
> #endif
>
> static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
> @@ -3242,7 +3024,6 @@ static void pnv_pci_enable_bridges(void)
> static void pnv_pci_ioda_fixup(void)
> {
> pnv_pci_ioda_setup_PEs();
> - pnv_pci_ioda_setup_iommu_api();
> pnv_pci_ioda_create_dbgfs();
>
> pnv_pci_enable_bridges();
> @@ -3689,27 +3470,6 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
> .shutdown = pnv_pci_ioda_shutdown,
> };
>
> -static int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask)
> -{
> - dev_err_once(&npdev->dev,
> - "%s operation unsupported for NVLink devices\n",
> - __func__);
> - return -EPERM;
> -}
> -
> -static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> - .dma_dev_setup = pnv_pci_dma_dev_setup,
> -#ifdef CONFIG_PCI_MSI
> - .setup_msi_irqs = pnv_setup_msi_irqs,
> - .teardown_msi_irqs = pnv_teardown_msi_irqs,
> -#endif
> - .enable_device_hook = pnv_pci_enable_device_hook,
> - .window_alignment = pnv_pci_window_alignment,
> - .reset_secondary_bus = pnv_pci_reset_secondary_bus,
> - .dma_set_mask = pnv_npu_dma_set_mask,
> - .shutdown = pnv_pci_ioda_shutdown,
> -};
> -
> static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> .enable_device_hook = pnv_pci_enable_device_hook,
> .window_alignment = pnv_pci_window_alignment,
> @@ -3931,9 +3691,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
>
> switch (phb->type) {
> - case PNV_PHB_NPU_NVLINK:
> - hose->controller_ops = pnv_npu_ioda_controller_ops;
> - break;
> case PNV_PHB_NPU_OCAPI:
> hose->controller_ops = pnv_npu_ocapi_ioda_controller_ops;
> break;
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 8b37b28e3831..54f2935b7ac5 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -231,17 +231,6 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
> #define pe_info(pe, fmt, ...) \
> pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)
>
> -/* Nvlink functions */
> -extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
> -extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
> -extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
> -extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
> - struct iommu_table *tbl);
> -extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
> -extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
> -extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
> -extern int pnv_npu2_init(struct pnv_phb *phb);
> -
> /* pci-ioda-tce.c */
> #define POWERNV_IOMMU_DEFAULT_LEVELS 1
> #define POWERNV_IOMMU_MAX_LEVELS 5
>

--
Alexey

2018-10-15 05:51:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 16/33] powerpc/powernv: remove dead npu-dma code

On Mon, Oct 15, 2018 at 12:34:02PM +1100, Alexey Kardashevskiy wrote:
>
> On 10/10/2018 00:24, Christoph Hellwig wrote:
> > This code has been unused since it was merged and is in the way of
> > cleaning up the DMA code, thus remove it.
> >
> > This effectively reverts commit 5d2aa710 ("powerpc/powernv: Add support
> > for Nvlink NPUs").
>
>
> This code is heavily used by the NVIDIA GPU driver.

Not by the driver that actually exists in the kernel tree, so it simply
doesn't matter.

2018-12-23 16:26:16

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 16/33] powerpc/powernv: remove dead npu-dma code

On Mon, 2018-10-15 at 12:34 +1100, Alexey Kardashevskiy wrote:
> On 10/10/2018 00:24, Christoph Hellwig wrote:
> > This code has been unused since it was merged and is in the way of
> > cleaning up the DMA code, thus remove it.
> >
> > This effectively reverts commit 5d2aa710 ("powerpc/powernv: Add support
> > for Nvlink NPUs").
>
>
> This code is heavily used by the NVIDIA GPU driver.

Some of it is, yes. And while I don't want to be involved in the
discussion about that specific can of worms, there is code in this file
related to the custom "always error" DMA ops that I suppose we could
remove, which is what is getting in the way of Christoph's cleanups. It's
just meant as debug stuff to catch incorrect attempts at doing the
DMA mappings on the wrong "side" of the GPU.
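
As an aside, the "always error" ops in question are just a set of dma_map_ops whose callbacks print an error and fail, so any attempt to set up DMA through the NPU device rather than the real GPU PCI device is caught immediately. A minimal sketch of that pattern, with illustrative names — the removed code implements it as the dma_npu_* ops:

#include <linux/device.h>
#include <linux/dma-mapping.h>

static dma_addr_t always_error_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	dev_err_once(dev, "%s unsupported on this device\n", __func__);
	return 0;	/* never produces a usable mapping */
}

static int always_error_dma_supported(struct device *dev, u64 mask)
{
	dev_err_once(dev, "%s unsupported on this device\n", __func__);
	return 0;
}

static const struct dma_map_ops always_error_dma_ops = {
	.map_page	= always_error_map_page,
	.dma_supported	= always_error_dma_supported,
};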

Cheers,
Ben.