Dan Williams started to look into addressing I/O to and from
Persistent Memory in his series from June:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
I've started looking into DMA mapping of these SGLs specifically, instead
of the map_pfn method used there. In addition to supporting NVDIMM-backed
I/O, I also suspect this would be highly useful for media drivers that
jump through nasty hoops to be able to DMA from/to their ioremapped regions,
with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
being a prime example of the unsafe hacks currently used.
It turns out most DMA mapping implementations can handle SGLs without
page structures with some fairly simple mechanical work. Most of it
is just about consistently using sg_phys. For implementations that
need to flush caches we need a new helper that skips these cache
flushes if an entry doesn't have a kernel virtual address.
However, the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
to operate mostly on virtual addresses. It's a fairly odd concept
that I don't fully grasp, so I'll need some help with those if we want
to bring this forward.
Additionally, this series skips ARM entirely for now. The reason is
that most ARM implementations of the .map_sg operation just iterate
over all entries and call ->map_page for each of them, which means
we'd need to convert those to a ->map_pfn similar to Dan's previous
approach.
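To illustrate the pattern the series converts architectures to, here is a
rough sketch of a trivial direct-mapped ->map_sg implementation using the
new helpers. This is not taken from any patch in the series;
example_dma_map_sg and example_flush_for_device are made-up names, the
latter standing in for whatever cache maintenance routine an architecture
provides:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* hypothetical stand-in for an architecture's cache writeback routine */
static void example_flush_for_device(void *vaddr, size_t len,
		enum dma_data_direction dir);

static int example_dma_map_sg(struct device *dev, struct scatterlist *sgl,
		int nents, enum dma_data_direction dir)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		/* sg_phys() works whether or not the entry has a struct page */
		sg->dma_address = sg_phys(sg);

		/* only flush entries that have a kernel virtual mapping */
		if (sg_has_page(sg))
			example_flush_for_device(sg_virt(sg), sg->length, dir);
	}
	return nents;
}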
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/scatterlist.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 9b1ef0c..b1056bf 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -230,6 +230,16 @@ static inline dma_addr_t sg_phys(struct scatterlist *sg)
return page_to_phys(sg_page(sg)) + sg->offset;
}
+static inline unsigned long sg_pfn(struct scatterlist *sg)
+{
+ return page_to_pfn(sg_page(sg));
+}
+
+static inline bool sg_has_page(struct scatterlist *sg)
+{
+ return true;
+}
+
/**
* sg_virt - Return virtual address of an sg entry
* @sg: SG entry
--
1.9.1
From: Dan Williams <[email protected]>
Coccinelle cleanup to replace open coded sg to physical address
translations. This is in preparation for introducing scatterlists that
reference __pfn_t.
// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch
virtual patch
@@
struct scatterlist *sg;
@@
- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)
@@
struct scatterlist *sg;
@@
- page_to_phys(sg_page(sg))
+ sg_phys(sg) & PAGE_MASK
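As an illustration (the variable names below are made up and not taken
from the semantic patch or any of the diffs), the two rules turn
open-coded translations like:

	dma_addr_t addr = page_to_phys(sg_page(sg)) + sg->offset;
	phys_addr_t base = page_to_phys(sg_page(sg));

into:

	dma_addr_t addr = sg_phys(sg);
	phys_addr_t base = sg_phys(sg) & PAGE_MASK;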
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm/mm/dma-mapping.c | 2 +-
arch/microblaze/kernel/dma.c | 3 +--
drivers/iommu/intel-iommu.c | 4 ++--
drivers/iommu/iommu.c | 2 +-
drivers/staging/android/ion/ion_chunk_heap.c | 4 ++--
5 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cba12f3..3d3d6aa 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1520,7 +1520,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
return -ENOMEM;
for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
- phys_addr_t phys = page_to_phys(sg_page(s));
+ phys_addr_t phys = sg_phys(s) & PAGE_MASK;
unsigned int len = PAGE_ALIGN(s->offset + s->length);
if (!is_coherent &&
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index bf4dec2..c89da63 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -61,8 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
/* FIXME this part of code is untested */
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg);
- __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
- sg->length, direction);
+ __dma_sync(sg_phys(sg), sg->length, direction);
}
return nents;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0649b94..3541d65 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2097,7 +2097,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
sg_res = aligned_nrpages(sg->offset, sg->length);
sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
sg->dma_length = sg->length;
- pteval = page_to_phys(sg_page(sg)) | prot;
+ pteval = (sg_phys(sg) & PAGE_MASK) | prot;
phys_pfn = pteval >> VTD_PAGE_SHIFT;
}
@@ -3623,7 +3623,7 @@ static int intel_nontranslate_map_sg(struct device *hddev,
for_each_sg(sglist, sg, nelems, i) {
BUG_ON(!sg_page(sg));
- sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset;
+ sg->dma_address = sg_phys(sg);
sg->dma_length = sg->length;
}
return nelems;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f286090..049df49 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1408,7 +1408,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
for_each_sg(sg, s, nents, i) {
- phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+ phys_addr_t phys = sg_phys(s);
/*
* We are mapping on IOMMU page boundaries, so offset within
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 5474615..f7b6ef9 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -81,7 +81,7 @@ static int ion_chunk_heap_allocate(struct ion_heap *heap,
err:
sg = table->sgl;
for (i -= 1; i >= 0; i--) {
- gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+ gen_pool_free(chunk_heap->pool, sg_phys(sg) & PAGE_MASK,
sg->length);
sg = sg_next(sg);
}
@@ -109,7 +109,7 @@ static void ion_chunk_heap_free(struct ion_buffer *buffer)
DMA_BIDIRECTIONAL);
for_each_sg(table->sgl, sg, table->nents, i) {
- gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+ gen_pool_free(chunk_heap->pool, sg_phys(sg) & PAGE_MASK,
sg->length);
}
chunk_heap->allocated -= allocated_size;
--
1.9.1
Use sg_pfn to get the PFN and skip checks that require a kernel
virtual address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
lib/dma-debug.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index dace71f..a215a80 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -1368,7 +1368,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
entry->type = dma_debug_sg;
entry->dev = dev;
- entry->pfn = page_to_pfn(sg_page(s));
+ entry->pfn = sg_pfn(s);
entry->offset = s->offset,
entry->size = sg_dma_len(s);
entry->dev_addr = sg_dma_address(s);
@@ -1376,7 +1376,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
entry->sg_call_ents = nents;
entry->sg_mapped_ents = mapped_ents;
- if (!PageHighMem(sg_page(s))) {
+ if (sg_has_page(s) && !PageHighMem(sg_page(s))) {
check_for_stack(dev, sg_virt(s));
check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
}
@@ -1419,7 +1419,7 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .pfn = page_to_pfn(sg_page(s)),
+ .pfn = sg_pfn(s),
.offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
@@ -1580,7 +1580,7 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .pfn = page_to_pfn(sg_page(s)),
+ .pfn = sg_pfn(s),
.offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
@@ -1613,7 +1613,7 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .pfn = page_to_pfn(sg_page(s)),
+ .pfn = sg_pfn(s),
.offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
--
1.9.1
Just remove a BUG_ON; the code handles SG entries without a page backing
just fine as-is.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/kernel/pci-nommu.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index da15918..a218059 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -63,7 +63,6 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
WARN_ON(nents == 0 || sg[0].length == 0);
for_each_sg(sg, s, nents, i) {
- BUG_ON(!sg_page(s));
s->dma_address = sg_phys(s);
if (!check_addr("map_sg", hwdev, s->dma_address, s->length))
return 0;
--
1.9.1
For the iommu offset we just need an offset into the page. Calculate
that using the physical address instead of using the virtual address
so that we don't require a virtual mapping.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/kernel/pci-calgary_64.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 0497f71..8f1581d 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -368,16 +368,14 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg,
{
struct iommu_table *tbl = find_iommu_table(dev);
struct scatterlist *s;
- unsigned long vaddr;
+ unsigned long paddr;
unsigned int npages;
unsigned long entry;
int i;
for_each_sg(sg, s, nelems, i) {
- BUG_ON(!sg_page(s));
-
- vaddr = (unsigned long) sg_virt(s);
- npages = iommu_num_pages(vaddr, s->length, PAGE_SIZE);
+ paddr = sg_phys(s);
+ npages = iommu_num_pages(paddr, s->length, PAGE_SIZE);
entry = iommu_range_alloc(dev, tbl, npages);
if (entry == DMA_ERROR_CODE) {
@@ -389,7 +387,7 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg,
s->dma_address = (entry << PAGE_SHIFT) | s->offset;
/* insert into HW table */
- tce_build(tbl, entry, npages, vaddr & PAGE_MASK, dir);
+ tce_build(tbl, entry, npages, paddr & PAGE_MASK, dir);
s->dma_length = s->length;
}
--
1.9.1
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't
require a kernel virtual address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/alpha/kernel/pci-noop.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c
index df24b76..7319151 100644
--- a/arch/alpha/kernel/pci-noop.c
+++ b/arch/alpha/kernel/pci-noop.c
@@ -145,11 +145,7 @@ static int alpha_noop_map_sg(struct device *dev, struct scatterlist *sgl, int ne
struct scatterlist *sg;
for_each_sg(sgl, sg, nents, i) {
- void *va;
-
- BUG_ON(!sg_page(sg));
- va = sg_virt(sg);
- sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
+ sg_dma_address(sg) = (dma_addr_t)sg_phys(sg);
sg_dma_len(sg) = sg->length;
}
--
1.9.1
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't
require a kernel virtual address, and switch a few debug printfs to
print physical instead of virtual addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/alpha/kernel/pci_iommu.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index eddee77..5d46b49 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -248,20 +248,17 @@ static int pci_dac_dma_supported(struct pci_dev *dev, u64 mask)
until either pci_unmap_single or pci_dma_sync_single is performed. */
static dma_addr_t
-pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
+pci_map_single_1(struct pci_dev *pdev, unsigned long paddr, size_t size,
int dac_allowed)
{
struct pci_controller *hose = pdev ? pdev->sysdata : pci_isa_hose;
dma_addr_t max_dma = pdev ? pdev->dma_mask : ISA_DMA_MASK;
struct pci_iommu_arena *arena;
long npages, dma_ofs, i;
- unsigned long paddr;
dma_addr_t ret;
unsigned int align = 0;
struct device *dev = pdev ? &pdev->dev : NULL;
- paddr = __pa(cpu_addr);
-
#if !DEBUG_NODIRECT
/* First check to see if we can use the direct map window. */
if (paddr + size + __direct_map_base - 1 <= max_dma
@@ -269,7 +266,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
ret = paddr + __direct_map_base;
DBGA2("pci_map_single: [%p,%zx] -> direct %llx from %pf\n",
- cpu_addr, size, ret, __builtin_return_address(0));
+ paddr, size, ret, __builtin_return_address(0));
return ret;
}
@@ -280,7 +277,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
ret = paddr + alpha_mv.pci_dac_offset;
DBGA2("pci_map_single: [%p,%zx] -> DAC %llx from %pf\n",
- cpu_addr, size, ret, __builtin_return_address(0));
+ paddr, size, ret, __builtin_return_address(0));
return ret;
}
@@ -309,15 +306,15 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
return 0;
}
+ offset = paddr & ~PAGE_MASK;
paddr &= PAGE_MASK;
for (i = 0; i < npages; ++i, paddr += PAGE_SIZE)
arena->ptes[i + dma_ofs] = mk_iommu_pte(paddr);
- ret = arena->dma_base + dma_ofs * PAGE_SIZE;
- ret += (unsigned long)cpu_addr & ~PAGE_MASK;
+ ret = arena->dma_base + dma_ofs * PAGE_SIZE + offset;
DBGA2("pci_map_single: [%p,%zx] np %ld -> sg %llx from %pf\n",
- cpu_addr, size, npages, ret, __builtin_return_address(0));
+ paddr, size, npages, ret, __builtin_return_address(0));
return ret;
}
@@ -357,7 +354,7 @@ static dma_addr_t alpha_pci_map_page(struct device *dev, struct page *page,
BUG_ON(dir == PCI_DMA_NONE);
dac_allowed = pdev ? pci_dac_dma_supported(pdev, pdev->dma_mask) : 0;
- return pci_map_single_1(pdev, (char *)page_address(page) + offset,
+ return pci_map_single_1(pdev, page_to_phys(page) + offset,
size, dac_allowed);
}
@@ -453,7 +450,7 @@ try_again:
}
memset(cpu_addr, 0, size);
- *dma_addrp = pci_map_single_1(pdev, cpu_addr, size, 0);
+ *dma_addrp = pci_map_single_1(pdev, __pa(cpu_addr), size, 0);
if (*dma_addrp == 0) {
free_pages((unsigned long)cpu_addr, order);
if (alpha_mv.mv_pci_tbi || (gfp & GFP_DMA))
@@ -497,9 +494,6 @@ static void alpha_pci_free_coherent(struct device *dev, size_t size,
Write dma_length of each leader with the combined lengths of
the mergable followers. */
-#define SG_ENT_VIRT_ADDRESS(SG) (sg_virt((SG)))
-#define SG_ENT_PHYS_ADDRESS(SG) __pa(SG_ENT_VIRT_ADDRESS(SG))
-
static void
sg_classify(struct device *dev, struct scatterlist *sg, struct scatterlist *end,
int virt_ok)
@@ -512,13 +506,13 @@ sg_classify(struct device *dev, struct scatterlist *sg, struct scatterlist *end,
leader = sg;
leader_flag = 0;
leader_length = leader->length;
- next_paddr = SG_ENT_PHYS_ADDRESS(leader) + leader_length;
+ next_paddr = sg_phys(leader) + leader_length;
/* we will not marge sg without device. */
max_seg_size = dev ? dma_get_max_seg_size(dev) : 0;
for (++sg; sg < end; ++sg) {
unsigned long addr, len;
- addr = SG_ENT_PHYS_ADDRESS(sg);
+ addr = sg_phys(sg);
len = sg->length;
if (leader_length + len > max_seg_size)
@@ -555,7 +549,7 @@ sg_fill(struct device *dev, struct scatterlist *leader, struct scatterlist *end,
struct scatterlist *out, struct pci_iommu_arena *arena,
dma_addr_t max_dma, int dac_allowed)
{
- unsigned long paddr = SG_ENT_PHYS_ADDRESS(leader);
+ unsigned long paddr = sg_phys(leader);
long size = leader->dma_length;
struct scatterlist *sg;
unsigned long *ptes;
@@ -621,7 +615,7 @@ sg_fill(struct device *dev, struct scatterlist *leader, struct scatterlist *end,
#endif
size = sg->length;
- paddr = SG_ENT_PHYS_ADDRESS(sg);
+ paddr = sg_phys(sg);
while (sg+1 < end && (int) sg[1].dma_address == -1) {
size += sg[1].length;
@@ -636,11 +630,11 @@ sg_fill(struct device *dev, struct scatterlist *leader, struct scatterlist *end,
#if DEBUG_ALLOC > 0
DBGA(" (%ld) [%p,%x] np %ld\n",
- last_sg - leader, SG_ENT_VIRT_ADDRESS(last_sg),
+ last_sg - leader, sg_phys(last_sg),
last_sg->length, npages);
while (++last_sg <= sg) {
DBGA(" (%ld) [%p,%x] cont\n",
- last_sg - leader, SG_ENT_VIRT_ADDRESS(last_sg),
+ last_sg - leader, sg_phys(last_sg),
last_sg->length);
}
#endif
@@ -668,7 +662,7 @@ static int alpha_pci_map_sg(struct device *dev, struct scatterlist *sg,
if (nents == 1) {
sg->dma_length = sg->length;
sg->dma_address
- = pci_map_single_1(pdev, SG_ENT_VIRT_ADDRESS(sg),
+ = pci_map_single_1(pdev, sg_phys(sg),
sg->length, dac_allowed);
return sg->dma_address != 0;
}
--
1.9.1
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't
require a kernel virtual address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/c6x/kernel/dma.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
index ab7b12d..79cae03 100644
--- a/arch/c6x/kernel/dma.c
+++ b/arch/c6x/kernel/dma.c
@@ -68,8 +68,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist,
int i;
for_each_sg(sglist, sg, nents, i)
- sg->dma_address = dma_map_single(dev, sg_virt(sg), sg->length,
- dir);
+ sg->dma_address = sg_phys(sg);
debug_dma_map_sg(dev, sglist, nents, nents, dir);
--
1.9.1
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't
require a kernel virtual address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/ia64/sn/pci/pci_dma.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d0853e8..8f713c8 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -18,9 +18,6 @@
#include <asm/sn/pcidev.h>
#include <asm/sn/sn_sal.h>
-#define SG_ENT_VIRT_ADDRESS(sg) (sg_virt((sg)))
-#define SG_ENT_PHYS_ADDRESS(SG) virt_to_phys(SG_ENT_VIRT_ADDRESS(SG))
-
/**
* sn_dma_supported - test a DMA mask
* @dev: device to test
@@ -291,7 +288,7 @@ static int sn_dma_map_sg(struct device *dev, struct scatterlist *sgl,
*/
for_each_sg(sgl, sg, nhwentries, i) {
dma_addr_t dma_addr;
- phys_addr = SG_ENT_PHYS_ADDRESS(sg);
+ phys_addr = sg_phys(sg);
if (dmabarr)
dma_addr = provider->dma_map_consistent(pdev,
phys_addr,
--
1.9.1
For the iommu offset we just need an offset into the page. Calculate
that using the physical address instead of using the virtual address
so that we don't require a virtual mapping.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/kernel/iommu.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..0f52e40 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -457,7 +457,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
max_seg_size = dma_get_max_seg_size(dev);
for_each_sg(sglist, s, nelems, i) {
- unsigned long vaddr, npages, entry, slen;
+ unsigned long paddr, npages, entry, slen;
slen = s->length;
/* Sanity check */
@@ -466,22 +466,22 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
continue;
}
/* Allocate iommu entries for that segment */
- vaddr = (unsigned long) sg_virt(s);
- npages = iommu_num_pages(vaddr, slen, IOMMU_PAGE_SIZE(tbl));
+ paddr = sg_phys(s);
+ npages = iommu_num_pages(paddr, slen, IOMMU_PAGE_SIZE(tbl));
align = 0;
if (tbl->it_page_shift < PAGE_SHIFT && slen >= PAGE_SIZE &&
- (vaddr & ~PAGE_MASK) == 0)
+ (paddr & ~PAGE_MASK) == 0)
align = PAGE_SHIFT - tbl->it_page_shift;
entry = iommu_range_alloc(dev, tbl, npages, &handle,
mask >> tbl->it_page_shift, align);
- DBG(" - vaddr: %lx, size: %lx\n", vaddr, slen);
+ DBG(" - paddr: %lx, size: %lx\n", paddr, slen);
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
if (printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
- "vaddr %lx npages %lu\n", tbl, vaddr,
+ "paddr %lx npages %lu\n", tbl, paddr,
npages);
goto failure;
}
@@ -496,7 +496,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
/* Insert into HW table */
build_fail = tbl->it_ops->set(tbl, entry, npages,
- vaddr & IOMMU_PAGE_MASK(tbl),
+ paddr & IOMMU_PAGE_MASK(tbl),
direction, attrs);
if(unlikely(build_fail))
goto failure;
--
1.9.1
Use sg_phys() instead of __pa(sg_virt(sg)) so that we don't
require a kernel virtual address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sparc/kernel/iommu.c | 2 +-
arch/sparc/kernel/iommu_common.h | 4 +---
arch/sparc/kernel/pci_sun4v.c | 2 +-
3 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5320689..2ad89d2 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -486,7 +486,7 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist,
continue;
}
/* Allocate iommu entries for that segment */
- paddr = (unsigned long) SG_ENT_PHYS_ADDRESS(s);
+ paddr = sg_phys(s);
npages = iommu_num_pages(paddr, slen, IO_PAGE_SIZE);
entry = iommu_tbl_range_alloc(dev, &iommu->tbl, npages,
&handle, (unsigned long)(-1), 0);
diff --git a/arch/sparc/kernel/iommu_common.h b/arch/sparc/kernel/iommu_common.h
index b40cec2..8e2c211 100644
--- a/arch/sparc/kernel/iommu_common.h
+++ b/arch/sparc/kernel/iommu_common.h
@@ -33,15 +33,13 @@
*/
#define IOMMU_PAGE_SHIFT 13
-#define SG_ENT_PHYS_ADDRESS(SG) (__pa(sg_virt((SG))))
-
static inline int is_span_boundary(unsigned long entry,
unsigned long shift,
unsigned long boundary_size,
struct scatterlist *outs,
struct scatterlist *sg)
{
- unsigned long paddr = SG_ENT_PHYS_ADDRESS(outs);
+ unsigned long paddr = sg_phys(outs);
int nr = iommu_num_pages(paddr, outs->dma_length + sg->length,
IO_PAGE_SIZE);
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index d2fe57d..a7a6e41 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -370,7 +370,7 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist,
continue;
}
/* Allocate iommu entries for that segment */
- paddr = (unsigned long) SG_ENT_PHYS_ADDRESS(s);
+ paddr = sg_phys(s);
npages = iommu_num_pages(paddr, slen, IO_PAGE_SIZE);
entry = iommu_tbl_range_alloc(dev, &iommu->tbl, npages,
&handle, (unsigned long)(-1), 0);
--
1.9.1
Just remove a BUG_ON; the code handles SG entries without a page backing
just fine as-is.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/mn10300/include/asm/dma-mapping.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/mn10300/include/asm/dma-mapping.h b/arch/mn10300/include/asm/dma-mapping.h
index a18abfc..b1b1050 100644
--- a/arch/mn10300/include/asm/dma-mapping.h
+++ b/arch/mn10300/include/asm/dma-mapping.h
@@ -57,11 +57,8 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
BUG_ON(!valid_dma_direction(direction));
WARN_ON(nents == 0 || sglist[0].length == 0);
- for_each_sg(sglist, sg, nents, i) {
- BUG_ON(!sg_page(sg));
-
+ for_each_sg(sglist, sg, nents, i)
sg->dma_address = sg_phys(sg);
- }
mn10300_dcache_flush_inv();
return nents;
--
1.9.1
Use
sg_phys(sg) & PAGE_MASK
instead of
page_to_pfn(sg_page(sg)) << PAGE_SHIFT
to get the page-aligned physical address of an SG entry, so that we don't
require a page backing for SG entries. For page-backed entries the two
expressions are equivalent, as masking sg_phys() with PAGE_MASK simply
drops sg->offset.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sparc/kernel/ldc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c
index 1ae5eb1..0a29974 100644
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -2051,7 +2051,7 @@ static void fill_cookies(struct cookie_state *sp, unsigned long pa,
static int sg_count_one(struct scatterlist *sg)
{
- unsigned long base = page_to_pfn(sg_page(sg)) << PAGE_SHIFT;
+ unsigned long base = sg_phys(sg) & PAGE_MASK;
long len = sg->length;
if ((sg->offset | len) & (8UL - 1))
@@ -2114,7 +2114,7 @@ int ldc_map_sg(struct ldc_channel *lp,
state.nc = 0;
for_each_sg(sg, s, num_sg, i) {
- fill_cookies(&state, page_to_pfn(sg_page(s)) << PAGE_SHIFT,
+ fill_cookies(&state, sg_phys(s) & PAGE_MASK,
s->offset, s->length);
}
--
1.9.1
For the iommu offset we just need an offset into the page. Calculate
that using the physical address instead of using the virtual address
so that we don't require a virtual mapping.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sparc/mm/io-unit.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/sparc/mm/io-unit.c b/arch/sparc/mm/io-unit.c
index f311bf2..82f97ae 100644
--- a/arch/sparc/mm/io-unit.c
+++ b/arch/sparc/mm/io-unit.c
@@ -91,13 +91,14 @@ static int __init iounit_init(void)
subsys_initcall(iounit_init);
/* One has to hold iounit->lock to call this */
-static unsigned long iounit_get_area(struct iounit_struct *iounit, unsigned long vaddr, int size)
+static dma_addr_t iounit_get_area(struct iounit_struct *iounit,
+ unsigned long paddr, int size)
{
int i, j, k, npages;
- unsigned long rotor, scan, limit;
+ unsigned long rotor, scan, limit, dma_addr;
iopte_t iopte;
- npages = ((vaddr & ~PAGE_MASK) + size + (PAGE_SIZE-1)) >> PAGE_SHIFT;
+ npages = ((paddr & ~PAGE_MASK) + size + (PAGE_SIZE-1)) >> PAGE_SHIFT;
/* A tiny bit of magic ingredience :) */
switch (npages) {
@@ -106,7 +107,7 @@ static unsigned long iounit_get_area(struct iounit_struct *iounit, unsigned long
default: i = 0x0213; break;
}
- IOD(("iounit_get_area(%08lx,%d[%d])=", vaddr, size, npages));
+ IOD(("iounit_get_area(%08lx,%d[%d])=", paddr, size, npages));
next: j = (i & 15);
rotor = iounit->rotor[j - 1];
@@ -121,7 +122,7 @@ nexti: scan = find_next_zero_bit(iounit->bmap, limit, scan);
}
i >>= 4;
if (!(i & 15))
- panic("iounit_get_area: Couldn't find free iopte slots for (%08lx,%d)\n", vaddr, size);
+ panic("iounit_get_area: Couldn't find free iopte slots for (%08lx,%d)\n", paddr, size);
goto next;
}
for (k = 1, scan++; k < npages; k++)
@@ -129,14 +130,14 @@ nexti: scan = find_next_zero_bit(iounit->bmap, limit, scan);
goto nexti;
iounit->rotor[j - 1] = (scan < limit) ? scan : iounit->limit[j - 1];
scan -= npages;
- iopte = MKIOPTE(__pa(vaddr & PAGE_MASK));
- vaddr = IOUNIT_DMA_BASE + (scan << PAGE_SHIFT) + (vaddr & ~PAGE_MASK);
+ iopte = MKIOPTE(paddr & PAGE_MASK);
+ dma_addr = IOUNIT_DMA_BASE + (scan << PAGE_SHIFT) + (paddr & ~PAGE_MASK);
for (k = 0; k < npages; k++, iopte = __iopte(iopte_val(iopte) + 0x100), scan++) {
set_bit(scan, iounit->bmap);
sbus_writel(iopte, &iounit->page_table[scan]);
}
- IOD(("%08lx\n", vaddr));
- return vaddr;
+ IOD(("%08lx\n", dma_addr));
+ return dma_addr;
}
static __u32 iounit_get_scsi_one(struct device *dev, char *vaddr, unsigned long len)
@@ -145,7 +146,7 @@ static __u32 iounit_get_scsi_one(struct device *dev, char *vaddr, unsigned long
unsigned long ret, flags;
spin_lock_irqsave(&iounit->lock, flags);
- ret = iounit_get_area(iounit, (unsigned long)vaddr, len);
+ ret = iounit_get_area(iounit, virt_to_phys(vaddr), len);
spin_unlock_irqrestore(&iounit->lock, flags);
return ret;
}
@@ -159,7 +160,7 @@ static void iounit_get_scsi_sgl(struct device *dev, struct scatterlist *sg, int
spin_lock_irqsave(&iounit->lock, flags);
while (sz != 0) {
--sz;
- sg->dma_address = iounit_get_area(iounit, (unsigned long) sg_virt(sg), sg->length);
+ sg->dma_address = iounit_get_area(iounit, sg_phys(sg), sg->length);
sg->dma_length = sg->length;
sg = sg_next(sg);
}
--
1.9.1
Pass a PFN to iommu_get_one instead of calculating it locally from a
page structure, so that we don't need pages for every address we can
DMA to or from.
Also further restrict the cache flushing, as we now have a non-highmem
way to detect physical addresses that are not kernel virtual mapped.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sparc/mm/iommu.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index 491511d..3ed53d7 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -174,7 +174,7 @@ static void iommu_flush_iotlb(iopte_t *iopte, unsigned int niopte)
}
}
-static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
+static u32 iommu_get_one(struct device *dev, unsigned long pfn, int npages)
{
struct iommu_struct *iommu = dev->archdata.iommu;
int ioptex;
@@ -183,7 +183,7 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
int i;
/* page color = pfn of page */
- ioptex = bit_map_string_get(&iommu->usemap, npages, page_to_pfn(page));
+ ioptex = bit_map_string_get(&iommu->usemap, npages, pfn);
if (ioptex < 0)
panic("iommu out");
busa0 = iommu->start + (ioptex << PAGE_SHIFT);
@@ -192,11 +192,11 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
busa = busa0;
iopte = iopte0;
for (i = 0; i < npages; i++) {
- iopte_val(*iopte) = MKIOPTE(page_to_pfn(page), IOPERM);
+ iopte_val(*iopte) = MKIOPTE(pfn, IOPERM);
iommu_invalidate_page(iommu->regs, busa);
busa += PAGE_SIZE;
iopte++;
- page++;
+ pfn++;
}
iommu_flush_iotlb(iopte0, npages);
@@ -214,7 +214,7 @@ static u32 iommu_get_scsi_one(struct device *dev, char *vaddr, unsigned int len)
off = (unsigned long)vaddr & ~PAGE_MASK;
npages = (off + len + PAGE_SIZE-1) >> PAGE_SHIFT;
page = virt_to_page((unsigned long)vaddr & PAGE_MASK);
- busa = iommu_get_one(dev, page, npages);
+ busa = iommu_get_one(dev, page_to_pfn(page), npages);
return busa + off;
}
@@ -243,7 +243,7 @@ static void iommu_get_scsi_sgl_gflush(struct device *dev, struct scatterlist *sg
while (sz != 0) {
--sz;
n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
- sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
+ sg->dma_address = iommu_get_one(dev, sg_pfn(sg), n) + sg->offset;
sg->dma_length = sg->length;
sg = sg_next(sg);
}
@@ -264,7 +264,8 @@ static void iommu_get_scsi_sgl_pflush(struct device *dev, struct scatterlist *sg
* XXX Is this a good assumption?
* XXX What if someone else unmaps it here and races us?
*/
- if ((page = (unsigned long) page_address(sg_page(sg))) != 0) {
+ if (sg_has_page(sg) &&
+ (page = (unsigned long) page_address(sg_page(sg))) != 0) {
for (i = 0; i < n; i++) {
if (page != oldpage) { /* Already flushed? */
flush_page_for_dma(page);
@@ -274,7 +275,7 @@ static void iommu_get_scsi_sgl_pflush(struct device *dev, struct scatterlist *sg
}
}
- sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
+ sg->dma_address = iommu_get_one(dev, sg_pfn(sg), n) + sg->offset;
sg->dma_length = sg->length;
sg = sg_next(sg);
}
--
1.9.1
Use sg_phys() instead of page_to_phys(sg_page(sg)) so that we don't
require a page structure for all DMA memory.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/s390/pci/pci_dma.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 6fd8d58..aae5a47 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -272,14 +272,13 @@ int dma_set_mask(struct device *dev, u64 mask)
}
EXPORT_SYMBOL_GPL(dma_set_mask);
-static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
+static dma_addr_t s390_dma_map_phys(struct device *dev, unsigned long pa,
+ size_t size,
enum dma_data_direction direction,
struct dma_attrs *attrs)
{
struct zpci_dev *zdev = get_zdev(to_pci_dev(dev));
unsigned long nr_pages, iommu_page_index;
- unsigned long pa = page_to_phys(page) + offset;
int flags = ZPCI_PTE_VALID;
dma_addr_t dma_addr;
@@ -301,7 +300,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
if (!dma_update_trans(zdev, pa, dma_addr, size, flags)) {
atomic64_add(nr_pages, &zdev->mapped_pages);
- return dma_addr + (offset & ~PAGE_MASK);
+ return dma_addr + (pa & ~PAGE_MASK);
}
out_free:
@@ -312,6 +311,16 @@ out_err:
return DMA_ERROR_CODE;
}
+static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction,
+ struct dma_attrs *attrs)
+{
+ unsigned long pa = page_to_phys(page) + offset;
+
+ return s390_dma_map_phys(dev, pa, size, direction, attrs);
+}
+
static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction direction,
struct dma_attrs *attrs)
@@ -384,8 +393,7 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg,
int i;
for_each_sg(sg, s, nr_elements, i) {
- struct page *page = sg_page(s);
- s->dma_address = s390_dma_map_pages(dev, page, s->offset,
+ s->dma_address = s390_dma_map_phys(dev, sg_phys(s),
s->length, dir, NULL);
if (!dma_mapping_error(dev, s->dma_address)) {
s->dma_length = s->length;
--
1.9.1
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/ia64/hp/common/sba_iommu.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 344387a..9e5aa8e 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -248,8 +248,6 @@ static int reserve_sba_gart = 1;
static SBA_INLINE void sba_mark_invalid(struct ioc *, dma_addr_t, size_t);
static SBA_INLINE void sba_free_range(struct ioc *, dma_addr_t, size_t);
-#define sba_sg_address(sg) sg_virt((sg))
-
#ifdef FULL_VALID_PDIR
static u64 prefetch_spill_page;
#endif
@@ -397,7 +395,7 @@ sba_dump_sg( struct ioc *ioc, struct scatterlist *startsg, int nents)
while (nents-- > 0) {
printk(KERN_DEBUG " %d : DMA %08lx/%05x CPU %p\n", nents,
startsg->dma_address, startsg->dma_length,
- sba_sg_address(startsg));
+ sg_virt(startsg));
startsg = sg_next(startsg);
}
}
@@ -409,7 +407,7 @@ sba_check_sg( struct ioc *ioc, struct scatterlist *startsg, int nents)
int the_nents = nents;
while (the_nents-- > 0) {
- if (sba_sg_address(the_sg) == 0x0UL)
+ if (sg_virt(the_sg) == 0x0UL)
sba_dump_sg(NULL, startsg, nents);
the_sg = sg_next(the_sg);
}
@@ -1243,11 +1241,11 @@ sba_fill_pdir(
if (dump_run_sg)
printk(" %2d : %08lx/%05x %p\n",
nents, startsg->dma_address, cnt,
- sba_sg_address(startsg));
+ sg_virt(startsg));
#else
DBG_RUN_SG(" %d : %08lx/%05x %p\n",
nents, startsg->dma_address, cnt,
- sba_sg_address(startsg));
+ sg_virt(startsg));
#endif
/*
** Look for the start of a new DMA stream
@@ -1267,7 +1265,7 @@ sba_fill_pdir(
** Look for a VCONTIG chunk
*/
if (cnt) {
- unsigned long vaddr = (unsigned long) sba_sg_address(startsg);
+ unsigned long vaddr = (unsigned long) sg_virt(startsg);
ASSERT(pdirp);
/* Since multiple Vcontig blocks could make up
@@ -1335,7 +1333,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
int idx;
while (nents > 0) {
- unsigned long vaddr = (unsigned long) sba_sg_address(startsg);
+ unsigned long vaddr = (unsigned long) sg_virt(startsg);
/*
** Prepare for first/next DMA stream
@@ -1380,7 +1378,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
**
** append the next transaction?
*/
- vaddr = (unsigned long) sba_sg_address(startsg);
+ vaddr = (unsigned long) sg_virt(startsg);
if (vcontig_end == vaddr)
{
vcontig_len += startsg->length;
@@ -1479,7 +1477,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist,
if (likely((ioc->dma_mask & ~to_pci_dev(dev)->dma_mask) == 0)) {
for_each_sg(sglist, sg, nents, filled) {
sg->dma_length = sg->length;
- sg->dma_address = virt_to_phys(sba_sg_address(sg));
+ sg->dma_address = virt_to_phys(sg_virt(sg));
}
return filled;
}
@@ -1487,7 +1485,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist,
/* Fast path single entry scatterlists. */
if (nents == 1) {
sglist->dma_length = sglist->length;
- sglist->dma_address = sba_map_single_attrs(dev, sba_sg_address(sglist), sglist->length, dir, attrs);
+ sglist->dma_address = sba_map_single_attrs(dev, sg_virt(sglist), sglist->length, dir, attrs);
return 1;
}
@@ -1563,7 +1561,7 @@ static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist,
#endif
DBG_RUN_SG("%s() START %d entries, %p,%x\n",
- __func__, nents, sba_sg_address(sglist), sglist->length);
+ __func__, nents, sg_virt(sglist), sglist->length);
#ifdef ASSERT_PDIR_SANITY
ioc = GET_IOC(dev);
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/nios2/mm/dma-mapping.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
index ac5da75..1a0a68d 100644
--- a/arch/nios2/mm/dma-mapping.c
+++ b/arch/nios2/mm/dma-mapping.c
@@ -64,13 +64,11 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
BUG_ON(!valid_dma_direction(direction));
for_each_sg(sg, sg, nents, i) {
- void *addr;
-
- addr = sg_virt(sg);
- if (addr) {
- __dma_sync_for_device(addr, sg->length, direction);
- sg->dma_address = sg_phys(sg);
+ if (sg_has_page(sg)) {
+ __dma_sync_for_device(sg_virt(sg), sg->length,
+ direction);
}
+ sg->dma_address = sg_phys(sg);
}
return nents;
@@ -113,9 +111,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
return;
for_each_sg(sg, sg, nhwentries, i) {
- addr = sg_virt(sg);
- if (addr)
- __dma_sync_for_cpu(addr, sg->length, direction);
+ if (sg_has_page(sg))
+ __dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
}
}
EXPORT_SYMBOL(dma_unmap_sg);
@@ -166,8 +163,10 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
BUG_ON(!valid_dma_direction(direction));
/* Make sure that gcc doesn't leave the empty loop body. */
- for_each_sg(sg, sg, nelems, i)
- __dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
+ for_each_sg(sg, sg, nelems, i) {
+ if (sg_has_page(sg))
+ __dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
+ }
}
EXPORT_SYMBOL(dma_sync_sg_for_cpu);
@@ -179,8 +178,10 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
BUG_ON(!valid_dma_direction(direction));
/* Make sure that gcc doesn't leave the empty loop body. */
- for_each_sg(sg, sg, nelems, i)
- __dma_sync_for_device(sg_virt(sg), sg->length, direction);
-
+ for_each_sg(sg, sg, nelems, i) {
+ if (sg_has_page(sg))
+ __dma_sync_for_device(sg_virt(sg), sg->length,
+ direction);
+ }
}
EXPORT_SYMBOL(dma_sync_sg_for_device);
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arc/include/asm/dma-mapping.h | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
index 2d28ba9..42eb526 100644
--- a/arch/arc/include/asm/dma-mapping.h
+++ b/arch/arc/include/asm/dma-mapping.h
@@ -108,9 +108,13 @@ dma_map_sg(struct device *dev, struct scatterlist *sg,
struct scatterlist *s;
int i;
- for_each_sg(sg, s, nents, i)
- s->dma_address = dma_map_page(dev, sg_page(s), s->offset,
- s->length, dir);
+ for_each_sg(sg, s, nents, i) {
+ if (sg_has_page(s)) {
+ _dma_cache_sync((unsigned long)sg_virt(s), s->length,
+ dir);
+ }
+ s->dma_address = sg_phys(s);
+ }
return nents;
}
@@ -163,8 +167,12 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg)) {
+ _dma_cache_sync((unsigned int)sg_virt(sg), sg->length,
+ dir);
+ }
+ }
}
static inline void
@@ -174,8 +182,12 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg)) {
+ _dma_cache_sync((unsigned int)sg_virt(sg), sg->length,
+ dir);
+ }
+ }
}
static inline int dma_supported(struct device *dev, u64 dma_mask)
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly, bypassing the noop
page_to_bus.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/avr32/include/asm/dma-mapping.h | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/avr32/include/asm/dma-mapping.h b/arch/avr32/include/asm/dma-mapping.h
index ae7ac92..a662ce2 100644
--- a/arch/avr32/include/asm/dma-mapping.h
+++ b/arch/avr32/include/asm/dma-mapping.h
@@ -216,11 +216,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
struct scatterlist *sg;
for_each_sg(sglist, sg, nents, i) {
- char *virt;
-
- sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset;
- virt = sg_virt(sg);
- dma_cache_sync(dev, virt, sg->length, direction);
+ sg->dma_address = sg_phys(sg);
+ if (sg_has_page(sg))
+ dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
}
return nents;
@@ -328,8 +326,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nents, i)
- dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
+ for_each_sg(sglist, sg, nents, i) {
+ if (sg_has_page(sg))
+ dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
+ }
}
/* Now for the API extensions over the pci_ one */
--
1.9.1
Switch from sg_virt to sg_phys, as blackfin, like all nommu architectures,
has a 1:1 virtual to physical mapping.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/blackfin/kernel/dma-mapping.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/blackfin/kernel/dma-mapping.c b/arch/blackfin/kernel/dma-mapping.c
index df437e5..e2c4d1a 100644
--- a/arch/blackfin/kernel/dma-mapping.c
+++ b/arch/blackfin/kernel/dma-mapping.c
@@ -120,7 +120,7 @@ dma_map_sg(struct device *dev, struct scatterlist *sg_list, int nents,
int i;
for_each_sg(sg_list, sg, nents, i) {
- sg->dma_address = (dma_addr_t) sg_virt(sg);
+ sg->dma_address = sg_phys(sg);
__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
}
@@ -135,7 +135,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg_list,
int i;
for_each_sg(sg_list, sg, nelems, i) {
- sg->dma_address = (dma_addr_t) sg_virt(sg);
+ sg->dma_address = sg_phys(sg);
__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
}
}
--
1.9.1
Make all cache invalidation conditional on sg_has_page().
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/metag/include/asm/dma-mapping.h | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/arch/metag/include/asm/dma-mapping.h b/arch/metag/include/asm/dma-mapping.h
index eb5cdec..2ae9057 100644
--- a/arch/metag/include/asm/dma-mapping.h
+++ b/arch/metag/include/asm/dma-mapping.h
@@ -55,10 +55,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
WARN_ON(nents == 0 || sglist[0].length == 0);
for_each_sg(sglist, sg, nents, i) {
- BUG_ON(!sg_page(sg));
-
sg->dma_address = sg_phys(sg);
- dma_sync_for_device(sg_virt(sg), sg->length, direction);
+ if (sg_has_page(sg))
+ dma_sync_for_device(sg_virt(sg), sg->length, direction);
}
return nents;
@@ -94,10 +93,9 @@ dma_unmap_sg(struct device *dev, struct scatterlist *sglist, int nhwentries,
WARN_ON(nhwentries == 0 || sglist[0].length == 0);
for_each_sg(sglist, sg, nhwentries, i) {
- BUG_ON(!sg_page(sg));
-
sg->dma_address = sg_phys(sg);
- dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
+ if (sg_has_page(sg))
+ dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
}
}
@@ -140,8 +138,10 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg))
+ dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
+ }
}
static inline void
@@ -151,8 +151,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- dma_sync_for_device(sg_virt(sg), sg->length, direction);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg))
+ dma_sync_for_device(sg_virt(sg), sg->length, direction);
+ }
}
static inline int
--
1.9.1
Make all cache invalidation conditional on sg_has_page().
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/sh/kernel/dma-nommu.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 5b0bfcd..3b64dc7 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -33,9 +33,8 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
WARN_ON(nents == 0 || sg[0].length == 0);
for_each_sg(sg, s, nents, i) {
- BUG_ON(!sg_page(s));
-
- dma_cache_sync(dev, sg_virt(s), s->length, dir);
+ if (sg_has_page(s))
+ dma_cache_sync(dev, sg_virt(s), s->length, dir);
s->dma_address = sg_phys(s);
s->dma_length = s->length;
@@ -57,8 +56,10 @@ static void nommu_sync_sg(struct device *dev, struct scatterlist *sg,
struct scatterlist *s;
int i;
- for_each_sg(sg, s, nelems, i)
- dma_cache_sync(dev, sg_virt(s), s->length, dir);
+ for_each_sg(sg, s, nelems, i) {
+ if (sg_has_page(s))
+ dma_cache_sync(dev, sg_virt(s), s->length, dir);
+ }
}
#endif
--
1.9.1
Make all cache invalidation conditional on sg_has_page().
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/xtensa/include/asm/dma-mapping.h | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h
index 1f5f6dc..262a1d1 100644
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ b/arch/xtensa/include/asm/dma-mapping.h
@@ -61,10 +61,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
BUG_ON(direction == DMA_NONE);
for_each_sg(sglist, sg, nents, i) {
- BUG_ON(!sg_page(sg));
-
sg->dma_address = sg_phys(sg);
- consistent_sync(sg_virt(sg), sg->length, direction);
+ if (sg_has_page(sg))
+ consistent_sync(sg_virt(sg), sg->length, direction);
}
return nents;
@@ -131,8 +130,10 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- consistent_sync(sg_virt(sg), sg->length, dir);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg))
+ consistent_sync(sg_virt(sg), sg->length, dir);
+ }
}
static inline void
@@ -142,8 +143,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
int i;
struct scatterlist *sg;
- for_each_sg(sglist, sg, nelems, i)
- consistent_sync(sg_virt(sg), sg->length, dir);
+ for_each_sg(sglist, sg, nelems, i) {
+ if (sg_has_page(sg))
+ consistent_sync(sg_virt(sg), sg->length, dir);
+ }
}
static inline int
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
--
1.9.1
Only call kmap_atomic_primary when the SG entry is mapped into
kernel virtual space.
XXX: the code already looks odd due to the lack of pairing between
kmap_atomic_primary and kunmap_atomic_primary. Does it work either
before or after this patch?
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/frv/mb93090-mb00/pci-dma.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c
index 4d1f01d..77b3a1c 100644
--- a/arch/frv/mb93090-mb00/pci-dma.c
+++ b/arch/frv/mb93090-mb00/pci-dma.c
@@ -63,6 +63,9 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
dampr2 = __get_DAMPR(2);
for_each_sg(sglist, sg, nents, i) {
+ if (!sg_has_page(sg))
+ continue;
+
vaddr = kmap_atomic_primary(sg_page(sg));
frv_dcache_writeback((unsigned long) vaddr,
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/openrisc/kernel/dma.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 0b77ddb..94ed052 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -184,8 +184,13 @@ or1k_map_sg(struct device *dev, struct scatterlist *sg,
int i;
for_each_sg(sg, s, nents, i) {
- s->dma_address = or1k_map_page(dev, sg_page(s), s->offset,
- s->length, dir, NULL);
+ if (sg_has_page(s)) {
+ s->dma_address = or1k_map_page(dev, sg_page(s),
+ s->offset, s->length, dir,
+ NULL);
+ } else {
+ s->dma_address = sg_phys(s);
+ }
}
return nents;
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly. To do this, consolidate
the two platform callouts that take pages and virtual addresses into a
single one that takes a physical address.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/mips/bmips/dma.c | 9 ++------
arch/mips/include/asm/mach-ath25/dma-coherence.h | 10 ++-------
arch/mips/include/asm/mach-bmips/dma-coherence.h | 4 ++--
.../include/asm/mach-cavium-octeon/dma-coherence.h | 11 ++--------
arch/mips/include/asm/mach-generic/dma-coherence.h | 12 +++--------
arch/mips/include/asm/mach-ip27/dma-coherence.h | 16 +++-----------
arch/mips/include/asm/mach-ip32/dma-coherence.h | 19 +++-------------
arch/mips/include/asm/mach-jazz/dma-coherence.h | 11 +++-------
.../include/asm/mach-loongson64/dma-coherence.h | 16 +++-----------
arch/mips/mm/dma-default.c | 25 ++++++++++++----------
10 files changed, 37 insertions(+), 96 deletions(-)
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index 04790f4..13fc891 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -52,14 +52,9 @@ static dma_addr_t bmips_phys_to_dma(struct device *dev, phys_addr_t pa)
return pa;
}
-dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, size_t size)
+dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys, size_t size)
{
- return bmips_phys_to_dma(dev, virt_to_phys(addr));
-}
-
-dma_addr_t plat_map_dma_mem_page(struct device *dev, struct page *page)
-{
- return bmips_phys_to_dma(dev, page_to_phys(page));
+ return bmips_phys_to_dma(dev, phys);
}
unsigned long plat_dma_addr_to_phys(struct device *dev, dma_addr_t dma_addr)
diff --git a/arch/mips/include/asm/mach-ath25/dma-coherence.h b/arch/mips/include/asm/mach-ath25/dma-coherence.h
index d5defdd..4330de6 100644
--- a/arch/mips/include/asm/mach-ath25/dma-coherence.h
+++ b/arch/mips/include/asm/mach-ath25/dma-coherence.h
@@ -31,15 +31,9 @@ static inline dma_addr_t ath25_dev_offset(struct device *dev)
}
static inline dma_addr_t
-plat_map_dma_mem(struct device *dev, void *addr, size_t size)
+plat_map_dma_mem(struct device *dev, phys_addr_t phys, size_t size)
{
- return virt_to_phys(addr) + ath25_dev_offset(dev);
-}
-
-static inline dma_addr_t
-plat_map_dma_mem_page(struct device *dev, struct page *page)
-{
- return page_to_phys(page) + ath25_dev_offset(dev);
+ return phys + ath25_dev_offset(dev);
}
static inline unsigned long
diff --git a/arch/mips/include/asm/mach-bmips/dma-coherence.h b/arch/mips/include/asm/mach-bmips/dma-coherence.h
index d29781f..1b9a7f4 100644
--- a/arch/mips/include/asm/mach-bmips/dma-coherence.h
+++ b/arch/mips/include/asm/mach-bmips/dma-coherence.h
@@ -21,8 +21,8 @@
struct device;
-extern dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, size_t size);
-extern dma_addr_t plat_map_dma_mem_page(struct device *dev, struct page *page);
+extern dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size);
extern unsigned long plat_dma_addr_to_phys(struct device *dev,
dma_addr_t dma_addr);
diff --git a/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h b/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h
index 460042e..d0988c7 100644
--- a/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h
+++ b/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h
@@ -19,15 +19,8 @@ struct device;
extern void octeon_pci_dma_init(void);
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr,
- size_t size)
-{
- BUG();
- return 0;
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size)
{
BUG();
return 0;
diff --git a/arch/mips/include/asm/mach-generic/dma-coherence.h b/arch/mips/include/asm/mach-generic/dma-coherence.h
index 0f8a354..2dfb133 100644
--- a/arch/mips/include/asm/mach-generic/dma-coherence.h
+++ b/arch/mips/include/asm/mach-generic/dma-coherence.h
@@ -11,16 +11,10 @@
struct device;
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr,
- size_t size)
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size)
{
- return virt_to_phys(addr);
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
-{
- return page_to_phys(page);
+ return phys;
}
static inline unsigned long plat_dma_addr_to_phys(struct device *dev,
diff --git a/arch/mips/include/asm/mach-ip27/dma-coherence.h b/arch/mips/include/asm/mach-ip27/dma-coherence.h
index 1daa644..2578b9d 100644
--- a/arch/mips/include/asm/mach-ip27/dma-coherence.h
+++ b/arch/mips/include/asm/mach-ip27/dma-coherence.h
@@ -18,20 +18,10 @@
struct device;
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr,
- size_t size)
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size)
{
- dma_addr_t pa = dev_to_baddr(dev, virt_to_phys(addr));
-
- return pa;
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
-{
- dma_addr_t pa = dev_to_baddr(dev, page_to_phys(page));
-
- return pa;
+ return dev_to_baddr(dev, phys);
}
static inline unsigned long plat_dma_addr_to_phys(struct device *dev,
diff --git a/arch/mips/include/asm/mach-ip32/dma-coherence.h b/arch/mips/include/asm/mach-ip32/dma-coherence.h
index 0a0b0e2..a5e8d75 100644
--- a/arch/mips/include/asm/mach-ip32/dma-coherence.h
+++ b/arch/mips/include/asm/mach-ip32/dma-coherence.h
@@ -26,23 +26,10 @@ struct device;
#define RAM_OFFSET_MASK 0x3fffffffUL
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr,
- size_t size)
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size)
{
- dma_addr_t pa = virt_to_phys(addr) & RAM_OFFSET_MASK;
-
- if (dev == NULL)
- pa += CRIME_HI_MEM_BASE;
-
- return pa;
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
-{
- dma_addr_t pa;
-
- pa = page_to_phys(page) & RAM_OFFSET_MASK;
+ dma_addr_t pa = phys & RAM_OFFSET_MASK;
if (dev == NULL)
pa += CRIME_HI_MEM_BASE;
diff --git a/arch/mips/include/asm/mach-jazz/dma-coherence.h b/arch/mips/include/asm/mach-jazz/dma-coherence.h
index dc347c2..7739782 100644
--- a/arch/mips/include/asm/mach-jazz/dma-coherence.h
+++ b/arch/mips/include/asm/mach-jazz/dma-coherence.h
@@ -12,15 +12,10 @@
struct device;
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, size_t size)
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
+ size_t size)
{
- return vdma_alloc(virt_to_phys(addr), size);
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
-{
- return vdma_alloc(page_to_phys(page), PAGE_SIZE);
+ return vdma_alloc(phys, size);
}
static inline unsigned long plat_dma_addr_to_phys(struct device *dev,
diff --git a/arch/mips/include/asm/mach-loongson64/dma-coherence.h b/arch/mips/include/asm/mach-loongson64/dma-coherence.h
index 1602a9e..a75d4ba 100644
--- a/arch/mips/include/asm/mach-loongson64/dma-coherence.h
+++ b/arch/mips/include/asm/mach-loongson64/dma-coherence.h
@@ -19,23 +19,13 @@ struct device;
extern dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
extern phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
-static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr,
+static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys,
size_t size)
{
#ifdef CONFIG_CPU_LOONGSON3
- return phys_to_dma(dev, virt_to_phys(addr));
+ return phys_to_dma(dev, phys);
#else
- return virt_to_phys(addr) | 0x80000000;
-#endif
-}
-
-static inline dma_addr_t plat_map_dma_mem_page(struct device *dev,
- struct page *page)
-{
-#ifdef CONFIG_CPU_LOONGSON3
- return phys_to_dma(dev, page_to_phys(page));
-#else
- return page_to_phys(page) | 0x80000000;
+ return phys | 0x80000000;
#endif
}
diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index eeaf024..409fdc8 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -123,7 +123,7 @@ void *dma_alloc_noncoherent(struct device *dev, size_t size,
if (ret != NULL) {
memset(ret, 0, size);
- *dma_handle = plat_map_dma_mem(dev, ret, size);
+ *dma_handle = plat_map_dma_mem(dev, virt_to_phys(ret), size);
}
return ret;
@@ -153,7 +153,7 @@ static void *mips_dma_alloc_coherent(struct device *dev, size_t size,
ret = page_address(page);
memset(ret, 0, size);
- *dma_handle = plat_map_dma_mem(dev, ret, size);
+ *dma_handle = plat_map_dma_mem(dev, virt_to_phys(ret), size);
if (!plat_device_is_coherent(dev)) {
dma_cache_wback_inv((unsigned long) ret, size);
if (!hw_coherentio)
@@ -269,14 +269,13 @@ static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist,
struct scatterlist *sg;
for_each_sg(sglist, sg, nents, i) {
- if (!plat_device_is_coherent(dev))
+ if (sg_has_page(sg) && !plat_device_is_coherent(dev))
__dma_sync(sg_page(sg), sg->offset, sg->length,
direction);
#ifdef CONFIG_NEED_SG_DMA_LENGTH
sg->dma_length = sg->length;
#endif
- sg->dma_address = plat_map_dma_mem_page(dev, sg_page(sg)) +
- sg->offset;
+ sg->dma_address = plat_map_dma_mem(dev, sg_phys(sg), PAGE_SIZE);
}
return nents;
@@ -289,7 +288,7 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page,
if (!plat_device_is_coherent(dev))
__dma_sync(page, offset, size, direction);
- return plat_map_dma_mem_page(dev, page) + offset;
+ return plat_map_dma_mem(dev, page_to_phys(page), PAGE_SIZE) + offset;
}
static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
@@ -300,7 +299,7 @@ static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
struct scatterlist *sg;
for_each_sg(sglist, sg, nhwentries, i) {
- if (!plat_device_is_coherent(dev) &&
+ if (sg_has_page(sg) && !plat_device_is_coherent(dev) &&
direction != DMA_TO_DEVICE)
__dma_sync(sg_page(sg), sg->offset, sg->length,
direction);
@@ -334,8 +333,10 @@ static void mips_dma_sync_sg_for_cpu(struct device *dev,
if (cpu_needs_post_dma_flush(dev)) {
for_each_sg(sglist, sg, nelems, i) {
- __dma_sync(sg_page(sg), sg->offset, sg->length,
- direction);
+ if (sg_has_page(sg)) {
+ __dma_sync(sg_page(sg), sg->offset, sg->length,
+ direction);
+ }
}
}
plat_post_dma_flush(dev);
@@ -350,8 +351,10 @@ static void mips_dma_sync_sg_for_device(struct device *dev,
if (!plat_device_is_coherent(dev)) {
for_each_sg(sglist, sg, nelems, i) {
- __dma_sync(sg_page(sg), sg->offset, sg->length,
- direction);
+ if (sg_has_page(sg)) {
+ __dma_sync(sg_page(sg), sg->offset, sg->length,
+ direction);
+ }
}
}
}
--
1.9.1
Make all cache invalidation conditional on sg_has_page().
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/kernel/dma.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 35e4dcc..cece40b 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -135,7 +135,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg) + get_dma_offset(dev);
sg->dma_length = sg->length;
- __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+ if (sg_has_page(sg)) {
+ __dma_sync_page(sg_page(sg), sg->offset, sg->length,
+ direction);
+ }
}
return nents;
@@ -200,7 +203,10 @@ static inline void dma_direct_sync_sg(struct device *dev,
int i;
for_each_sg(sgl, sg, nents, i)
- __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+ if (sg_has_page(sg)) {
+ __dma_sync_page(sg_page(sg), sg->offset, sg->length,
+ direction);
+ }
}
static inline void dma_direct_sync_single(struct device *dev,
--
1.9.1
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/parisc/kernel/pci-dma.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index b9402c9..6cad0e0 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -483,11 +483,13 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, int n
BUG_ON(direction == DMA_NONE);
for_each_sg(sglist, sg, nents, i) {
- unsigned long vaddr = (unsigned long)sg_virt(sg);
-
- sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
+ sg_dma_address(sg) = sg_phys(sg);
sg_dma_len(sg) = sg->length;
- flush_kernel_dcache_range(vaddr, sg->length);
+
+ if (sg_has_page(sg)) {
+ flush_kernel_dcache_range((unsigned long)sg_virt(sg),
+ sg->length);
+ }
}
return nents;
}
@@ -504,9 +506,10 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, in
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
- return;
+ for_each_sg(sglist, sg, nents, i) {
+ if (sg_has_page(sg))
+ flush_kernel_vmap_range(sg_virt(sg), sg->length);
+ }
}
static void pa11_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction direction)
@@ -530,8 +533,10 @@ static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
+ for_each_sg(sglist, sg, nents, i) {
+ if (sg_has_page(sg))
+ flush_kernel_vmap_range(sg_virt(sg), sg->length);
+ }
}
static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
@@ -541,8 +546,10 @@ static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
- for_each_sg(sglist, sg, nents, i)
- flush_kernel_vmap_range(sg_virt(sg), sg->length);
+ for_each_sg(sglist, sg, nents, i) {
+ if (sg_has_page(sg))
+ flush_kernel_vmap_range(sg_virt(sg), sg->length);
+ }
}
struct hppa_dma_ops pcxl_dma_ops = {
--
1.9.1
Just remove a BUG_ON; the code handles page-less SG entries just fine as-is.
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/iommu/intel-iommu.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3541d65..ae10573 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3622,7 +3622,6 @@ static int intel_nontranslate_map_sg(struct device *hddev,
struct scatterlist *sg;
for_each_sg(sglist, sg, nelems, i) {
- BUG_ON(!sg_page(sg));
sg->dma_address = sg_phys(sg);
sg->dma_length = sg->length;
}
--
1.9.1
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/asm-generic/dma-mapping-common.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
index 940d5ec..afc3eaf 100644
--- a/include/asm-generic/dma-mapping-common.h
+++ b/include/asm-generic/dma-mapping-common.h
@@ -51,8 +51,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int i, ents;
struct scatterlist *s;
- for_each_sg(sg, s, nents, i)
- kmemcheck_mark_initialized(sg_virt(s), s->length);
+ for_each_sg(sg, s, nents, i) {
+ if (sg_has_page(s))
+ kmemcheck_mark_initialized(sg_virt(s), s->length);
+ }
BUG_ON(!valid_dma_direction(dir));
ents = ops->map_sg(dev, sg, nents, dir, attrs);
BUG_ON(ents < 0);
--
1.9.1
Around Wed 12 Aug 2015 09:05:39 +0200 or thereabout, Christoph Hellwig wrote:
> Make all cache invalidation conditional on sg_has_page() and use
> sg_phys to get the physical address directly, bypassing the noop
> page_to_bus.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Hans-Christian Egtvedt <[email protected]>
> ---
> arch/avr32/include/asm/dma-mapping.h | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/avr32/include/asm/dma-mapping.h b/arch/avr32/include/asm/dma-mapping.h
> index ae7ac92..a662ce2 100644
> --- a/arch/avr32/include/asm/dma-mapping.h
> +++ b/arch/avr32/include/asm/dma-mapping.h
> @@ -216,11 +216,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
> struct scatterlist *sg;
>
> for_each_sg(sglist, sg, nents, i) {
> - char *virt;
> -
> - sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset;
> - virt = sg_virt(sg);
> - dma_cache_sync(dev, virt, sg->length, direction);
> + sg->dma_address = sg_phys(sg);
> + if (sg_has_page(sg))
> + dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
> }
>
> return nents;
> @@ -328,8 +326,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
> int i;
> struct scatterlist *sg;
>
> - for_each_sg(sglist, sg, nents, i)
> - dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
> + for_each_sg(sglist, sg, nents, i) {
> + if (sg_has_page(sg))
> + dma_cache_sync(dev, sg_virt(sg), sg->length, direction);
> + }
> }
>
> /* Now for the API extensions over the pci_ one */
--
mvh
Hans-Christian Egtvedt
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
> Just remove a BUG_ON, the code handles them just fine as-is.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: David Woodhouse <[email protected]>
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
On Wednesday 12 August 2015 12:39 PM, Christoph Hellwig wrote:
> Make all cache invalidation conditional on sg_has_page() and use
> sg_phys to get the physical address directly.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
With a minor nit below.
Acked-by: Vineet Gupta <[email protected]>
> ---
> arch/arc/include/asm/dma-mapping.h | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
> index 2d28ba9..42eb526 100644
> --- a/arch/arc/include/asm/dma-mapping.h
> +++ b/arch/arc/include/asm/dma-mapping.h
> @@ -108,9 +108,13 @@ dma_map_sg(struct device *dev, struct scatterlist *sg,
> struct scatterlist *s;
> int i;
>
> - for_each_sg(sg, s, nents, i)
> - s->dma_address = dma_map_page(dev, sg_page(s), s->offset,
> - s->length, dir);
> + for_each_sg(sg, s, nents, i) {
> + if (sg_has_page(s)) {
> + _dma_cache_sync((unsigned long)sg_virt(s), s->length,
> + dir);
> + }
> + s->dma_address = sg_phys(s);
> + }
>
> return nents;
> }
> @@ -163,8 +167,12 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
> int i;
> struct scatterlist *sg;
>
> - for_each_sg(sglist, sg, nelems, i)
> - _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
> + for_each_sg(sglist, sg, nelems, i) {
> + if (sg_has_page(sg)) {
> + _dma_cache_sync((unsigned int)sg_virt(sg), sg->length,
> + dir);
> + }
> + }
> }
>
> static inline void
> @@ -174,8 +182,12 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
> int i;
> struct scatterlist *sg;
>
> - for_each_sg(sglist, sg, nelems, i)
> - _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
> + for_each_sg(sglist, sg, nelems, i) {
> + if (sg_has_page(sg)) {
> + _dma_cache_sync((unsigned int)sg_virt(sg), sg->length,
> + dir);
For consistency, could you please fix the left alignment of @dir above? Another tab,
perhaps?
> + }
> + }
> }
>
> static inline int dma_supported(struct device *dev, u64 dma_mask)
On Wed, 12 Aug 2015, Christoph Hellwig wrote:
> Use sg_phys() instead of page_to_phys(sg_page(sg)) so that we don't
> require a page structure for all DMA memory.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
> ---
> arch/s390/pci/pci_dma.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index 6fd8d58..aae5a47 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -272,14 +272,13 @@ int dma_set_mask(struct device *dev, u64 mask)
> }
> EXPORT_SYMBOL_GPL(dma_set_mask);
>
> -static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
> - unsigned long offset, size_t size,
> +static dma_addr_t s390_dma_map_phys(struct device *dev, unsigned long pa,
> + size_t size,
> enum dma_data_direction direction,
> struct dma_attrs *attrs)
> {
> struct zpci_dev *zdev = get_zdev(to_pci_dev(dev));
> unsigned long nr_pages, iommu_page_index;
> - unsigned long pa = page_to_phys(page) + offset;
> int flags = ZPCI_PTE_VALID;
> dma_addr_t dma_addr;
>
> @@ -301,7 +300,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
>
> if (!dma_update_trans(zdev, pa, dma_addr, size, flags)) {
> atomic64_add(nr_pages, &zdev->mapped_pages);
> - return dma_addr + (offset & ~PAGE_MASK);
> + return dma_addr + (pa & ~PAGE_MASK);
> }
>
> out_free:
> @@ -312,6 +311,16 @@ out_err:
> return DMA_ERROR_CODE;
> }
>
> +static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction direction,
> + struct dma_attrs *attrs)
> +{
> + unsigned long pa = page_to_phys(page) + offset;
> +
> + return s390_dma_map_phys(dev, pa, size, direction, attrs);
> +}
> +
> static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr,
> size_t size, enum dma_data_direction direction,
> struct dma_attrs *attrs)
> @@ -384,8 +393,7 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg,
> int i;
>
> for_each_sg(sg, s, nr_elements, i) {
> - struct page *page = sg_page(s);
> - s->dma_address = s390_dma_map_pages(dev, page, s->offset,
> + s->dma_address = s390_dma_map_phys(dev, sg_phys(s),
> s->length, dir, NULL);
> if (!dma_mapping_error(dev, s->dma_address)) {
> s->dma_length = s->length;
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
>
> http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
>
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there. In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
>
The support I have suggested and submitted for zone-less sections
(in my add_persistent_memory() patchset) would work perfectly well and
transparently for all such multimedia cases, with all the hacks removed.
In fact I have loaded pmem (with pages) onto VRAM a few times and it is
great easy fun. (I wanted to experiment with cached memory over a PCIe
link.)
> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work. Most of it
> is just about consistently using sg_phys. For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
>
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses. It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.
>
> Additional this series skips ARM entirely for now. The reason is
> that most arm implementations of the .map_sg operation just iterate
> over all entries and call ->map_page for it, which means we'd need
> to convert those to a ->map_pfn similar to Dan's previous approach.
>
All this endless work does nothing more than uglify the kernel, and
it will never end, while a real and fully working solution has been right
here for more than a year.
If you are really up for a deep audit and a mammoth testing effort,
why not do a more worthy, and an order of magnitude smaller, piece of work
and support 2M and 1G variable-sized "pages"? All the
virtual-vs-physical-vs-caching handling just works.
Most of the core work is there. The block layer and lots of other subsystems
already support sending a single page pointer with a bvec_offset/bvec_len
bigger than 4K. Other subsystems will need small fixes sprinkled around,
but not at all this endless stream of patches, subsystem after subsystem.
And for what?
The novelty of struct page is the section object: the section is reached
from a page * as well as from the virtual and physical planes, and it is
the center that translates between all of them. You keep this concept and
only add 2M-page sections and 1G-page sections.
It is a bit of work, but it is worthwhile, and it would tremendously
accelerate lots of workloads. Not like this abomination, which only
branches things more and more and makes them fatter and slower.
It all feels like a typhoon, the inertia of tons and tons of man-hours
of work in a huge wave. How will you ever stop such a rushing mass?
I'm trying to duck under, but surely it makes me sad.
Thanks
Boaz
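For concreteness, a minimal sketch of the "single bvec bigger than 4K" idea referred to above; huge_section_page is a hypothetical 2M-backed page, and nothing here comes from the posted patches:

#include <linux/bio.h>
#include <linux/sizes.h>

/*
 * Sketch only: a single bio_vec can describe more than 4K as long as the
 * memory behind it is physically contiguous, e.g. one 2M "section page".
 */
static void sketch_fill_huge_bvec(struct bio_vec *bv,
				  struct page *huge_section_page)
{
	bv->bv_page   = huge_section_page;	/* hypothetical 2M-backed page */
	bv->bv_offset = 0;
	bv->bv_len    = SZ_2M;			/* not limited to PAGE_SIZE */
}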
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig <[email protected]> wrote:
> Make all cache invalidation conditional on sg_has_page() and use
> sg_phys to get the physical address directly.
So this worries me a bit (I'm just reacting to one random patch in the series).
The reason?
I think this wants a big honking comment somewhere saying "non-sg_page
accesses are not necessarily cache coherent").
Now, I don't think that's _wrong_, but it's an important distinction:
if you look up pages in the page tables directly, there's a very
subtle difference between then saving just the pfn and saving the
"struct page" of the result.
On sane architectures, this whole cache flushing thing doesn't matter.
Which just means that it's going to be even more subtle on the odd
broken ones..
I'm assuming that anybody who wants to use the page-less
scatter-gather lists always does so on memory that isn't actually
virtually mapped at all, or only does so on sane architectures that
are cache coherent at a physical level, but I'd like that assumption
*documented* somewhere.
(And maybe it is, and I just didn't get to that patch yet)
Linus
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig <[email protected]> wrote:
> + for_each_sg(sg, s, nents, i) {
> + if (sg_has_page(s))
> + kmemcheck_mark_initialized(sg_virt(s), s->length);
> + }
[ Again, I'm responding to one random patch - this pattern was in
other patches too. ]
A question: do we actually expect to mix page-less and pageful SG
entries in the same SG list?
How does that happen?
(I'm not saying it can't, I'm just wondering where people expect this
to happen).
IOW, maybe it would be valid to have a rule saying "a SG list is
either all pageful or pageless, never mixed", and then have the "if"
statement outside the loop rather than inside.
Linus
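A minimal sketch of the hoisted variant suggested here; sg_list_has_pages() is a hypothetical helper that would only be valid under the "all pageful or all pageless, never mixed" rule, not an existing API:

#include <linux/kmemcheck.h>
#include <linux/scatterlist.h>

/*
 * Sketch only: with a never-mixed rule the page check moves out of the
 * hot loop.  sg_list_has_pages() does not exist; under that rule it
 * could simply test the first entry.
 */
static inline void sketch_mark_sg_initialized(struct scatterlist *sgl, int nents)
{
	struct scatterlist *s;
	int i;

	if (!sg_list_has_pages(sgl))
		return;

	for_each_sg(sgl, s, nents, i)
		kmemcheck_mark_initialized(sg_virt(s), s->length);
}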
Christoph,
On 12 August 2015 at 08:05, Christoph Hellwig <[email protected]> wrote:
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> include/asm-generic/dma-mapping-common.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
> index 940d5ec..afc3eaf 100644
> --- a/include/asm-generic/dma-mapping-common.h
> +++ b/include/asm-generic/dma-mapping-common.h
> @@ -51,8 +51,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> int i, ents;
> struct scatterlist *s;
>
> - for_each_sg(sg, s, nents, i)
> - kmemcheck_mark_initialized(sg_virt(s), s->length);
> + for_each_sg(sg, s, nents, i) {
> + if (sg_has_page(s))
> + kmemcheck_mark_initialized(sg_virt(s), s->length);
> + }
Just a nitpick for the subject, it should say "kmemcheck" rather than
"kmemleak" (different features ;)).
--
Catalin
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
>
> http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
>
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there. In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
>
> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work. Most of it
> is just about consistently using sg_phys. For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
>
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses. It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.
I can explain that. I think this doesn't apply to ia64 because its
cache is PIPT, but on parisc we have a VIPT cache.
On normal physically indexed architectures, when the iommu sees a DMA
transfer to/from physical memory, it also notifies the CPU to flush the
internal CPU caches of those lines. This is usually an interlocking
step of the transfer to make sure the page is coherent before transfer
to/from the device (it's why the ia32 for instance is a coherent
architecture). Because the system is physically indexed, there's no
need to worry about aliases.
On Virtually Indexed systems, like parisc, there is an aliasing problem.
The CCIO iommu unit (and all other iommu systems on parisc) have what's
called a local coherence index (LCI). You program it as part of the
IOMMU page table and it tells the system which Virtual line in the cache
to flush as part of the IO transaction, thus still ensuring cache
coherence. That's why we have to know the virtual as well as physical
addresses for the page. The problem we have in Linux is that we have
two virtual addresses, which are often incoherent aliases: the user
virtual address and a kernel virtual address but we can only make the
page coherent with a single alias (only one LCI). The way I/O on Linux
currently works is that get_user_pages actually flushes the user virtual
address, so that alias is expected to be coherent, and the address we program
into the LCI is the kernel virtual address. Usually nothing in the
kernel has ever touched the page, so there's nothing to flush, but we do
it just in case.
In theory, for these non kernel page backed SG entries, we can make the
process more efficient by not flushing in gup and instead programming
the user virtual address into the local coherence index. However,
simply zeroing the LCI will also work (except that poor VI zero line
will get flushed repeatedly, so it's probably best to pick a known
untouched line in the kernel).
James
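A rough sketch of the concept described above, purely illustrative: the entry layout is not the real ccio/sba IO pdir format and sketch_fill_io_pdir_entry() is a made-up helper, but it shows why every IO mapping on these IOMMUs wants a virtual alias (real or dummy) in addition to the physical address:

#include <linux/scatterlist.h>

/*
 * Illustrative only: on the VIPT parisc IOMMUs each IO page table entry
 * carries a coherence index derived from a virtual alias of the page,
 * in addition to the physical address, so the hardware knows which
 * cache lines to keep coherent during the transfer.
 */
struct sketch_io_pdir_entry {
	unsigned long phys_page;	/* page the device DMAs to/from */
	unsigned long coherence_alias;	/* virtual alias the LCI is derived from */
};

static void sketch_fill_io_pdir_entry(struct sketch_io_pdir_entry *e,
				      struct scatterlist *sg)
{
	e->phys_page = sg_phys(sg) >> PAGE_SHIFT;
	/*
	 * Page-backed entries have a kernel virtual alias to hand to the
	 * IOMMU; a page-less entry would need the user virtual address or
	 * a known-untouched kernel line instead, as discussed above.
	 */
	e->coherence_alias = sg_has_page(sg) ? (unsigned long)sg_virt(sg) : 0;
}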
On Wed, Aug 12, 2015 at 10:00 AM, James Bottomley
<[email protected]> wrote:
> On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
...
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses. It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.
James explained the primary function of IOMMUs on parisc (DMA-Cache
coherency) much better than I ever could.
Three more observations:
1) the IOMMU can be bypassed by 64-bit DMA devices on IA64.
2) IOMMU enables 32-bit DMA devices to reach > 32-bit physical memory
and thus avoid bounce buffers. parisc and older IA-64 have some
32-bit PCI devices - e.g. the IDE boot HDD.
3) IOMMU acts as a proxy for IO devices by fetching cachelines of data
on PA-RISC systems whose memory controllers ONLY serve cacheline-sized
transactions, i.e. a 32-bit DMA results in the IOMMU fetching the
cacheline and updating just those 32 bits in a DMA cache coherent
fashion.
Bonus thought:
4) IOMMU can improve DMA performance in some cases using "hints"
provided by the OS (e.g. prefetching DMA data or using READ_CURRENT
bus transactions instead of normal memory fetches.)
cheers,
grant
Hi,
On Wed, Aug 12, 2015 at 10:42 PM, Boaz Harrosh <[email protected]> wrote:
> On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
>> It turns out most DMA mapping implementation can handle SGLs without
>> page structures with some fairly simple mechanical work. Most of it
>> is just about consistently using sg_phys. For implementations that
>> need to flush caches we need a new helper that skips these cache
>> flushes if a entry doesn't have a kernel virtual address.
>>
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses. It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.
>>
>> Additional this series skips ARM entirely for now. The reason is
>> that most arm implementations of the .map_sg operation just iterate
>> over all entries and call ->map_page for it, which means we'd need
>> to convert those to a ->map_pfn similar to Dan's previous approach.
>>
>
[snip]
>
> It is a bit of work but is worth while, and accelerating tremendously
> lots of workloads. Not like this abomination which only branches
> things more and more, and making things fatter and slower.
As a random guy reading a big bunch of patches on code I know almost
nothing about, parts of this comment really resonated with me:
overall, we seem to be adding a lot of if statements to code that
appears to be in a hot path.
I.e. ~90% of this patch set seems to be just mechanically dropping
BUG_ON()s and converting open coded stuff to use accessor functions
(which should be macros or get inlined, right?) - and the remaining
bit is not flushing if we don't have a physical page somewhere.
Would it make sense to split this patch set into a few bits: one to
drop all the useless BUG_ON()s, one to convert all the open coded
stuff to accessor functions, then another to do the actual page-less
sg stuff?
Thanks,
--
Julian Calaby
Email: [email protected]
Profile: http://www.google.com/profiles/julian.calaby/
On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> I'm assuming that anybody who wants to use the page-less
> scatter-gather lists always does so on memory that isn't actually
> virtually mapped at all, or only does so on sane architectures that
> are cache coherent at a physical level, but I'd like that assumption
> *documented* somewhere.
It's temporarily mapped by kmap-like helpers. That code isn't in
this series. The most recent version of it is here:
https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
note that it's not doing the cache flushing it would have to do yet, but
it's also only enabled for x86 at the moment.
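A sketch of what the missing cache flushing might look like on top of such a helper; kmap_atomic_pfn_t()/kunmap_atomic_pfn_t() and sg_pfn_t() are stand-ins for whatever the branch above ends up exporting, not a committed API:

/*
 * Sketch only: temporarily map the pfn of a page-less sg entry so its
 * kernel alias can be flushed on virtually tagged caches.
 */
static void sketch_sync_pageless_sg(struct scatterlist *sg)
{
	void *addr = kmap_atomic_pfn_t(sg_pfn_t(sg));	/* stand-in helpers */

	if (!addr)
		return;
	flush_kernel_vmap_range(addr + sg->offset, sg->length);
	kunmap_atomic_pfn_t(addr);
}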
On Wed, Aug 12, 2015 at 09:05:15AM -0700, Linus Torvalds wrote:
> [ Again, I'm responding to one random patch - this pattern was in
> other patches too. ]
>
> A question: do we actually expect to mix page-less and pageful SG
> entries in the same SG list?
>
> How does that happen?
Both for DAX and the video buffer case people could do direct I/O
spanning the boundary between such a VMA and a normal one unless
we add special code to prevent that. Right now I don't think it's
all that useful, but then again it doesn't seem harmful either,
and the cost of adding checks to prevent it might add up.
On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
> I.e. ~90% of this patch set seems to be just mechanically dropping
> BUG_ON()s and converting open coded stuff to use accessor functions
> (which should be macros or get inlined, right?) - and the remaining
> bit is not flushing if we don't have a physical page somewhere.
Which it was: 90%. By lines changed, most of it actually is the diffs
for the cache flushing.
> Would it make sense to split this patch set into a few bits: one to
> drop all the useless BUG_ON()s, one to convert all the open coded
> stuff to accessor functions, then another to do the actual page-less
> sg stuff?
Without the ifs, the BUG_ON()s actually are useful to assert that we
never feed in the sort of physical addresses we can't otherwise support,
so I don't think that part is doable.
A simple series to make more use of sg_phys and add sg_pfn might
still be useful, though.
On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
> The support I have suggested and submitted for zone-less sections.
> (In my add_persistent_memory() patchset)
>
> Would work perfectly well and transparent for all such multimedia cases.
> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
> a few times and it is great easy fun. (I wanted to experiment with cached
> memory over a pcie)
And everyone agreed that it was both buggy and incomplete.
Dan has done a respin of the page backed nvdimm work with most of
these comments addressed.
I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
passion, so we're looking into the lesser evil with an open mind.
[1] not the SGL part posted here, which I think is quite sane. The bio
side is much worse, though.
On 08/13/2015 05:40 PM, Christoph Hellwig wrote:
> On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
>> The support I have suggested and submitted for zone-less sections.
>> (In my add_persistent_memory() patchset)
>>
>> Would work perfectly well and transparent for all such multimedia cases.
>> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
>> a few times and it is great easy fun. (I wanted to experiment with cached
>> memory over a pcie)
>
> And everyone agree that it was both buggy and incomplete.
>
What? No one ever said anything about bugs. This is the first I have ever heard of it.
I was always under the impression that no one even tried it out.
I've been running these page-full nvdimms for more than a year, with RDMA to
peers and swap-out to disks. So it is not that bad, I would say.
> Dan has done a respin of the page backed nvdimm work with most of
> these comments addressed.
>
I would love some comments. All I got so far is silence. (And I do not
like Dan's patches; comments will come next week.)
> I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
> passion, so we're looking into the lesser evil with an open mind.
>
> [1] not the SGL part posted here, which I think is quite sane. The bio
> side is much worse, though.
>
What can I say, I like the page-backed nvdimms. And the long term for me
is 2M pages. I hope we can sit down one day soon and you can explain to me
what's evil about it. I would really, really like to understand.
Thanks though
Boaz
Hi Christoph,
On Fri, Aug 14, 2015 at 12:35 AM, Christoph Hellwig <[email protected]> wrote:
> On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
>> I.e. ~90% of this patch set seems to be just mechanically dropping
>> BUG_ON()s and converting open coded stuff to use accessor functions
>> (which should be macros or get inlined, right?) - and the remaining
>> bit is not flushing if we don't have a physical page somewhere.
>
> Which is was 90%. By lines changed most actually is the diffs for
> the cache flushing.
I was talking in terms of changes made, not lines changed: by my
recollection, about a third of the patches didn't touch flush calls
and most of the lines changed looked like refactoring so that making
the flush call conditional would be easier.
I guess it smelled like you were doing lots of distinct changes in a
single patch and I got my numbers wrong.
>> Would it make sense to split this patch set into a few bits: one to
>> drop all the useless BUG_ON()s, one to convert all the open coded
>> stuff to accessor functions, then another to do the actual page-less
>> sg stuff?
>
> Without the ifs the BUG_ON() actually are useful to assert we
> never feed the sort of physical addresses we can't otherwise support,
> so I don't think that part is doable.
My point is that there's a couple of patches that only remove
BUG_ON()s, which implies that for that particular driver it doesn't
matter if there's a physical page or not, so therefore that code is
purely "documentation".
Thanks,
--
Julian Calaby
Email: [email protected]
Profile: http://www.google.com/profiles/julian.calaby/
On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <[email protected]> wrote:
> On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> I'm assuming that anybody who wants to use the page-less
>> scatter-gather lists always does so on memory that isn't actually
>> virtually mapped at all, or only does so on sane architectures that
>> are cache coherent at a physical level, but I'd like that assumption
>> *documented* somewhere.
>
> It's temporarily mapped by kmap-like helpers. That code isn't in
> this series. The most recent version of it is here:
>
> https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>
> note that it's not doing the cache flushing it would have to do yet, but
> it's also only enabled for x86 at the moment.
For virtually tagged caches I assume we would temporarily map with
kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
powerpc support. However with DAX we could end up with multiple
virtual aliases for a page-less pfn.
On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <[email protected]> wrote:
> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> >> I'm assuming that anybody who wants to use the page-less
> >> scatter-gather lists always does so on memory that isn't actually
> >> virtually mapped at all, or only does so on sane architectures that
> >> are cache coherent at a physical level, but I'd like that assumption
> >> *documented* somewhere.
> >
> > It's temporarily mapped by kmap-like helpers. That code isn't in
> > this series. The most recent version of it is here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
> >
> > note that it's not doing the cache flushing it would have to do yet, but
> > it's also only enabled for x86 at the moment.
>
> For virtually tagged caches I assume we would temporarily map with
> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
> powerpc support. However with DAX we could end up with multiple
> virtual aliases for a page-less pfn.
At least on some PA architectures, you have to be very careful.
Improperly managed, multiple aliases will cause the system to crash
(actually a machine check in the cache chequerboard). For the most
temperamental systems, we need the cache line flushed and the alias
mapping ejected from the TLB cache before we access the same page at an
inequivalent alias.
James
From: James Bottomley <[email protected]>
Date: Thu, 13 Aug 2015 20:59:20 -0700
> On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
>> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <[email protected]> wrote:
>> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> >> I'm assuming that anybody who wants to use the page-less
>> >> scatter-gather lists always does so on memory that isn't actually
>> >> virtually mapped at all, or only does so on sane architectures that
>> >> are cache coherent at a physical level, but I'd like that assumption
>> >> *documented* somewhere.
>> >
>> > It's temporarily mapped by kmap-like helpers. That code isn't in
>> > this series. The most recent version of it is here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>> >
>> > note that it's not doing the cache flushing it would have to do yet, but
>> > it's also only enabled for x86 at the moment.
>>
>> For virtually tagged caches I assume we would temporarily map with
>> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
>> powerpc support. However with DAX we could end up with multiple
>> virtual aliases for a page-less pfn.
>
> At least on some PA architectures, you have to be very careful.
> Improperly managed, multiple aliases will cause the system to crash
> (actually a machine check in the cache chequerboard). For the most
> temperamental systems, we need the cache line flushed and the alias
> mapping ejected from the TLB cache before we access the same page at an
> inequivalent alias.
Also, I want to mention that on sparc64 we manage the cache aliasing
state in the page struct.
Until a page is mapped into userspace, we just record the most recent
cpu to store into that page with kernel side mappings. Once the page
ends up being mapped or the cpu doing kernel side stores changes, we
actually perform the cache flush.
Generally speaking, I think that all actual physical memory the kernel
operates on should have a struct page backing it. So this whole
discussion of operating on physical memory in scatter lists without
backing page structs feels really foreign to me.
On Thu, Aug 13, 2015 at 9:11 PM, David Miller <[email protected]> wrote:
> From: James Bottomley <[email protected]>
>> At least on some PA architectures, you have to be very careful.
>> Improperly managed, multiple aliases will cause the system to crash
>> (actually a machine check in the cache chequerboard). For the most
>> temperamental systems, we need the cache line flushed and the alias
>> mapping ejected from the TLB cache before we access the same page at an
>> inequivalent alias.
>
> Also, I want to mention that on sparc64 we manage the cache aliasing
> state in the page struct.
>
> Until a page is mapped into userspace, we just record the most recent
> cpu to store into that page with kernel side mappings. Once the page
> ends up being mapped or the cpu doing kernel side stores changes, we
> actually perform the cache flush.
>
> Generally speaking, I think that all actual physical memory the kernel
> operates on should have a struct page backing it. So this whole
> discussion of operating on physical memory in scatter lists without
> backing page structs feels really foreign to me.
So the only way for page-less pfns to enter the system is through the
->direct_access() method provided by a pmem device's struct
block_device_operations. Architectures that require struct page for
cache management must disable ->direct_access() in this case.
If an arch still wants to support pmem+DAX then it needs something
like this patchset (feedback welcome) to map pmem pfns:
https://lkml.org/lkml/2015/8/12/970
Effectively this would disable ->direct_access() on /dev/pmem0, but
permit ->direct_access() on /dev/pmem0m.
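A sketch of the gating described here; CONFIG_ARCH_WANTS_PAGELESS_DAX is a hypothetical symbol and the pmem driver details are elided, but the idea is simply that an architecture which needs struct page for cache management never wires up ->direct_access():

/*
 * Sketch only: without ->direct_access() no page-less pfns can enter
 * the system through this block device.
 */
static const struct block_device_operations sketch_pmem_fops = {
	.owner		= THIS_MODULE,
#ifdef CONFIG_ARCH_WANTS_PAGELESS_DAX		/* hypothetical gate */
	.direct_access	= pmem_direct_access,	/* provided elsewhere in the driver */
#endif
};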