The first 19 patches in the set add support for the DMA attribute
DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is needed
so that we can flag the calls to dma_map/unmap_page so that we do not
invalidate cache lines that do not currently belong to the device. Instead
we have to take care of this in the driver via a call to
sync_single_range_for_cpu prior to freeing the Rx page.
Patch 20 adds support for dma_map_page_attrs and dma_unmap_page_attrs so
that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC
attribute.
Patch 21 adds support for freeing a page that has multiple references being
held by a single caller. This way we can free page fragments that were
allocated by a given driver.
The last 2 patches use these updates in the igb driver, and lay the
groundwork to allow for us to reimplement the use of build_skb.
v1: Minor fixes based on issues found by kernel build bot
Few minor changes for issues found on code review
Added Acked-by for patches that were acked and not changed
v2: Added a few more Acked-by
Submitting patches to mm instead of net-next
v3: Added Acked-by for PowerPC architecture
Dropped first 3 patches which were accepted into swiotlb tree
Dropped comments describing swiotlb changes.
---
Alexander Duyck (23):
arch/arc: Add option to skip sync on DMA mapping
arch/arm: Add option to skip sync on DMA map and unmap
arch/avr32: Add option to skip sync on DMA map
arch/blackfin: Add option to skip sync on DMA map
arch/c6x: Add option to skip sync on DMA map and unmap
arch/frv: Add option to skip sync on DMA map
arch/hexagon: Add option to skip DMA sync as a part of mapping
arch/m68k: Add option to skip DMA sync as a part of mapping
arch/metag: Add option to skip DMA sync as a part of map and unmap
arch/microblaze: Add option to skip DMA sync as a part of map and unmap
arch/mips: Add option to skip DMA sync as a part of map and unmap
arch/nios2: Add option to skip DMA sync as a part of map and unmap
arch/openrisc: Add option to skip DMA sync as a part of mapping
arch/parisc: Add option to skip DMA sync as a part of map and unmap
arch/powerpc: Add option to skip DMA sync as a part of mapping
arch/sh: Add option to skip DMA sync as a part of mapping
arch/sparc: Add option to skip DMA sync as a part of map and unmap
arch/tile: Add option to skip DMA sync as a part of map and unmap
arch/xtensa: Add option to skip DMA sync as a part of mapping
dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs
mm: Add support for releasing multiple instances of a page
igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC
igb: Update code to better handle incrementing page count
arch/arc/mm/dma.c | 5 ++
arch/arm/common/dmabounce.c | 16 ++++--
arch/avr32/mm/dma-coherent.c | 7 ++-
arch/blackfin/kernel/dma-mapping.c | 8 +++
arch/c6x/kernel/dma.c | 14 ++++-
arch/frv/mb93090-mb00/pci-dma-nommu.c | 14 ++++-
arch/frv/mb93090-mb00/pci-dma.c | 9 +++
arch/hexagon/kernel/dma.c | 6 ++
arch/m68k/kernel/dma.c | 8 +++
arch/metag/kernel/dma.c | 16 +++++-
arch/microblaze/kernel/dma.c | 10 +++-
arch/mips/loongson64/common/dma-swiotlb.c | 2 -
arch/mips/mm/dma-default.c | 8 ++-
arch/nios2/mm/dma-mapping.c | 26 +++++++---
arch/openrisc/kernel/dma.c | 3 +
arch/parisc/kernel/pci-dma.c | 20 ++++++--
arch/powerpc/kernel/dma.c | 9 +++
arch/sh/kernel/dma-nommu.c | 7 ++-
arch/sparc/kernel/iommu.c | 4 +-
arch/sparc/kernel/ioport.c | 4 +-
arch/tile/kernel/pci-dma.c | 12 ++++-
arch/xtensa/kernel/pci-dma.c | 7 ++-
drivers/net/ethernet/intel/igb/igb.h | 7 ++-
drivers/net/ethernet/intel/igb/igb_main.c | 77 +++++++++++++++++++----------
include/linux/dma-mapping.h | 20 +++++---
include/linux/gfp.h | 2 +
mm/page_alloc.c | 14 +++++
27 files changed, 246 insertions(+), 89 deletions(-)
--
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.
Acked-by: Vineet Gupta <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/arc/mm/dma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 20afc65..6303c34 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -133,7 +133,10 @@ static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page,
unsigned long attrs)
{
phys_addr_t paddr = page_to_phys(page) + offset;
- _dma_cache_sync(paddr, size, dir);
+
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ _dma_cache_sync(paddr, size, dir);
+
return plat_phys_to_dma(dev, paddr);
}
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA
APIs in the arch/arm folder. This change is meant to correct that so that
we get consistent behavior.
Cc: Russell King <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/arm/common/dmabounce.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 3012816..75055df 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -243,7 +243,8 @@ static int needs_bounce(struct device *dev, dma_addr_t dma_addr, size_t size)
}
static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
- enum dma_data_direction dir)
+ enum dma_data_direction dir,
+ unsigned long attrs)
{
struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
struct safe_buffer *buf;
@@ -262,7 +263,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
__func__, buf->ptr, virt_to_dma(dev, buf->ptr),
buf->safe, buf->safe_dma_addr);
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) {
+ if ((dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
dev_dbg(dev, "%s: copy unsafe %p to safe %p, size %d\n",
__func__, ptr, buf->safe, size);
memcpy(buf->safe, ptr, size);
@@ -272,7 +274,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
}
static inline void unmap_single(struct device *dev, struct safe_buffer *buf,
- size_t size, enum dma_data_direction dir)
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
{
BUG_ON(buf->size != size);
BUG_ON(buf->direction != dir);
@@ -283,7 +286,8 @@ static inline void unmap_single(struct device *dev, struct safe_buffer *buf,
DO_STATS(dev->archdata.dmabounce->bounce_count++);
- if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) {
+ if ((dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
void *ptr = buf->ptr;
dev_dbg(dev, "%s: copy back safe %p to unsafe %p size %d\n",
@@ -334,7 +338,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page,
return DMA_ERROR_CODE;
}
- return map_single(dev, page_address(page) + offset, size, dir);
+ return map_single(dev, page_address(page) + offset, size, dir, attrs);
}
/*
@@ -357,7 +361,7 @@ static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t
return;
}
- unmap_single(dev, buf, size, dir);
+ unmap_single(dev, buf, size, dir, attrs);
}
static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA
APIs in the arch/arm folder. This change is meant to correct that so that
we get consistent behavior.
Acked-by: Hans-Christian Noren Egtvedt <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/avr32/mm/dma-coherent.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/avr32/mm/dma-coherent.c b/arch/avr32/mm/dma-coherent.c
index 58610d0..54534e5 100644
--- a/arch/avr32/mm/dma-coherent.c
+++ b/arch/avr32/mm/dma-coherent.c
@@ -146,7 +146,8 @@ static dma_addr_t avr32_dma_map_page(struct device *dev, struct page *page,
{
void *cpu_addr = page_address(page) + offset;
- dma_cache_sync(dev, cpu_addr, size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_cache_sync(dev, cpu_addr, size, direction);
return virt_to_bus(cpu_addr);
}
@@ -162,6 +163,10 @@ static int avr32_dma_map_sg(struct device *dev, struct scatterlist *sglist,
sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset;
virt = sg_virt(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
dma_cache_sync(dev, virt, sg->length, direction);
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.
Cc: Richard Kuo <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/hexagon/kernel/dma.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index b901778..dbc4f10 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -119,6 +119,9 @@ static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg,
s->dma_length = s->length;
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
flush_dcache_range(dma_addr_to_virt(s->dma_address),
dma_addr_to_virt(s->dma_address + s->length));
}
@@ -180,7 +183,8 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page,
if (!check_addr("map_single", dev, bus, size))
return bad_dma_address;
- dma_sync(dma_addr_to_virt(bus), size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_sync(dma_addr_to_virt(bus), size, dir);
return bus;
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.
Cc: Geert Uytterhoeven <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/m68k/kernel/dma.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 8cf97cb..0707006 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -134,7 +134,9 @@ static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page,
{
dma_addr_t handle = page_to_phys(page) + offset;
- dma_sync_single_for_device(dev, handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_sync_single_for_device(dev, handle, size, dir);
+
return handle;
}
@@ -146,6 +148,10 @@ static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist,
for_each_sg(sglist, sg, nents, i) {
sg->dma_address = sg_phys(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
dma_sync_single_for_device(dev, sg->dma_address, sg->length,
dir);
}
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA
APIs in the arch/arm folder. This change is meant to correct that so that
we get consistent behavior.
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/frv/mb93090-mb00/pci-dma-nommu.c | 14 ++++++++++----
arch/frv/mb93090-mb00/pci-dma.c | 9 +++++++--
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/arch/frv/mb93090-mb00/pci-dma-nommu.c b/arch/frv/mb93090-mb00/pci-dma-nommu.c
index 90f2e4c..1876881 100644
--- a/arch/frv/mb93090-mb00/pci-dma-nommu.c
+++ b/arch/frv/mb93090-mb00/pci-dma-nommu.c
@@ -109,16 +109,19 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist,
int nents, enum dma_data_direction direction,
unsigned long attrs)
{
- int i;
struct scatterlist *sg;
+ int i;
+
+ BUG_ON(direction == DMA_NONE);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return nents;
for_each_sg(sglist, sg, nents, i) {
frv_cache_wback_inv(sg_dma_address(sg),
sg_dma_address(sg) + sg_dma_len(sg));
}
- BUG_ON(direction == DMA_NONE);
-
return nents;
}
@@ -127,7 +130,10 @@ static dma_addr_t frv_dma_map_page(struct device *dev, struct page *page,
enum dma_data_direction direction, unsigned long attrs)
{
BUG_ON(direction == DMA_NONE);
- flush_dcache_page(page);
+
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ flush_dcache_page(page);
+
return (dma_addr_t) page_to_phys(page) + offset;
}
diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c
index f585745..dba7df9 100644
--- a/arch/frv/mb93090-mb00/pci-dma.c
+++ b/arch/frv/mb93090-mb00/pci-dma.c
@@ -40,13 +40,16 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist,
int nents, enum dma_data_direction direction,
unsigned long attrs)
{
+ struct scatterlist *sg;
unsigned long dampr2;
void *vaddr;
int i;
- struct scatterlist *sg;
BUG_ON(direction == DMA_NONE);
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return nents;
+
dampr2 = __get_DAMPR(2);
for_each_sg(sglist, sg, nents, i) {
@@ -70,7 +73,9 @@ static dma_addr_t frv_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction direction, unsigned long attrs)
{
- flush_dcache_page(page);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ flush_dcache_page(page);
+
return (dma_addr_t) page_to_phys(page) + offset;
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Ralf Baechle <[email protected]>
Cc: Keguang Zhang <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/mips/loongson64/common/dma-swiotlb.c | 2 +-
arch/mips/mm/dma-default.c | 8 +++++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/mips/loongson64/common/dma-swiotlb.c b/arch/mips/loongson64/common/dma-swiotlb.c
index 1a80b6f..aab4fd6 100644
--- a/arch/mips/loongson64/common/dma-swiotlb.c
+++ b/arch/mips/loongson64/common/dma-swiotlb.c
@@ -61,7 +61,7 @@ static int loongson_dma_map_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir,
unsigned long attrs)
{
- int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, 0);
+ int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, attrs);
mb();
return r;
diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index b2eadd6..dd998d7 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -293,7 +293,7 @@ static inline void __dma_sync(struct page *page,
static void mips_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction direction, unsigned long attrs)
{
- if (cpu_needs_post_dma_flush(dev))
+ if (cpu_needs_post_dma_flush(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync(dma_addr_to_page(dev, dma_addr),
dma_addr & ~PAGE_MASK, size, direction);
plat_post_dma_flush(dev);
@@ -307,7 +307,8 @@ static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist,
struct scatterlist *sg;
for_each_sg(sglist, sg, nents, i) {
- if (!plat_device_is_coherent(dev))
+ if (!plat_device_is_coherent(dev) &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync(sg_page(sg), sg->offset, sg->length,
direction);
#ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -324,7 +325,7 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, enum dma_data_direction direction,
unsigned long attrs)
{
- if (!plat_device_is_coherent(dev))
+ if (!plat_device_is_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync(page, offset, size, direction);
return plat_map_dma_mem_page(dev, page) + offset;
@@ -339,6 +340,7 @@ static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
for_each_sg(sglist, sg, nhwentries, i) {
if (!plat_device_is_coherent(dev) &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
direction != DMA_TO_DEVICE)
__dma_sync(sg_page(sg), sg->offset, sg->length,
direction);
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Ley Foon Tan <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/nios2/mm/dma-mapping.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
index d800fad..f6a5dcf 100644
--- a/arch/nios2/mm/dma-mapping.c
+++ b/arch/nios2/mm/dma-mapping.c
@@ -98,13 +98,17 @@ static int nios2_dma_map_sg(struct device *dev, struct scatterlist *sg,
int i;
for_each_sg(sg, sg, nents, i) {
- void *addr;
+ void *addr = sg_virt(sg);
- addr = sg_virt(sg);
- if (addr) {
- __dma_sync_for_device(addr, sg->length, direction);
- sg->dma_address = sg_phys(sg);
- }
+ if (!addr)
+ continue;
+
+ sg->dma_address = sg_phys(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
+ __dma_sync_for_device(addr, sg->length, direction);
}
return nents;
@@ -117,7 +121,9 @@ static dma_addr_t nios2_dma_map_page(struct device *dev, struct page *page,
{
void *addr = page_address(page) + offset;
- __dma_sync_for_device(addr, size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_sync_for_device(addr, size, direction);
+
return page_to_phys(page) + offset;
}
@@ -125,7 +131,8 @@ static void nios2_dma_unmap_page(struct device *dev, dma_addr_t dma_address,
size_t size, enum dma_data_direction direction,
unsigned long attrs)
{
- __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction);
}
static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
@@ -138,6 +145,9 @@ static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
if (direction == DMA_TO_DEVICE)
return;
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return;
+
for_each_sg(sg, sg, nhwentries, i) {
addr = sg_virt(sg);
if (addr)
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Michal Simek <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/microblaze/kernel/dma.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index ec04dc1..818daf2 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -61,6 +61,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
/* FIXME this part of code is untested */
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
sg->length, direction);
}
@@ -80,7 +84,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
enum dma_data_direction direction,
unsigned long attrs)
{
- __dma_sync(page_to_phys(page) + offset, size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_sync(page_to_phys(page) + offset, size, direction);
return page_to_phys(page) + offset;
}
@@ -95,7 +100,8 @@ static inline void dma_direct_unmap_page(struct device *dev,
* phys_to_virt is here because in __dma_sync_page is __virt_to_phys and
* dma_address is physical address
*/
- __dma_sync(dma_address, size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_sync(dma_address, size, direction);
}
static inline void
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: James Hogan <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/metag/kernel/dma.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/metag/kernel/dma.c b/arch/metag/kernel/dma.c
index 0db31e2..91968d9 100644
--- a/arch/metag/kernel/dma.c
+++ b/arch/metag/kernel/dma.c
@@ -484,8 +484,9 @@ static dma_addr_t metag_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction direction, unsigned long attrs)
{
- dma_sync_for_device((void *)(page_to_phys(page) + offset), size,
- direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_sync_for_device((void *)(page_to_phys(page) + offset),
+ size, direction);
return page_to_phys(page) + offset;
}
@@ -493,7 +494,8 @@ static void metag_dma_unmap_page(struct device *dev, dma_addr_t dma_address,
size_t size, enum dma_data_direction direction,
unsigned long attrs)
{
- dma_sync_for_cpu(phys_to_virt(dma_address), size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_sync_for_cpu(phys_to_virt(dma_address), size, direction);
}
static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist,
@@ -507,6 +509,10 @@ static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist,
BUG_ON(!sg_page(sg));
sg->dma_address = sg_phys(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
dma_sync_for_device(sg_virt(sg), sg->length, direction);
}
@@ -525,6 +531,10 @@ static void metag_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
BUG_ON(!sg_page(sg));
sg->dma_address = sg_phys(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
}
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/sparc/kernel/iommu.c | 4 ++--
arch/sparc/kernel/ioport.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5c615ab..8fda4e4 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -415,7 +415,7 @@ static void dma_4u_unmap_page(struct device *dev, dma_addr_t bus_addr,
ctx = (iopte_val(*base) & IOPTE_CONTEXT) >> 47UL;
/* Step 1: Kick data out of streaming buffers if necessary. */
- if (strbuf->strbuf_enabled)
+ if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
strbuf_flush(strbuf, iommu, bus_addr, ctx,
npages, direction);
@@ -640,7 +640,7 @@ static void dma_4u_unmap_sg(struct device *dev, struct scatterlist *sglist,
base = iommu->page_table + entry;
dma_handle &= IO_PAGE_MASK;
- if (strbuf->strbuf_enabled)
+ if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
strbuf_flush(strbuf, iommu, dma_handle, ctx,
npages, direction);
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 2344103..6ffaec4 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -527,7 +527,7 @@ static dma_addr_t pci32_map_page(struct device *dev, struct page *page,
static void pci32_unmap_page(struct device *dev, dma_addr_t ba, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
- if (dir != PCI_DMA_TODEVICE)
+ if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_make_coherent(ba, PAGE_ALIGN(size));
}
@@ -572,7 +572,7 @@ static void pci32_unmap_sg(struct device *dev, struct scatterlist *sgl,
struct scatterlist *sg;
int n;
- if (dir != PCI_DMA_TODEVICE) {
+ if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
for_each_sg(sgl, sg, nents, n) {
dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length));
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Max Filippov <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/xtensa/kernel/pci-dma.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 1e68806..6a16dec 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -189,7 +189,9 @@ static dma_addr_t xtensa_map_page(struct device *dev, struct page *page,
{
dma_addr_t dma_handle = page_to_phys(page) + offset;
- xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+
return dma_handle;
}
@@ -197,7 +199,8 @@ static void xtensa_unmap_page(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
}
static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
Add support for mapping and unmapping a page with attributes. The primary
use for this is currently to allow for us to pass the
DMA_ATTR_SKIP_CPU_SYNC attribute when mapping and unmapping a page. On
some architectures such as ARM the synchronization has significant overhead
and if we are already taking care of the sync_for_cpu and sync_for_device
from the driver there isn't much need to handle this in the map/unmap calls
as well.
Signed-off-by: Alexander Duyck <[email protected]>
---
include/linux/dma-mapping.h | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 15400b4..2520a1d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -237,29 +237,33 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
ops->unmap_sg(dev, sg, nents, dir, attrs);
}
-static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- enum dma_data_direction dir)
+static inline dma_addr_t dma_map_page_attrs(struct device *dev,
+ struct page *page,
+ size_t offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
{
struct dma_map_ops *ops = get_dma_ops(dev);
dma_addr_t addr;
kmemcheck_mark_initialized(page_address(page) + offset, size);
BUG_ON(!valid_dma_direction(dir));
- addr = ops->map_page(dev, page, offset, size, dir, 0);
+ addr = ops->map_page(dev, page, offset, size, dir, attrs);
debug_dma_map_page(dev, page, offset, size, dir, addr, false);
return addr;
}
-static inline void dma_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir)
+static inline void dma_unmap_page_attrs(struct device *dev,
+ dma_addr_t addr, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
{
struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
if (ops->unmap_page)
- ops->unmap_page(dev, addr, size, dir, 0);
+ ops->unmap_page(dev, addr, size, dir, attrs);
debug_dma_unmap_page(dev, addr, size, dir, false);
}
@@ -344,6 +348,8 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
#define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
#define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
+#define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
+#define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size);
This patch updates the driver code so that we do bulk updates of the page
reference count instead of just incrementing it by one reference at a time.
The advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet. In
addition if we eventually move this over to using build_skb the gains will
be more noticeable.
Acked-by: Jeff Kirsher <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
drivers/net/ethernet/intel/igb/igb.h | 7 ++++++-
drivers/net/ethernet/intel/igb/igb_main.c | 24 +++++++++++++++++-------
2 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 5387b3a..786de01 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -210,7 +210,12 @@ struct igb_tx_buffer {
struct igb_rx_buffer {
dma_addr_t dma;
struct page *page;
- unsigned int page_offset;
+#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
+ __u32 page_offset;
+#else
+ __u16 page_offset;
+#endif
+ __u16 pagecnt_bias;
};
struct igb_tx_queue_stats {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ba97392..f5a9fd6 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3937,7 +3937,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
PAGE_SIZE,
DMA_FROM_DEVICE,
DMA_ATTR_SKIP_CPU_SYNC);
- __free_page(buffer_info->page);
+ __page_frag_drain(buffer_info->page, 0,
+ buffer_info->pagecnt_bias);
buffer_info->page = NULL;
}
@@ -6813,13 +6814,15 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
struct page *page,
unsigned int truesize)
{
+ unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+
/* avoid re-using remote pages */
if (unlikely(igb_page_is_reserved(page)))
return false;
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
+ if (unlikely(page_ref_count(page) != pagecnt_bias))
return false;
/* flip page offset to other buffer */
@@ -6832,10 +6835,14 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
return false;
#endif
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
+ /* If we have drained the page fragment pool we need to update
+ * the pagecnt_bias and page count so that we fully restock the
+ * number of references the driver holds.
*/
- page_ref_inc(page);
+ if (unlikely(pagecnt_bias == 1)) {
+ page_ref_add(page, USHRT_MAX);
+ rx_buffer->pagecnt_bias = USHRT_MAX;
+ }
return true;
}
@@ -6887,7 +6894,6 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
return true;
/* this page cannot be reused so discard it */
- __free_page(page);
return false;
}
@@ -6958,10 +6964,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
/* hand second half of page back to the ring */
igb_reuse_rx_page(rx_ring, rx_buffer);
} else {
- /* we are not reusing the buffer so unmap it */
+ /* We are not reusing the buffer so unmap it and free
+ * any references we are holding to it
+ */
dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
PAGE_SIZE, DMA_FROM_DEVICE,
DMA_ATTR_SKIP_CPU_SYNC);
+ __page_frag_drain(page, 0, rx_buffer->pagecnt_bias);
}
/* clear contents of rx_buffer */
@@ -7235,6 +7244,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+ bi->pagecnt_bias = 1;
return true;
}
The ARM architecture provides a mechanism for deferring cache line
invalidation in the case of map/unmap. This patch makes use of this
mechanism to avoid unnecessary synchronization.
A secondary effect of this change is that the portion of the page that has
been synchronized for use by the CPU should be writable and could be passed
up the stack (at least on ARM).
The last bit that occurred to me is that on architectures where the
sync_for_cpu call invalidates cache lines we were prefetching and then
invalidating the first 128 bytes of the packet. To avoid that I have moved
the sync up to before we perform the prefetch and allocate the skbuff so
that we can actually make use of it.
Acked-by: Jeff Kirsher <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
drivers/net/ethernet/intel/igb/igb_main.c | 53 ++++++++++++++++++-----------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 942a89f..ba97392 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3922,10 +3922,21 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
if (!buffer_info->page)
continue;
- dma_unmap_page(rx_ring->dev,
- buffer_info->dma,
- PAGE_SIZE,
- DMA_FROM_DEVICE);
+ /* Invalidate cache lines that may have been written to by
+ * device so that we avoid corrupting memory.
+ */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ buffer_info->dma,
+ buffer_info->page_offset,
+ IGB_RX_BUFSZ,
+ DMA_FROM_DEVICE);
+
+ /* free resources associated with mapping */
+ dma_unmap_page_attrs(rx_ring->dev,
+ buffer_info->dma,
+ PAGE_SIZE,
+ DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
__free_page(buffer_info->page);
buffer_info->page = NULL;
@@ -6791,12 +6802,6 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring,
/* transfer page from old buffer to new buffer */
*new_buff = *old_buff;
-
- /* sync the buffer for use by the device */
- dma_sync_single_range_for_device(rx_ring->dev, old_buff->dma,
- old_buff->page_offset,
- IGB_RX_BUFSZ,
- DMA_FROM_DEVICE);
}
static inline bool igb_page_is_reserved(struct page *page)
@@ -6917,6 +6922,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
page = rx_buffer->page;
prefetchw(page);
+ /* we are reusing so sync this buffer for CPU use */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ size,
+ DMA_FROM_DEVICE);
+
if (likely(!skb)) {
void *page_addr = page_address(page) +
rx_buffer->page_offset;
@@ -6941,21 +6953,15 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
prefetchw(skb->data);
}
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
-
/* pull page into skb */
if (igb_add_rx_frag(rx_ring, rx_buffer, size, rx_desc, skb)) {
/* hand second half of page back to the ring */
igb_reuse_rx_page(rx_ring, rx_buffer);
} else {
/* we are not reusing the buffer so unmap it */
- dma_unmap_page(rx_ring->dev, rx_buffer->dma,
- PAGE_SIZE, DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ PAGE_SIZE, DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
}
/* clear contents of rx_buffer */
@@ -7213,7 +7219,8 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
}
/* map page for use */
- dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+ dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE,
+ DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
/* if mapping failed free memory back to system since
* there isn't much point in holding memory we can't use
@@ -7254,6 +7261,12 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
if (!igb_alloc_mapped_page(rx_ring, bi))
break;
+ /* sync the buffer for use by the device */
+ dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+ bi->page_offset,
+ IGB_RX_BUFSZ,
+ DMA_FROM_DEVICE);
+
/* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
*/
This patch adds a function that allows us to batch free a page that has
multiple references outstanding. Specifically this function can be used to
drop a page being used in the page frag alloc cache. With this drivers can
make use of functionality similar to the page frag alloc cache without
having to do any workarounds for the fact that there is no function that
frees multiple references.
Signed-off-by: Alexander Duyck <[email protected]>
---
include/linux/gfp.h | 2 ++
mm/page_alloc.c | 14 ++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f8041f9de..4175dca 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -506,6 +506,8 @@ extern void free_hot_cold_page(struct page *page, bool cold);
extern void free_hot_cold_page_list(struct list_head *list, bool cold);
struct page_frag_cache;
+extern void __page_frag_drain(struct page *page, unsigned int order,
+ unsigned int count);
extern void *__alloc_page_frag(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask);
extern void __free_page_frag(void *addr);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fbfead..54fea40 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3912,6 +3912,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc,
return page;
}
+void __page_frag_drain(struct page *page, unsigned int order,
+ unsigned int count)
+{
+ VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+
+ if (page_ref_sub_and_test(page, count)) {
+ if (order == 0)
+ free_hot_cold_page(page, false);
+ else
+ __free_pages_ok(page, order);
+ }
+}
+EXPORT_SYMBOL(__page_frag_drain);
+
void *__alloc_page_frag(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
{
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/tile/kernel/pci-dma.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index 09bb774..24e0f8c 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -213,10 +213,12 @@ static int tile_dma_map_sg(struct device *dev, struct scatterlist *sglist,
for_each_sg(sglist, sg, nents, i) {
sg->dma_address = sg_phys(sg);
- __dma_prep_pa_range(sg->dma_address, sg->length, direction);
#ifdef CONFIG_NEED_SG_DMA_LENGTH
sg->dma_length = sg->length;
#endif
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+ __dma_prep_pa_range(sg->dma_address, sg->length, direction);
}
return nents;
@@ -232,6 +234,8 @@ static void tile_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
BUG_ON(!valid_dma_direction(direction));
for_each_sg(sglist, sg, nents, i) {
sg->dma_address = sg_phys(sg);
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
__dma_complete_pa_range(sg->dma_address, sg->length,
direction);
}
@@ -245,7 +249,8 @@ static dma_addr_t tile_dma_map_page(struct device *dev, struct page *page,
BUG_ON(!valid_dma_direction(direction));
BUG_ON(offset + size > PAGE_SIZE);
- __dma_prep_page(page, offset, size, direction);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_prep_page(page, offset, size, direction);
return page_to_pa(page) + offset;
}
@@ -256,6 +261,9 @@ static void tile_dma_unmap_page(struct device *dev, dma_addr_t dma_address,
{
BUG_ON(!valid_dma_direction(direction));
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return;
+
__dma_complete_page(pfn_to_page(PFN_DOWN(dma_address)),
dma_address & (PAGE_SIZE - 1), size, direction);
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/parisc/kernel/pci-dma.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 02d9ed0..be55ede 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -459,7 +459,9 @@ static dma_addr_t pa11_dma_map_page(struct device *dev, struct page *page,
void *addr = page_address(page) + offset;
BUG_ON(direction == DMA_NONE);
- flush_kernel_dcache_range((unsigned long) addr, size);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ flush_kernel_dcache_range((unsigned long) addr, size);
+
return virt_to_phys(addr);
}
@@ -469,8 +471,11 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
{
BUG_ON(direction == DMA_NONE);
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return;
+
if (direction == DMA_TO_DEVICE)
- return;
+ return;
/*
* For PCI_DMA_FROMDEVICE this flush is not necessary for the
@@ -479,7 +484,6 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
*/
flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle), size);
- return;
}
static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist,
@@ -496,6 +500,10 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist,
sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
sg_dma_len(sg) = sg->length;
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
flush_kernel_dcache_range(vaddr, sg->length);
}
return nents;
@@ -510,14 +518,16 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
BUG_ON(direction == DMA_NONE);
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return;
+
if (direction == DMA_TO_DEVICE)
- return;
+ return;
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
for_each_sg(sglist, sg, nents, i)
flush_kernel_vmap_range(sg_virt(sg), sg->length);
- return;
}
static void pa11_dma_sync_single_for_cpu(struct device *dev,
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/powerpc/kernel/dma.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index e64a601..6877e3f 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -203,6 +203,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg) + get_dma_offset(dev);
sg->dma_length = sg->length;
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
}
@@ -235,7 +239,10 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
unsigned long attrs)
{
BUG_ON(dir == DMA_NONE);
- __dma_sync_page(page, offset, size, dir);
+
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ __dma_sync_page(page, offset, size, dir);
+
return page_to_phys(page) + offset + get_dma_offset(dev);
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: [email protected]
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/sh/kernel/dma-nommu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index eadb669..47fee3b 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -18,7 +18,9 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
dma_addr_t addr = page_to_phys(page) + offset;
WARN_ON(size == 0);
- dma_cache_sync(dev, page_address(page) + offset, size, dir);
+
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_cache_sync(dev, page_address(page) + offset, size, dir);
return addr;
}
@@ -35,7 +37,8 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg,
for_each_sg(sg, s, nents, i) {
BUG_ON(!sg_page(s));
- dma_cache_sync(dev, sg_virt(s), s->length, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_cache_sync(dev, sg_virt(s), s->length, dir);
s->dma_address = sg_phys(s);
s->dma_length = s->length;
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Cc: Jonas Bonn <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/openrisc/kernel/dma.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 140c991..906998b 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -141,6 +141,9 @@ or1k_map_page(struct device *dev, struct page *page,
unsigned long cl;
dma_addr_t addr = page_to_phys(page) + offset;
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return addr;
+
switch (dir) {
case DMA_TO_DEVICE:
/* Flush the dcache for the requested range */
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA
APIs in the arch/arm folder. This change is meant to correct that so that
we get consistent behavior.
Cc: Steven Miao <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/blackfin/kernel/dma-mapping.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/blackfin/kernel/dma-mapping.c b/arch/blackfin/kernel/dma-mapping.c
index 53fbbb6..a27a74a 100644
--- a/arch/blackfin/kernel/dma-mapping.c
+++ b/arch/blackfin/kernel/dma-mapping.c
@@ -118,6 +118,10 @@ static int bfin_dma_map_sg(struct device *dev, struct scatterlist *sg_list,
for_each_sg(sg_list, sg, nents, i) {
sg->dma_address = (dma_addr_t) sg_virt(sg);
+
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ continue;
+
__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
}
@@ -143,7 +147,9 @@ static dma_addr_t bfin_dma_map_page(struct device *dev, struct page *page,
{
dma_addr_t handle = (dma_addr_t)(page_address(page) + offset);
- _dma_sync(handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ _dma_sync(handle, size, dir);
+
return handle;
}
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.
Acked-by: Mark Salter <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
arch/c6x/kernel/dma.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
index db4a6a3..6752df3 100644
--- a/arch/c6x/kernel/dma.c
+++ b/arch/c6x/kernel/dma.c
@@ -42,14 +42,17 @@ static dma_addr_t c6x_dma_map_page(struct device *dev, struct page *page,
{
dma_addr_t handle = virt_to_phys(page_address(page) + offset);
- c6x_dma_sync(handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ c6x_dma_sync(handle, size, dir);
+
return handle;
}
static void c6x_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- c6x_dma_sync(handle, size, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ c6x_dma_sync(handle, size, dir);
}
static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist,
@@ -60,7 +63,8 @@ static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist,
for_each_sg(sglist, sg, nents, i) {
sg->dma_address = sg_phys(sg);
- c6x_dma_sync(sg->dma_address, sg->length, dir);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ c6x_dma_sync(sg->dma_address, sg->length, dir);
}
return nents;
@@ -72,9 +76,11 @@ static void c6x_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
struct scatterlist *sg;
int i;
+ if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+ return;
+
for_each_sg(sglist, sg, nents, i)
c6x_dma_sync(sg_dma_address(sg), sg->length, dir);
-
}
static void c6x_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
On Thu, Nov 10, 2016 at 06:34:52AM -0500, Alexander Duyck wrote:
> This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
> avoid invoking cache line invalidation if the driver will just handle it
> later via a sync_for_cpu or sync_for_device call.
>
> Cc: Richard Kuo <[email protected]>
> Cc: [email protected]
> Signed-off-by: Alexander Duyck <[email protected]>
> ---
> arch/hexagon/kernel/dma.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
For Hexagon:
Acked-by: Richard Kuo <[email protected]>
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On 2016-11-10 at 12:35:18 +0100, Alexander Duyck <[email protected]> wrote:
> This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
> avoid invoking cache line invalidation if the driver will just handle it
> via a sync_for_cpu or sync_for_device call.
>
> Cc: Ley Foon Tan <[email protected]>
> Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Tobias Klauser <[email protected]>
From: Alexander Duyck <[email protected]>
Date: Thu, 10 Nov 2016 06:35:45 -0500
> This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
> avoid invoking cache line invalidation if the driver will just handle it
> via a sync_for_cpu or sync_for_device call.
>
> Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: David S. Miller <[email protected]>
On Thu, Nov 10, 2016 at 3:34 AM, Alexander Duyck
<[email protected]> wrote:
> The first 19 patches in the set add support for the DMA attribute
> DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is needed
> so that we can flag the calls to dma_map/unmap_page so that we do not
> invalidate cache lines that do not currently belong to the device. Instead
> we have to take care of this in the driver via a call to
> sync_single_range_for_cpu prior to freeing the Rx page.
>
> Patch 20 adds support for dma_map_page_attrs and dma_unmap_page_attrs so
> that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC
> attribute.
>
> Patch 21 adds support for freeing a page that has multiple references being
> held by a single caller. This way we can free page fragments that were
> allocated by a given driver.
>
> The last 2 patches use these updates in the igb driver, and lay the
> groundwork to allow for us to reimplement the use of build_skb.
>
> v1: Minor fixes based on issues found by kernel build bot
> Few minor changes for issues found on code review
> Added Acked-by for patches that were acked and not changed
>
> v2: Added a few more Acked-by
> Submitting patches to mm instead of net-next
>
> v3: Added Acked-by for PowerPC architecture
> Dropped first 3 patches which were accepted into swiotlb tree
> Dropped comments describing swiotlb changes.
>
> ---
>
> Alexander Duyck (23):
> arch/arc: Add option to skip sync on DMA mapping
> arch/arm: Add option to skip sync on DMA map and unmap
> arch/avr32: Add option to skip sync on DMA map
> arch/blackfin: Add option to skip sync on DMA map
> arch/c6x: Add option to skip sync on DMA map and unmap
> arch/frv: Add option to skip sync on DMA map
> arch/hexagon: Add option to skip DMA sync as a part of mapping
> arch/m68k: Add option to skip DMA sync as a part of mapping
> arch/metag: Add option to skip DMA sync as a part of map and unmap
> arch/microblaze: Add option to skip DMA sync as a part of map and unmap
> arch/mips: Add option to skip DMA sync as a part of map and unmap
> arch/nios2: Add option to skip DMA sync as a part of map and unmap
> arch/openrisc: Add option to skip DMA sync as a part of mapping
> arch/parisc: Add option to skip DMA sync as a part of map and unmap
> arch/powerpc: Add option to skip DMA sync as a part of mapping
> arch/sh: Add option to skip DMA sync as a part of mapping
> arch/sparc: Add option to skip DMA sync as a part of map and unmap
> arch/tile: Add option to skip DMA sync as a part of map and unmap
> arch/xtensa: Add option to skip DMA sync as a part of mapping
> dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs
> mm: Add support for releasing multiple instances of a page
> igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC
> igb: Update code to better handle incrementing page count
>
>
> arch/arc/mm/dma.c | 5 ++
> arch/arm/common/dmabounce.c | 16 ++++--
> arch/avr32/mm/dma-coherent.c | 7 ++-
> arch/blackfin/kernel/dma-mapping.c | 8 +++
> arch/c6x/kernel/dma.c | 14 ++++-
> arch/frv/mb93090-mb00/pci-dma-nommu.c | 14 ++++-
> arch/frv/mb93090-mb00/pci-dma.c | 9 +++
> arch/hexagon/kernel/dma.c | 6 ++
> arch/m68k/kernel/dma.c | 8 +++
> arch/metag/kernel/dma.c | 16 +++++-
> arch/microblaze/kernel/dma.c | 10 +++-
> arch/mips/loongson64/common/dma-swiotlb.c | 2 -
> arch/mips/mm/dma-default.c | 8 ++-
> arch/nios2/mm/dma-mapping.c | 26 +++++++---
> arch/openrisc/kernel/dma.c | 3 +
> arch/parisc/kernel/pci-dma.c | 20 ++++++--
> arch/powerpc/kernel/dma.c | 9 +++
> arch/sh/kernel/dma-nommu.c | 7 ++-
> arch/sparc/kernel/iommu.c | 4 +-
> arch/sparc/kernel/ioport.c | 4 +-
> arch/tile/kernel/pci-dma.c | 12 ++++-
> arch/xtensa/kernel/pci-dma.c | 7 ++-
> drivers/net/ethernet/intel/igb/igb.h | 7 ++-
> drivers/net/ethernet/intel/igb/igb_main.c | 77 +++++++++++++++++++----------
> include/linux/dma-mapping.h | 20 +++++---
> include/linux/gfp.h | 2 +
> mm/page_alloc.c | 14 +++++
> 27 files changed, 246 insertions(+), 89 deletions(-)
>
So I am just wondering if I need to resubmit this to pick up the new
"Acked-by"s or if I should just wait?
As I said in the description my hope is to get this into the -mm tree
and I am not familiar with what the process is for being accepted
there.
Thanks.
- Alex
On Thu, 10 Nov 2016 06:36:06 -0500 Alexander Duyck <[email protected]> wrote:
> This patch adds a function that allows us to batch free a page that has
> multiple references outstanding. Specifically this function can be used to
> drop a page being used in the page frag alloc cache. With this drivers can
> make use of functionality similar to the page frag alloc cache without
> having to do any workarounds for the fact that there is no function that
> frees multiple references.
>
> ...
>
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -506,6 +506,8 @@ extern void free_hot_cold_page(struct page *page, bool cold);
> extern void free_hot_cold_page_list(struct list_head *list, bool cold);
>
> struct page_frag_cache;
> +extern void __page_frag_drain(struct page *page, unsigned int order,
> + unsigned int count);
> extern void *__alloc_page_frag(struct page_frag_cache *nc,
> unsigned int fragsz, gfp_t gfp_mask);
> extern void __free_page_frag(void *addr);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0fbfead..54fea40 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3912,6 +3912,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc,
> return page;
> }
>
> +void __page_frag_drain(struct page *page, unsigned int order,
> + unsigned int count)
> +{
> + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> +
> + if (page_ref_sub_and_test(page, count)) {
> + if (order == 0)
> + free_hot_cold_page(page, false);
> + else
> + __free_pages_ok(page, order);
> + }
> +}
> +EXPORT_SYMBOL(__page_frag_drain);
It's an exported-to-modules library function. It should be documented,
please? The page-frag API is only partially documented, but that's no
excuse.
And perhaps documentation will help explain the naming choice. Why
"drain"? I'd have expected "put"?
And why the leading underscores. The page-frag API is pretty weird :(
And inconsistent. __alloc_page_frag -> page_frag_alloc,
__free_page_frag -> page_frag_free(), etc. I must have been asleep
when I let that lot through.
On Fri, Nov 18, 2016 at 3:27 PM, Andrew Morton
<[email protected]> wrote:
> On Thu, 10 Nov 2016 06:36:06 -0500 Alexander Duyck <[email protected]> wrote:
>
>> This patch adds a function that allows us to batch free a page that has
>> multiple references outstanding. Specifically this function can be used to
>> drop a page being used in the page frag alloc cache. With this drivers can
>> make use of functionality similar to the page frag alloc cache without
>> having to do any workarounds for the fact that there is no function that
>> frees multiple references.
>>
>> ...
>>
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -506,6 +506,8 @@ extern void free_hot_cold_page(struct page *page, bool cold);
>> extern void free_hot_cold_page_list(struct list_head *list, bool cold);
>>
>> struct page_frag_cache;
>> +extern void __page_frag_drain(struct page *page, unsigned int order,
>> + unsigned int count);
>> extern void *__alloc_page_frag(struct page_frag_cache *nc,
>> unsigned int fragsz, gfp_t gfp_mask);
>> extern void __free_page_frag(void *addr);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 0fbfead..54fea40 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3912,6 +3912,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc,
>> return page;
>> }
>>
>> +void __page_frag_drain(struct page *page, unsigned int order,
>> + unsigned int count)
>> +{
>> + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>> +
>> + if (page_ref_sub_and_test(page, count)) {
>> + if (order == 0)
>> + free_hot_cold_page(page, false);
>> + else
>> + __free_pages_ok(page, order);
>> + }
>> +}
>> +EXPORT_SYMBOL(__page_frag_drain);
>
> It's an exported-to-modules library function. It should be documented,
> please? The page-frag API is only partially documented, but that's no
> excuse.
Okay. I assume you want the documentation as a follow-up patch since
I received a notice that the patch was added to -mm?
> And perhaps documentation will help explain the naming choice. Why
> "drain"? I'd have expected "put"?
The idea was that this is supposed to be a counterpart to
__page_frag_refill. Basically it is a function we can use if we need
to tear down the page frag cache and free the backing page. If you
want I could update the names for these functions to make that
clarification that this is meant to drain a frag cache versus just
freeing a page frag. I had originally thought about coming up with an
mput or something like that since we are dropping multiple references,
but then I figured since we already had __page_frag_refill I would go
for __page_frag_drain.
> And why the leading underscores. The page-frag API is pretty weird :(
>
> And inconsistent. __alloc_page_frag -> page_frag_alloc,
> __free_page_frag -> page_frag_free(), etc. I must have been asleep
> when I let that lot through.
The leading underscores are inherited. Most of it has to do with the
fact that this is a backing API for the netdev sk_buff allocator.
When this stuff existed in net it was already named this way and I
just moved it over. I'm not sure if you approved it or not as I don't
see an Ack-by or Signed-off-by from you on the patch. The timing of
it was such that I think Linus approved it and it was then pulled in
through Dave's tree.
If you would like I could look at doing a couple of renaming patches
so that we make the API a bit more consistent. I could move the
__alloc and __free to what you have suggested, and then take a look at
trying to rename the refill/drain to be a bit more consistent in terms
of what they are supposed to work on and how they are supposed to be
used.
- Alex
On Mon, 21 Nov 2016 08:21:39 -0800 Alexander Duyck <[email protected]> wrote:
> >> + __free_pages_ok(page, order);
> >> + }
> >> +}
> >> +EXPORT_SYMBOL(__page_frag_drain);
> >
> > It's an exported-to-modules library function. It should be documented,
> > please? The page-frag API is only partially documented, but that's no
> > excuse.
>
> Okay. I assume you want the documentation as a follow-up patch since
> I received a notice that the patch was added to -mm?
Yes please. Or a replacement patch which I'll temporarily turn into a
delta, either is fine.
> If you would like I could look at doing a couple of renaming patches
> so that we make the API a bit more consistent. I could move the
> __alloc and __free to what you have suggested, and then take a look at
> trying to rename the refill/drain to be a bit more consistent in terms
> of what they are supposed to work on and how they are supposed to be
> used.
I think that would be better - it's hardly high-priority but a bit of
attention to the documentation and naming conventions would help tidy
things up. When you can't find anything else to do ;)