2014-07-02 18:03:47

by Laura Abbott

Subject: [PATCHv4 0/5] DMA Atomic pool for arm64

Hi,

This is the latest version of a series adding support for atomic DMA
allocations for non-coherent/CMA code paths on arm64. I did some refactoring
to have arm also use genalloc and to pull out some of the remapping code.

This could probably use more testing on other platforms, especially those
that have CONFIG_ARM_DMA_USE_IOMMU set.

Thanks,
Laura

v4: Simplified the logic in gen_pool_first_fit_order_align which makes the
data argument actually unused.

v3: Now a patch series due to refactoring of arm code. arm and arm64 now both
use genalloc for atomic pool management. genalloc extensions added.
DMA remapping code factored out as well.

v2: Various bug fixes pointed out by David and Ritesh (CMA dependency, swapping
coherent, noncoherent). I'm still not sure how to address the devicetree
suggestion by Will [1][2]. I added the devicetree mailing list this time around
to get more input on this.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/249180.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/249528.html


Laura Abbott (5):
lib/genalloc.c: Add power aligned algorithm
lib/genalloc.c: Add genpool range check function
common: dma-mapping: Introduce common remapping functions
arm: use genalloc for the atomic pool
arm64: Add atomic pool for non-coherent and CMA allocations.

arch/arm/Kconfig | 1 +
arch/arm/mm/dma-mapping.c | 200 ++++++++-----------------------
arch/arm64/Kconfig | 1 +
arch/arm64/mm/dma-mapping.c | 154 +++++++++++++++++++++---
drivers/base/dma-mapping.c | 67 +++++++++++
include/asm-generic/dma-mapping-common.h | 9 ++
include/linux/genalloc.h | 7 ++
lib/genalloc.c | 49 ++++++++
8 files changed, 323 insertions(+), 165 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


2014-07-02 18:03:49

by Laura Abbott

Subject: [PATCHv4 1/5] lib/genalloc.c: Add power aligned algorithm


One of the more common algorithms used for allocation
is to align the start address of the allocation to
the order of size requested. Add this as an algorithm
option for genalloc.
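
For illustration only (not part of the patch), a caller selects the new
algorithm with gen_pool_set_algo(); the pool, region and local variables
below are made up:

	struct gen_pool *pool = gen_pool_create(PAGE_SHIFT, -1);

	if (!pool)
		return -ENOMEM;
	/* region previously set up by the caller: vaddr, phys, SZ_256K */
	if (gen_pool_add_virt(pool, (unsigned long)vaddr, phys, SZ_256K, -1))
		goto destroy_pool;
	/* align each allocation to the order of its size */
	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
	/* an 8K request now comes back 8K-aligned within the pool */
	addr = gen_pool_alloc(pool, SZ_8K);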

Signed-off-by: Laura Abbott <[email protected]>
---
include/linux/genalloc.h | 4 ++++
lib/genalloc.c | 20 ++++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 1c2fdaa..3cd0934 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -110,6 +110,10 @@ extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data);

+extern unsigned long gen_pool_first_fit_order_align(unsigned long *map,
+ unsigned long size, unsigned long start, unsigned int nr,
+ void *data);
+
extern unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data);

diff --git a/lib/genalloc.c b/lib/genalloc.c
index bdb9a45..c2b3ad7 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -481,6 +481,26 @@ unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
EXPORT_SYMBOL(gen_pool_first_fit);

/**
+ * gen_pool_first_fit_order_align - find the first available region
+ * of memory matching the size requirement. The region will be aligned
+ * to the order of the size specified.
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we're looking for
+ * @data: additional data - unused
+ */
+unsigned long gen_pool_first_fit_order_align(unsigned long *map,
+ unsigned long size, unsigned long start,
+ unsigned int nr, void *data)
+{
+ unsigned long align_mask = roundup_pow_of_two(nr) - 1;
+
+ return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+}
+EXPORT_SYMBOL(gen_pool_first_fit_order_align);
+
+/**
* gen_pool_best_fit - find the best fitting region of memory
* macthing the size requirement (no alignment constraint)
* @map: The address to base the search on
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-07-02 18:03:53

by Laura Abbott

Subject: [PATCHv4 4/5] arm: use genalloc for the atomic pool


ARM currently uses a bitmap for tracking atomic allocations.
genalloc already handles this type of memory pool allocation
so switch to using that instead.
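
For orientation, the core of the conversion in the diff below is replacing
the locked bitmap search/set/clear sequence with genalloc's internally
locked primitives, roughly:

	/* was: bitmap_find_next_zero_area() + bitmap_set() under pool->lock */
	unsigned long val = gen_pool_alloc(atomic_pool, size);
	phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val);

	/* was: bitmap_clear() under pool->lock */
	gen_pool_free(atomic_pool, (unsigned long)start, size);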

Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm/Kconfig | 1 +
arch/arm/mm/dma-mapping.c | 143 ++++++++++++++--------------------------------
2 files changed, 44 insertions(+), 100 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 245058b..190d085 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -13,6 +13,7 @@ config ARM
select CLONE_BACKWARDS
select CPU_PM if (SUSPEND || CPU_IDLE)
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
+ select GENERIC_ALLOCATOR
select GENERIC_ATOMIC64 if (CPU_V7M || CPU_V6 || !CPU_32v6K || !AEABI)
select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_IDLE_POLL_SETUP
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f5190ac..02a1939 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -26,6 +26,7 @@
#include <linux/io.h>
#include <linux/vmalloc.h>
#include <linux/sizes.h>
+#include <linux/genalloc.h>

#include <asm/memory.h>
#include <asm/highmem.h>
@@ -313,40 +314,31 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
}

#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
+static struct gen_pool *atomic_pool;

-struct dma_pool {
- size_t size;
- spinlock_t lock;
- unsigned long *bitmap;
- unsigned long nr_pages;
- void *vaddr;
- struct page **pages;
-};
-
-static struct dma_pool atomic_pool = {
- .size = DEFAULT_DMA_COHERENT_POOL_SIZE,
-};
+static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;

static int __init early_coherent_pool(char *p)
{
- atomic_pool.size = memparse(p, &p);
+ atomic_pool_size = memparse(p, &p);
return 0;
}
early_param("coherent_pool", early_coherent_pool);

+
void __init init_dma_coherent_pool_size(unsigned long size)
{
/*
* Catch any attempt to set the pool size too late.
*/
- BUG_ON(atomic_pool.vaddr);
+ BUG_ON(atomic_pool);

/*
* Set architecture specific coherent pool size only if
* it has not been changed by kernel command line parameter.
*/
- if (atomic_pool.size == DEFAULT_DMA_COHERENT_POOL_SIZE)
- atomic_pool.size = size;
+ if (atomic_pool_size == DEFAULT_DMA_COHERENT_POOL_SIZE)
+ atomic_pool_size = size;
}

/*
@@ -354,52 +346,43 @@ void __init init_dma_coherent_pool_size(unsigned long size)
*/
static int __init atomic_pool_init(void)
{
- struct dma_pool *pool = &atomic_pool;
pgprot_t prot = pgprot_dmacoherent(PAGE_KERNEL);
gfp_t gfp = GFP_KERNEL | GFP_DMA;
- unsigned long nr_pages = pool->size >> PAGE_SHIFT;
- unsigned long *bitmap;
struct page *page;
- struct page **pages;
void *ptr;
- int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);

- bitmap = kzalloc(bitmap_size, GFP_KERNEL);
- if (!bitmap)
- goto no_bitmap;
-
- pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL);
- if (!pages)
- goto no_pages;
+ atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (!atomic_pool)
+ goto out;

if (dev_get_cma_area(NULL))
- ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
- atomic_pool_init);
+ ptr = __alloc_from_contiguous(NULL, atomic_pool_size, prot,
+ &page, atomic_pool_init);
else
- ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
- atomic_pool_init);
+ ptr = __alloc_remap_buffer(NULL, atomic_pool_size, gfp, prot,
+ &page, atomic_pool_init);
if (ptr) {
- int i;
-
- for (i = 0; i < nr_pages; i++)
- pages[i] = page + i;
-
- spin_lock_init(&pool->lock);
- pool->vaddr = ptr;
- pool->pages = pages;
- pool->bitmap = bitmap;
- pool->nr_pages = nr_pages;
- pr_info("DMA: preallocated %u KiB pool for atomic coherent allocations\n",
- (unsigned)pool->size / 1024);
+ int ret;
+
+ ret = gen_pool_add_virt(atomic_pool, (unsigned long)ptr,
+ page_to_phys(page),
+ atomic_pool_size, -1);
+ if (ret)
+ goto destroy_genpool;
+
+ gen_pool_set_algo(atomic_pool,
+ gen_pool_first_fit_order_align, NULL);
+ pr_info("DMA: preallocated %zd KiB pool for atomic coherent allocations\n",
+ atomic_pool_size / 1024);
return 0;
}

- kfree(pages);
-no_pages:
- kfree(bitmap);
-no_bitmap:
- pr_err("DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
- (unsigned)pool->size / 1024);
+destroy_genpool:
+ gen_pool_destroy(atomic_pool);
+ atomic_pool = NULL;
+out:
+ pr_err("DMA: failed to allocate %zx KiB pool for atomic coherent allocation\n",
+ atomic_pool_size / 1024);
return -ENOMEM;
}
/*
@@ -494,76 +477,36 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,

static void *__alloc_from_pool(size_t size, struct page **ret_page)
{
- struct dma_pool *pool = &atomic_pool;
- unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
- unsigned int pageno;
- unsigned long flags;
+ unsigned long val;
void *ptr = NULL;
- unsigned long align_mask;

- if (!pool->vaddr) {
+ if (!atomic_pool) {
WARN(1, "coherent pool not initialised!\n");
return NULL;
}

- /*
- * Align the region allocation - allocations from pool are rather
- * small, so align them to their order in pages, minimum is a page
- * size. This helps reduce fragmentation of the DMA space.
- */
- align_mask = (1 << get_order(size)) - 1;
-
- spin_lock_irqsave(&pool->lock, flags);
- pageno = bitmap_find_next_zero_area(pool->bitmap, pool->nr_pages,
- 0, count, align_mask);
- if (pageno < pool->nr_pages) {
- bitmap_set(pool->bitmap, pageno, count);
- ptr = pool->vaddr + PAGE_SIZE * pageno;
- *ret_page = pool->pages[pageno];
- } else {
- pr_err_once("ERROR: %u KiB atomic DMA coherent pool is too small!\n"
- "Please increase it with coherent_pool= kernel parameter!\n",
- (unsigned)pool->size / 1024);
+ val = gen_pool_alloc(atomic_pool, size);
+ if (val) {
+ phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val);
+
+ *ret_page = phys_to_page(phys);
+ ptr = (void *)val;
}
- spin_unlock_irqrestore(&pool->lock, flags);

return ptr;
}

static bool __in_atomic_pool(void *start, size_t size)
{
- struct dma_pool *pool = &atomic_pool;
- void *end = start + size;
- void *pool_start = pool->vaddr;
- void *pool_end = pool->vaddr + pool->size;
-
- if (start < pool_start || start >= pool_end)
- return false;
-
- if (end <= pool_end)
- return true;
-
- WARN(1, "Wrong coherent size(%p-%p) from atomic pool(%p-%p)\n",
- start, end - 1, pool_start, pool_end - 1);
-
- return false;
+ return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
}

static int __free_from_pool(void *start, size_t size)
{
- struct dma_pool *pool = &atomic_pool;
- unsigned long pageno, count;
- unsigned long flags;
-
if (!__in_atomic_pool(start, size))
return 0;

- pageno = (start - pool->vaddr) >> PAGE_SHIFT;
- count = size >> PAGE_SHIFT;
-
- spin_lock_irqsave(&pool->lock, flags);
- bitmap_clear(pool->bitmap, pageno, count);
- spin_unlock_irqrestore(&pool->lock, flags);
+ gen_pool_free(atomic_pool, (unsigned long)start, size);

return 1;
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-07-02 18:03:52

by Laura Abbott

Subject: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.


Neither CMA nor noncoherent allocations support atomic allocations.
Add a dedicated atomic pool to support this.
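
In outline (condensed from the diff below), non-blocking callers are
detected by the absence of __GFP_WAIT and are served from the new pool in
the non-coherent allocation path:

	if (!(flags & __GFP_WAIT)) {
		struct page *page = NULL;
		void *addr = __alloc_from_pool(size, &page);

		if (addr)
			*dma_handle = phys_to_dma(dev, page_to_phys(page));
		return addr;
	}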

Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/mm/dma-mapping.c | 154 +++++++++++++++++++++++++++++++++++++++-----
2 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a474de34..cd402b9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -15,6 +15,7 @@ config ARM64
select COMMON_CLK
select CPU_PM if (SUSPEND || CPU_IDLE)
select DCACHE_WORD_ACCESS
+ select GENERIC_ALLOCATOR
select GENERIC_CLOCKEVENTS
select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_CPU_AUTOPROBE
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4164c5a..a2487f1 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -27,6 +27,7 @@
#include <linux/vmalloc.h>
#include <linux/swiotlb.h>
#include <linux/amba/bus.h>
+#include <linux/genalloc.h>

#include <asm/cacheflush.h>

@@ -41,6 +42,55 @@ static pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot,
return prot;
}

+static struct gen_pool *atomic_pool;
+
+#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
+static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+static int __init early_coherent_pool(char *p)
+{
+ atomic_pool_size = memparse(p, &p);
+ return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+static void *__alloc_from_pool(size_t size, struct page **ret_page)
+{
+ unsigned long val;
+ void *ptr = NULL;
+
+ if (!atomic_pool) {
+ WARN(1, "coherent pool not initialised!\n");
+ return NULL;
+ }
+
+ val = gen_pool_alloc(atomic_pool, size);
+ if (val) {
+ phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val);
+
+ *ret_page = phys_to_page(phys);
+ ptr = (void *)val;
+ }
+
+ return ptr;
+}
+
+static bool __in_atomic_pool(void *start, size_t size)
+{
+ return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
+}
+
+static int __free_from_pool(void *start, size_t size)
+{
+ if (!__in_atomic_pool(start, size))
+ return 0;
+
+ gen_pool_free(atomic_pool, (unsigned long)start, size);
+
+ return 1;
+}
+
+
static void *__dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flags,
struct dma_attrs *attrs)
@@ -53,7 +103,8 @@ static void *__dma_alloc_coherent(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_ZONE_DMA) &&
dev->coherent_dma_mask <= DMA_BIT_MASK(32))
flags |= GFP_DMA;
- if (IS_ENABLED(CONFIG_DMA_CMA)) {
+
+ if (!(flags & __GFP_WAIT) && IS_ENABLED(CONFIG_DMA_CMA)) {
struct page *page;

size = PAGE_ALIGN(size);
@@ -73,50 +124,56 @@ static void __dma_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
struct dma_attrs *attrs)
{
+ bool freed;
+ phys_addr_t paddr = dma_to_phys(dev, dma_handle);
+
if (dev == NULL) {
WARN_ONCE(1, "Use an actual device structure for DMA allocation\n");
return;
}

- if (IS_ENABLED(CONFIG_DMA_CMA)) {
- phys_addr_t paddr = dma_to_phys(dev, dma_handle);

- dma_release_from_contiguous(dev,
+ freed = dma_release_from_contiguous(dev,
phys_to_page(paddr),
size >> PAGE_SHIFT);
- } else {
+ if (!freed)
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
- }
}

static void *__dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flags,
struct dma_attrs *attrs)
{
- struct page *page, **map;
+ struct page *page;
void *ptr, *coherent_ptr;
- int order, i;

size = PAGE_ALIGN(size);
- order = get_order(size);
+
+ if (!(flags & __GFP_WAIT)) {
+ struct page *page = NULL;
+ void *addr = __alloc_from_pool(size, &page);
+
+ if (addr)
+ *dma_handle = phys_to_dma(dev, page_to_phys(page));
+
+ return addr;
+
+ }

ptr = __dma_alloc_coherent(dev, size, dma_handle, flags, attrs);
if (!ptr)
goto no_mem;
- map = kmalloc(sizeof(struct page *) << order, flags & ~GFP_DMA);
- if (!map)
- goto no_map;

/* remove any dirty cache lines on the kernel alias */
__dma_flush_range(ptr, ptr + size);

+
/* create a coherent mapping */
page = virt_to_page(ptr);
- for (i = 0; i < (size >> PAGE_SHIFT); i++)
- map[i] = page + i;
- coherent_ptr = vmap(map, size >> PAGE_SHIFT, VM_MAP,
- __get_dma_pgprot(attrs, __pgprot(PROT_NORMAL_NC), false));
- kfree(map);
+ coherent_ptr = dma_common_contiguous_remap(page, size, VM_USERMAP,
+ __get_dma_pgprot(attrs,
+ __pgprot(PROT_NORMAL_NC), false),
+ NULL);
if (!coherent_ptr)
goto no_map;

@@ -135,6 +192,8 @@ static void __dma_free_noncoherent(struct device *dev, size_t size,
{
void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));

+ if (__free_from_pool(vaddr, size))
+ return;
vunmap(vaddr);
__dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
}
@@ -332,6 +391,67 @@ static struct notifier_block amba_bus_nb = {

extern int swiotlb_late_init_with_default_size(size_t default_size);

+static int __init atomic_pool_init(void)
+{
+ pgprot_t prot = __pgprot(PROT_NORMAL_NC);
+ unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
+ struct page *page;
+ void *addr;
+
+
+ if (dev_get_cma_area(NULL))
+ page = dma_alloc_from_contiguous(NULL, nr_pages,
+ get_order(atomic_pool_size));
+ else
+ page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
+
+
+ if (page) {
+ int ret;
+
+ atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (!atomic_pool)
+ goto free_page;
+
+ addr = dma_common_contiguous_remap(page, atomic_pool_size,
+ VM_USERMAP, prot, atomic_pool_init);
+
+ if (!addr)
+ goto destroy_genpool;
+
+ memset(addr, 0, atomic_pool_size);
+ __dma_flush_range(addr, addr + atomic_pool_size);
+
+ ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
+ page_to_phys(page),
+ atomic_pool_size, -1);
+ if (ret)
+ goto remove_mapping;
+
+ gen_pool_set_algo(atomic_pool,
+ gen_pool_first_fit_order_align, NULL);
+
+ pr_info("DMA: preallocated %zd KiB pool for atomic allocations\n",
+ atomic_pool_size / 1024);
+ return 0;
+ }
+ goto out;
+
+remove_mapping:
+ dma_common_free_remap(addr, atomic_pool_size, VM_USERMAP);
+destroy_genpool:
+ gen_pool_destroy(atomic_pool);
+ atomic_pool == NULL;
+free_page:
+ if (!dma_release_from_contiguous(NULL, page, nr_pages))
+ __free_pages(page, get_order(atomic_pool_size));
+out:
+ pr_err("DMA: failed to allocate %zx KiB pool for atomic coherent allocation\n",
+ atomic_pool_size / 1024);
+ return -ENOMEM;
+}
+postcore_initcall(atomic_pool_init);
+
static int __init swiotlb_late_init(void)
{
size_t swiotlb_size = min(SZ_64M, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-07-02 18:04:40

by Laura Abbott

Subject: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions


For architectures without coherent DMA, memory for DMA may
need to be remapped with coherent attributes. Factor out
the remapping code from arm and put it in a
common location to reduce code duplication.
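
For reference, a caller would use the new helpers (declared in
dma-mapping-common.h below) roughly as follows; the prot value and caller
argument here are only illustrative:

	void *va;

	va = dma_common_contiguous_remap(page, size, VM_USERMAP,
			pgprot_noncached(PAGE_KERNEL),
			__builtin_return_address(0));
	if (!va)
		return NULL;
	/* ... use the coherent kernel mapping ... */
	dma_common_free_remap(va, size, VM_USERMAP);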

Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm/mm/dma-mapping.c | 57 +++++----------------------
drivers/base/dma-mapping.c | 67 ++++++++++++++++++++++++++++++++
include/asm-generic/dma-mapping-common.h | 9 +++++
3 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 4c88935..f5190ac 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -297,37 +297,19 @@ static void *
__dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
const void *caller)
{
- struct vm_struct *area;
- unsigned long addr;
-
/*
* DMA allocation can be mapped to user space, so lets
* set VM_USERMAP flags too.
*/
- area = get_vm_area_caller(size, VM_ARM_DMA_CONSISTENT | VM_USERMAP,
- caller);
- if (!area)
- return NULL;
- addr = (unsigned long)area->addr;
- area->phys_addr = __pfn_to_phys(page_to_pfn(page));
-
- if (ioremap_page_range(addr, addr + size, area->phys_addr, prot)) {
- vunmap((void *)addr);
- return NULL;
- }
- return (void *)addr;
+ return dma_common_contiguous_remap(page, size,
+ VM_ARM_DMA_CONSISTENT | VM_USERMAP,
+ prot, caller);
}

static void __dma_free_remap(void *cpu_addr, size_t size)
{
- unsigned int flags = VM_ARM_DMA_CONSISTENT | VM_USERMAP;
- struct vm_struct *area = find_vm_area(cpu_addr);
- if (!area || (area->flags & flags) != flags) {
- WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
- return;
- }
- unmap_kernel_range((unsigned long)cpu_addr, size);
- vunmap(cpu_addr);
+ dma_common_free_remap(cpu_addr, size,
+ VM_ARM_DMA_CONSISTENT | VM_USERMAP);
}

#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
@@ -1261,29 +1243,8 @@ static void *
__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
const void *caller)
{
- unsigned int i, nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
- struct vm_struct *area;
- unsigned long p;
-
- area = get_vm_area_caller(size, VM_ARM_DMA_CONSISTENT | VM_USERMAP,
- caller);
- if (!area)
- return NULL;
-
- area->pages = pages;
- area->nr_pages = nr_pages;
- p = (unsigned long)area->addr;
-
- for (i = 0; i < nr_pages; i++) {
- phys_addr_t phys = __pfn_to_phys(page_to_pfn(pages[i]));
- if (ioremap_page_range(p, p + PAGE_SIZE, phys, prot))
- goto err;
- p += PAGE_SIZE;
- }
- return area->addr;
-err:
- unmap_kernel_range((unsigned long)area->addr, size);
- vunmap(area->addr);
+ return dma_common_pages_remap(pages, size,
+ VM_ARM_DMA_CONSISTENT | VM_USERMAP, prot, caller);
return NULL;
}

@@ -1491,8 +1452,8 @@ void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
}

if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
- unmap_kernel_range((unsigned long)cpu_addr, size);
- vunmap(cpu_addr);
+ dma_common_free_remap(cpu_addr, size,
+ VM_ARM_DMA_CONSISTENT | VM_USERMAP);
}

__iommu_remove_mapping(dev, handle, size);
diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
index 6cd08e1..34265b5 100644
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -10,6 +10,8 @@
#include <linux/dma-mapping.h>
#include <linux/export.h>
#include <linux/gfp.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
#include <asm-generic/dma-coherent.h>

/*
@@ -267,3 +269,68 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
return ret;
}
EXPORT_SYMBOL(dma_common_mmap);
+
+/*
+ * remaps an allocated contiguous region into another vm_area.
+ * Cannot be used in non-sleeping contexts
+ */
+
+void *dma_common_contiguous_remap(struct page *page, size_t size,
+ unsigned long vm_flags,
+ pgprot_t prot, const void *caller)
+{
+ int i;
+ struct page **pages;
+ void *ptr;
+
+ pages = kmalloc(sizeof(struct page *) << get_order(size), GFP_KERNEL);
+ if (!pages)
+ return NULL;
+
+ for (i = 0; i < (size >> PAGE_SHIFT); i++)
+ pages[i] = page + i;
+
+ ptr = dma_common_pages_remap(pages, size, vm_flags, prot, caller);
+
+ kfree(pages);
+
+ return ptr;
+}
+
+/*
+ * remaps an array of PAGE_SIZE pages into another vm_area
+ * Cannot be used in non-sleeping contexts
+ */
+void *dma_common_pages_remap(struct page **pages, size_t size,
+ unsigned long vm_flags, pgprot_t prot,
+ const void *caller)
+{
+ struct vm_struct *area;
+
+ area = get_vm_area_caller(size, vm_flags, caller);
+ if (!area)
+ return NULL;
+
+ if (map_vm_area(area, prot, &pages)) {
+ vunmap(area->addr);
+ return NULL;
+ }
+
+ return area->addr;
+}
+
+/*
+ * unmaps a range previously mapped by dma_common_*_remap
+ */
+void dma_common_free_remap(void *cpu_addr, size_t size, unsigned long vm_flags)
+{
+ struct vm_struct *area = find_vm_area(cpu_addr);
+
+ if (!area || (area->flags & vm_flags) != vm_flags) {
+ WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
+ return;
+ }
+
+ unmap_kernel_range((unsigned long)cpu_addr, size);
+ vunmap(cpu_addr);
+}
diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
index de8bf89..a9fd248 100644
--- a/include/asm-generic/dma-mapping-common.h
+++ b/include/asm-generic/dma-mapping-common.h
@@ -179,6 +179,15 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size);

+void *dma_common_contiguous_remap(struct page *page, size_t size,
+ unsigned long vm_flags,
+ pgprot_t prot, const void *caller);
+
+void *dma_common_pages_remap(struct page **pages, size_t size,
+ unsigned long vm_flags, pgprot_t prot,
+ const void *caller);
+void dma_common_free_remap(void *cpu_addr, size_t size, unsigned long vm_flags);
+
/**
* dma_mmap_attrs - map a coherent DMA allocation into user space
* @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-07-02 18:05:20

by Laura Abbott

Subject: [PATCHv4 2/5] lib/genalloc.c: Add genpool range check function


After allocating an address from a particular genpool,
there is no good way to verify whether that address actually
belongs to that genpool. Introduce addr_in_gen_pool, which
will return whether an address plus size falls completely
within the genpool range.
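
A minimal usage sketch (pool, vaddr and size are illustrative); the arm
patch later in this series uses exactly this pattern in
__in_atomic_pool()/__free_from_pool():

	if (addr_in_gen_pool(pool, (unsigned long)vaddr, size))
		gen_pool_free(pool, (unsigned long)vaddr, size);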

Signed-off-by: Laura Abbott <[email protected]>
---
include/linux/genalloc.h | 3 +++
lib/genalloc.c | 29 +++++++++++++++++++++++++++++
2 files changed, 32 insertions(+)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 3cd0934..1ccaab4 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern struct gen_pool *devm_gen_pool_create(struct device *dev,
int min_alloc_order, int nid);
extern struct gen_pool *dev_get_gen_pool(struct device *dev);

+bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
+ size_t size);
+
#ifdef CONFIG_OF
extern struct gen_pool *of_get_named_gen_pool(struct device_node *np,
const char *propname, int index);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index c2b3ad7..6f03722 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -403,6 +403,35 @@ void gen_pool_for_each_chunk(struct gen_pool *pool,
EXPORT_SYMBOL(gen_pool_for_each_chunk);

/**
+ * addr_in_gen_pool - checks if an address falls within the range of a pool
+ * @pool: the generic memory pool
+ * @start: start address
+ * @size: size of the region
+ *
+ * Check if the range of addresses falls within the specified pool. Takes
+ * the rcu_read_lock for the duration of the check.
+ */
+bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
+ size_t size)
+{
+ bool found = false;
+ unsigned long end = start + size;
+ struct gen_pool_chunk *chunk;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(chunk, &(pool)->chunks, next_chunk) {
+ if (start >= chunk->start_addr && start <= chunk->end_addr) {
+ if (end <= chunk->end_addr) {
+ found = true;
+ break;
+ }
+ }
+ }
+ rcu_read_unlock();
+ return found;
+}
+
+/**
* gen_pool_avail - get available free space of the pool
* @pool: pool to get available free space
*
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-07-03 18:11:52

by Will Deacon

Subject: Re: [PATCHv4 1/5] lib/genalloc.c: Add power aligned algorithm

On Wed, Jul 02, 2014 at 07:03:34PM +0100, Laura Abbott wrote:
>
> One of the more common algorithms used for allocation
> is to align the start address of the allocation to
> the order of size requested. Add this as an algorithm
> option for genalloc.
>
> Signed-off-by: Laura Abbott <[email protected]>
> ---
> include/linux/genalloc.h | 4 ++++
> lib/genalloc.c | 20 ++++++++++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index 1c2fdaa..3cd0934 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -110,6 +110,10 @@ extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
> extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
> unsigned long start, unsigned int nr, void *data);
>
> +extern unsigned long gen_pool_first_fit_order_align(unsigned long *map,
> + unsigned long size, unsigned long start, unsigned int nr,
> + void *data);
> +

You could also update gen_pool_first_fit to call this new function instead.

Anyway, that's up to you:

Acked-by: Will Deacon <[email protected]>

Will

2014-07-03 18:14:51

by Will Deacon

Subject: Re: [PATCHv4 2/5] lib/genalloc.c: Add genpool range check function

On Wed, Jul 02, 2014 at 07:03:35PM +0100, Laura Abbott wrote:
>
> After allocating an address from a particular genpool,
> there is no good way to verify if that address actually
> belongs to a genpool. Introduce addr_in_gen_pool which
> will return if an address plus size falls completely
> within the genpool range.
>
> Signed-off-by: Laura Abbott <[email protected]>
> ---

Acked-by: Will Deacon <[email protected]>

Will

2014-07-04 13:35:37

by Thierry Reding

Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Wed, Jul 02, 2014 at 11:03:38AM -0700, Laura Abbott wrote:
[...]
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
[...]
> +static struct gen_pool *atomic_pool;
> +
> +#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
> +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;

There doesn't seem to be much use for this since it can't be overridden
via init_dma_coherent_pool_size like on ARM.

> +static int __free_from_pool(void *start, size_t size)
> +{
> + if (!__in_atomic_pool(start, size))
> + return 0;
> +
> + gen_pool_free(atomic_pool, (unsigned long)start, size);
> +
> + return 1;
> +}
> +
> +

There's a gratuitous blank line here.

> static void *__dma_alloc_coherent(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t flags,
> struct dma_attrs *attrs)
> @@ -53,7 +103,8 @@ static void *__dma_alloc_coherent(struct device *dev, size_t size,
> if (IS_ENABLED(CONFIG_ZONE_DMA) &&
> dev->coherent_dma_mask <= DMA_BIT_MASK(32))
> flags |= GFP_DMA;
> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
> +
> + if (!(flags & __GFP_WAIT) && IS_ENABLED(CONFIG_DMA_CMA)) {

I think the diff would be more readable here if this wasn't introducing
a blank line and kept the IS_ENABLED() first.

> struct page *page;
>
> size = PAGE_ALIGN(size);
> @@ -73,50 +124,56 @@ static void __dma_free_coherent(struct device *dev, size_t size,
> void *vaddr, dma_addr_t dma_handle,
> struct dma_attrs *attrs)
> {
> + bool freed;
> + phys_addr_t paddr = dma_to_phys(dev, dma_handle);
> +
> if (dev == NULL) {
> WARN_ONCE(1, "Use an actual device structure for DMA allocation\n");
> return;
> }
>
> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
> - phys_addr_t paddr = dma_to_phys(dev, dma_handle);
>
> - dma_release_from_contiguous(dev,

The above leaves an unnecessary blank line in place.

> ptr = __dma_alloc_coherent(dev, size, dma_handle, flags, attrs);
> if (!ptr)
> goto no_mem;
> - map = kmalloc(sizeof(struct page *) << order, flags & ~GFP_DMA);
> - if (!map)
> - goto no_map;
>
> /* remove any dirty cache lines on the kernel alias */
> __dma_flush_range(ptr, ptr + size);
>
> +

Adds an unnecessary blank line.

> @@ -332,6 +391,67 @@ static struct notifier_block amba_bus_nb = {
>
> extern int swiotlb_late_init_with_default_size(size_t default_size);
>
> +static int __init atomic_pool_init(void)
> +{
> + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
> + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
> + struct page *page;
> + void *addr;
> +
> +

There's another gratuitous blank line here...

> + if (dev_get_cma_area(NULL))
> + page = dma_alloc_from_contiguous(NULL, nr_pages,
> + get_order(atomic_pool_size));
> + else
> + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
> +
> +

and here.

> + if (page) {
> + int ret;
> +
> + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> + if (!atomic_pool)
> + goto free_page;
> +
> + addr = dma_common_contiguous_remap(page, atomic_pool_size,
> + VM_USERMAP, prot, atomic_pool_init);
> +
> + if (!addr)
> + goto destroy_genpool;
> +
> + memset(addr, 0, atomic_pool_size);
> + __dma_flush_range(addr, addr + atomic_pool_size);
> +
> + ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
> + page_to_phys(page),
> + atomic_pool_size, -1);
> + if (ret)
> + goto remove_mapping;
> +
> + gen_pool_set_algo(atomic_pool,
> + gen_pool_first_fit_order_align, NULL);
> +
> + pr_info("DMA: preallocated %zd KiB pool for atomic allocations\n",

I think this should be "%zu" because atomic_pool_size is a size_t, not a
ssize_t.

> + atomic_pool_size / 1024);
> + return 0;
> + }
> + goto out;
> +
> +remove_mapping:
> + dma_common_free_remap(addr, atomic_pool_size, VM_USERMAP);
> +destroy_genpool:
> + gen_pool_destroy(atomic_pool);
> + atomic_pool == NULL;

This probably doesn't belong here.

> +free_page:
> + if (!dma_release_from_contiguous(NULL, page, nr_pages))
> + __free_pages(page, get_order(atomic_pool_size));

You use get_order(atomic_pool_size) a lot, perhaps it should be a
temporary variable?

> +out:
> + pr_err("DMA: failed to allocate %zx KiB pool for atomic coherent allocation\n",
> + atomic_pool_size / 1024);

Print in decimal rather than hexadecimal?

Thierry



2014-07-04 13:43:07

by Thierry Reding

Subject: Re: [PATCHv4 4/5] arm: use genalloc for the atomic pool

On Wed, Jul 02, 2014 at 11:03:37AM -0700, Laura Abbott wrote:
[...]
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
[...]
> index f5190ac..02a1939 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -26,6 +26,7 @@
> #include <linux/io.h>
> #include <linux/vmalloc.h>
> #include <linux/sizes.h>
> +#include <linux/genalloc.h>

Includes should be sorted alphabetically. I realize that's not the case
for this particular file, but the downside of that is that your patch no
longer applies cleanly on top of linux-next because some other patch did
add linux/cma.h at the same location.

> static int __init early_coherent_pool(char *p)
> {
> - atomic_pool.size = memparse(p, &p);
> + atomic_pool_size = memparse(p, &p);
> return 0;
> }
> early_param("coherent_pool", early_coherent_pool);
>
> +

There's a gratuitous blank line here.

I also need the below hunk on top of your patch to make this compile on
ARM.

Thierry

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b323032f0850..460aaf965a87 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1250,11 +1250,13 @@ static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t si

static struct page **__atomic_get_pages(void *addr)
{
- struct dma_pool *pool = &atomic_pool;
- struct page **pages = pool->pages;
- int offs = (addr - pool->vaddr) >> PAGE_SHIFT;
+ struct page *page;
+ phys_addr_t phys;
+
+ phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr);
+ page = phys_to_page(phys);

- return pages + offs;
+ return (struct page **)page;
}

static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a2487f12b2fc..764f53565958 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -441,7 +441,6 @@ remove_mapping:
dma_common_free_remap(addr, atomic_pool_size, VM_USERMAP);
destroy_genpool:
gen_pool_destroy(atomic_pool);
- atomic_pool == NULL;
free_page:
if (!dma_release_from_contiguous(NULL, page, nr_pages))
__free_pages(page, get_order(atomic_pool_size));



2014-07-09 22:33:21

by Olof Johansson

Subject: Re: [PATCHv4 2/5] lib/genalloc.c: Add genpool range check function

On Wed, Jul 2, 2014 at 11:03 AM, Laura Abbott <[email protected]> wrote:
>
> After allocating an address from a particular genpool,
> there is no good way to verify if that address actually
> belongs to a genpool. Introduce addr_in_gen_pool which
> will return if an address plus size falls completely
> within the genpool range.
>
> Signed-off-by: Laura Abbott <[email protected]>

Reviewed-by: Olof Johansson <[email protected]>

What's the merge path for this code? Part of the arm64 code that needs
it, I presume?


-Olof

2014-07-09 22:35:23

by Olof Johansson

Subject: Re: [PATCHv4 1/5] lib/genalloc.c: Add power aligned algorithm

On Thu, Jul 3, 2014 at 11:10 AM, Will Deacon <[email protected]> wrote:
> On Wed, Jul 02, 2014 at 07:03:34PM +0100, Laura Abbott wrote:
>>
>> One of the more common algorithms used for allocation
>> is to align the start address of the allocation to
>> the order of size requested. Add this as an algorithm
>> option for genalloc.
>>
>> Signed-off-by: Laura Abbott <[email protected]>
>> ---
>> include/linux/genalloc.h | 4 ++++
>> lib/genalloc.c | 20 ++++++++++++++++++++
>> 2 files changed, 24 insertions(+)
>>
>> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
>> index 1c2fdaa..3cd0934 100644
>> --- a/include/linux/genalloc.h
>> +++ b/include/linux/genalloc.h
>> @@ -110,6 +110,10 @@ extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
>> extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
>> unsigned long start, unsigned int nr, void *data);
>>
>> +extern unsigned long gen_pool_first_fit_order_align(unsigned long *map,
>> + unsigned long size, unsigned long start, unsigned int nr,
>> + void *data);
>> +
>
> You could also update gen_pool_first_fit to call this new function instead.

+1, that'd be slightly nicer and remove one exported symbol.

But, as Will says, up to you. Either option:

Acked-by: Olof Johansson <[email protected]>

2014-07-09 22:46:58

by Olof Johansson

Subject: Re: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions

On Wed, Jul 2, 2014 at 11:03 AM, Laura Abbott <[email protected]> wrote:
>
> For architectures without coherent DMA, memory for DMA may
> need to be remapped with coherent attributes. Factor out
> the the remapping code from arm and put it in a
> common location to reduced code duplication.
>
> Signed-off-by: Laura Abbott <[email protected]>

Hm. The switch from ioremap to map_vm_area() here seems to imply that
lib/ioremap can/should be reworked to just wrap the vmalloc
functions too?

Unrelated to this change.

I did a pass of review here. Nothing stands out as wrong but I don't
claim to know this area well these days.

What's the merge/ack plan here? It might reduce the complexity of
merging to add the common functions in your series, then move the ARM
code over separately?


-Olof

2014-07-18 13:44:16

by Catalin Marinas

Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Wed, Jul 02, 2014 at 07:03:38PM +0100, Laura Abbott wrote:
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 4164c5a..a2487f1 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
[...]
> static void *__dma_alloc_coherent(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t flags,
> struct dma_attrs *attrs)
> @@ -53,7 +103,8 @@ static void *__dma_alloc_coherent(struct device *dev, size_t size,
> if (IS_ENABLED(CONFIG_ZONE_DMA) &&
> dev->coherent_dma_mask <= DMA_BIT_MASK(32))
> flags |= GFP_DMA;
> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
> +
> + if (!(flags & __GFP_WAIT) && IS_ENABLED(CONFIG_DMA_CMA)) {
> struct page *page;
>
> size = PAGE_ALIGN(size);

I think that's the wrong condition here. You want to use CMA if
(flags & __GFP_WAIT). CMA does not support atomic allocations so it can
fall back to swiotlb_alloc_coherent().
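
That is, something like:

	if (IS_ENABLED(CONFIG_DMA_CMA) && (flags & __GFP_WAIT)) {
		/* blocking allocation: CMA can be used */
		...
	}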

> @@ -73,50 +124,56 @@ static void __dma_free_coherent(struct device *dev, size_t size,
> void *vaddr, dma_addr_t dma_handle,
> struct dma_attrs *attrs)
> {
> + bool freed;
> + phys_addr_t paddr = dma_to_phys(dev, dma_handle);
> +
> if (dev == NULL) {
> WARN_ONCE(1, "Use an actual device structure for DMA allocation\n");
> return;
> }
>
> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
> - phys_addr_t paddr = dma_to_phys(dev, dma_handle);
>
> - dma_release_from_contiguous(dev,
> + freed = dma_release_from_contiguous(dev,
> phys_to_page(paddr),
> size >> PAGE_SHIFT);
> - } else {
> + if (!freed)
> swiotlb_free_coherent(dev, size, vaddr, dma_handle);
> - }
> }

Is __dma_free_coherent() ever called in atomic context? If yes, the
dma_release_from_contiguous() may not like it since it tries to acquire
a mutex. But since we don't have the gfp flags here, we don't have an
easy way to know what to call.

So the initial idea of always calling __alloc_from_pool() for both
coherent/non-coherent cases would work better (but still with a single
shared pool, see below).

> static void *__dma_alloc_noncoherent(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t flags,
> struct dma_attrs *attrs)
> {
> - struct page *page, **map;
> + struct page *page;
> void *ptr, *coherent_ptr;
> - int order, i;
>
> size = PAGE_ALIGN(size);
> - order = get_order(size);
> +
> + if (!(flags & __GFP_WAIT)) {
> + struct page *page = NULL;
> + void *addr = __alloc_from_pool(size, &page);
> +
> + if (addr)
> + *dma_handle = phys_to_dma(dev, page_to_phys(page));
> +
> + return addr;
> +
> + }

If we do the above for the __dma_alloc_coherent() case, we could use the
same pool but instead of returning addr it could just return
page_address(page). The downside of sharing the pool is that you need
cache flushing for every allocation (which we already do for the
non-atomic case).

> @@ -332,6 +391,67 @@ static struct notifier_block amba_bus_nb = {
>
> extern int swiotlb_late_init_with_default_size(size_t default_size);
>
> +static int __init atomic_pool_init(void)
> +{
> + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
> + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
> + struct page *page;
> + void *addr;
> +
> +
> + if (dev_get_cma_area(NULL))

Is it worth using this condition for other places where we check
IS_ENABLED(CONFIG_DMA_CMA) (maybe as a separate patch)?

> + page = dma_alloc_from_contiguous(NULL, nr_pages,
> + get_order(atomic_pool_size));
> + else
> + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));

One problem here is that the atomic pool wouldn't be able to honour
GFP_DMA (in the latest kernel, CMA is by default in ZONE_DMA). You
should probably pass GFP_KERNEL|GFP_DMA here. You could also use the
swiotlb_alloc_coherent() which, with a NULL dev, assumes 32-bit DMA mask
but it still expects GFP_DMA to be passed.
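
i.e. roughly:

	page = alloc_pages(GFP_KERNEL | GFP_DMA,
			   get_order(atomic_pool_size));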

> + if (page) {
> + int ret;
> +
> + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> + if (!atomic_pool)
> + goto free_page;
> +
> + addr = dma_common_contiguous_remap(page, atomic_pool_size,
> + VM_USERMAP, prot, atomic_pool_init);
> +
> + if (!addr)
> + goto destroy_genpool;
> +
> + memset(addr, 0, atomic_pool_size);
> + __dma_flush_range(addr, addr + atomic_pool_size);

If you add the flushing in the __dma_alloc_noncoherent(), it won't be
needed here (of course, more efficient here but it would not work if we
share the pool).

> +postcore_initcall(atomic_pool_init);

Why not arch_initcall? Or even better, we could have a common DMA init
function that calls swiotlb_late_init() and atomic_pool_init() (in this
order if you decide to use swiotlb allocation above).
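
For instance (sketch only, assuming both are then registered from a single
initcall; the function name is illustrative):

	static int __init arm64_dma_init(void)
	{
		int ret;

		ret = swiotlb_late_init();
		if (!ret)
			ret = atomic_pool_init();

		return ret;
	}
	arch_initcall(arm64_dma_init);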

--
Catalin

2014-07-18 13:54:14

by Catalin Marinas

Subject: Re: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions

On Wed, Jul 02, 2014 at 07:03:36PM +0100, Laura Abbott wrote:
> +void *dma_common_pages_remap(struct page **pages, size_t size,
> + unsigned long vm_flags, pgprot_t prot,
> + const void *caller)
> +{
> + struct vm_struct *area;
> +
> + area = get_vm_area_caller(size, vm_flags, caller);
> + if (!area)
> + return NULL;
> +
> + if (map_vm_area(area, prot, &pages)) {
> + vunmap(area->addr);
> + return NULL;
> + }
> +
> + return area->addr;
> +}

Why not just replace this function with vmap()? It is nearly identical.

--
Catalin

2014-07-18 14:14:05

by Catalin Marinas

Subject: Re: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions

On Wed, Jul 09, 2014 at 11:46:56PM +0100, Olof Johansson wrote:
> On Wed, Jul 2, 2014 at 11:03 AM, Laura Abbott <[email protected]> wrote:
> > For architectures without coherent DMA, memory for DMA may
> > need to be remapped with coherent attributes. Factor out
> > the the remapping code from arm and put it in a
> > common location to reduced code duplication.
> >
> > Signed-off-by: Laura Abbott <[email protected]>
>
> Hm. The switch from ioremap to map_vm_area() here seems to imply that
> lib/ioremap can/should be reworked to use just wrap the vmalloc
> functions too?

ioremap_page_range() does not require the allocation of a map page array
and assumes that the mapped memory is physically contiguous. This would
be more efficient than the vmap() implementation which is generic enough
to allow non-contiguous memory allocations.

At some point, we'll have to support SMMU on arm64 and we can have 4
combinations of coherency and IOMMU. When an IOMMU is present, we don't
require physically contiguous memory but we may require non-cacheable
mappings, in which case vmap comes in handy.

--
Catalin

2014-07-21 19:33:51

by Laura Abbott

Subject: Re: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions

On 7/18/2014 6:53 AM, Catalin Marinas wrote:
> On Wed, Jul 02, 2014 at 07:03:36PM +0100, Laura Abbott wrote:
>> +void *dma_common_pages_remap(struct page **pages, size_t size,
>> + unsigned long vm_flags, pgprot_t prot,
>> + const void *caller)
>> +{
>> + struct vm_struct *area;
>> +
>> + area = get_vm_area_caller(size, vm_flags, caller);
>> + if (!area)
>> + return NULL;
>> +
>> + if (map_vm_area(area, prot, &pages)) {
>> + vunmap(area->addr);
>> + return NULL;
>> + }
>> +
>> + return area->addr;
>> +}
>
> Why not just replace this function with vmap()? It is nearly identical.
>

With this version, the caller stored and printed via /proc/vmallocinfo
is the actual caller of the DMA API whereas if we just call vmap we
don't get any useful caller information. Going to vmap would change
the existing behavior on ARM so it seems unwise to switch. Another
option is to move this into vmalloc.c and add vmap_caller.
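
For illustration only, such a vmap_caller (the name and signature are
hypothetical, not an existing API) would essentially be vmap() taking an
explicit caller, much like dma_common_pages_remap() does:

	/* hypothetical vmap() variant that records an explicit caller */
	void *vmap_caller(struct page **pages, unsigned int count,
			  unsigned long flags, pgprot_t prot,
			  const void *caller)
	{
		struct vm_struct *area;

		area = get_vm_area_caller(count << PAGE_SHIFT, flags, caller);
		if (!area)
			return NULL;

		if (map_vm_area(area, prot, &pages)) {
			vunmap(area->addr);
			return NULL;
		}

		return area->addr;
	}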

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-21 19:51:08

by Laura Abbott

Subject: Re: [PATCHv4 2/5] lib/genalloc.c: Add genpool range check function

On 7/9/2014 3:33 PM, Olof Johansson wrote:
> On Wed, Jul 2, 2014 at 11:03 AM, Laura Abbott <[email protected]> wrote:
>>
>> After allocating an address from a particular genpool,
>> there is no good way to verify if that address actually
>> belongs to a genpool. Introduce addr_in_gen_pool which
>> will return if an address plus size falls completely
>> within the genpool range.
>>
>> Signed-off-by: Laura Abbott <[email protected]>
>
> Reviewed-by: Olof Johansson <[email protected]>
>
> What's the merge path for this code? Part of the arm64 code that needs
> it, I presume?
>

My plan was to have the entire series go through the arm64 tree unless
someone has a better idea.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-21 21:22:36

by Laura Abbott

Subject: Re: [PATCHv4 4/5] arm: use genalloc for the atomic pool

On 7/4/2014 6:42 AM, Thierry Reding wrote:
> On Wed, Jul 02, 2014 at 11:03:37AM -0700, Laura Abbott wrote:
> [...]
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> [...]
>> index f5190ac..02a1939 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -26,6 +26,7 @@
>> #include <linux/io.h>
>> #include <linux/vmalloc.h>
>> #include <linux/sizes.h>
>> +#include <linux/genalloc.h>
>
> Includes should be sorted alphabetically. I realize that's not the case
> for this particular file, but the downside of that is that your patch no
> longer applies cleanly on top of linux-next because some other patch did
> add linux/cma.h at the same location.
>

Yes, I'll fix that up. I'll put genalloc.h before gfp.h.

>> static int __init early_coherent_pool(char *p)
>> {
>> - atomic_pool.size = memparse(p, &p);
>> + atomic_pool_size = memparse(p, &p);
>> return 0;
>> }
>> early_param("coherent_pool", early_coherent_pool);
>>
>> +
>
> There's a gratuituous blank line her.
>
> I also need the below hunk on top of you patch to make this compile on
> ARM.
>

Yes, that does indeed need to be fixed up.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-21 22:00:55

by Laura Abbott

Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On 7/4/2014 6:35 AM, Thierry Reding wrote:
> On Wed, Jul 02, 2014 at 11:03:38AM -0700, Laura Abbott wrote:
> [...]
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> [...]
>> +static struct gen_pool *atomic_pool;
>> +
>> +#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
>> +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
>
> There doesn't seem to be much use for this since it can't be overridden
> via init_dma_coherent_pool_size like on ARM.
>

There is still the command line option coherent_pool=<size> though

[...]
>> + if (page) {
>> + int ret;
>> +
>> + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
>> + if (!atomic_pool)
>> + goto free_page;
>> +
>> + addr = dma_common_contiguous_remap(page, atomic_pool_size,
>> + VM_USERMAP, prot, atomic_pool_init);
>> +
>> + if (!addr)
>> + goto destroy_genpool;
>> +
>> + memset(addr, 0, atomic_pool_size);
>> + __dma_flush_range(addr, addr + atomic_pool_size);
>> +
>> + ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
>> + page_to_phys(page),
>> + atomic_pool_size, -1);
>> + if (ret)
>> + goto remove_mapping;
>> +
>> + gen_pool_set_algo(atomic_pool,
>> + gen_pool_first_fit_order_align, NULL);
>> +
>> + pr_info("DMA: preallocated %zd KiB pool for atomic allocations\n",
>
> I think this should be "%zu" because atomic_pool_size is a size_t, not a
> ssize_t.
>

Yes, will fix.

>> + atomic_pool_size / 1024);
>> + return 0;
>> + }
>> + goto out;
>> +
>> +remove_mapping:
>> + dma_common_free_remap(addr, atomic_pool_size, VM_USERMAP);
>> +destroy_genpool:
>> + gen_pool_destroy(atomic_pool);
>> + atomic_pool == NULL;
>
> This probably doesn't belong here.
>

Dastardly typo.

>> +free_page:
>> + if (!dma_release_from_contiguous(NULL, page, nr_pages))
>> + __free_pages(page, get_order(atomic_pool_size));
>
> You use get_order(atomic_pool_size) a lot, perhaps it should be a
> temporary variable?
>

Yes, three usages is probably enough.

>> +out:
>> + pr_err("DMA: failed to allocate %zx KiB pool for atomic coherent allocation\n",
>> + atomic_pool_size / 1024);
>
> Print in decimal rather than hexadecimal?
>

I actually prefer hexadecimal but I should at least be consistent between
error and non-error paths.

> Thierry
>

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-21 22:36:53

by Laura Abbott

Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On 7/18/2014 6:43 AM, Catalin Marinas wrote:
> On Wed, Jul 02, 2014 at 07:03:38PM +0100, Laura Abbott wrote:
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index 4164c5a..a2487f1 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
> [...]
>> static void *__dma_alloc_coherent(struct device *dev, size_t size,
>> dma_addr_t *dma_handle, gfp_t flags,
>> struct dma_attrs *attrs)
>> @@ -53,7 +103,8 @@ static void *__dma_alloc_coherent(struct device *dev, size_t size,
>> if (IS_ENABLED(CONFIG_ZONE_DMA) &&
>> dev->coherent_dma_mask <= DMA_BIT_MASK(32))
>> flags |= GFP_DMA;
>> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
>> +
>> + if (!(flags & __GFP_WAIT) && IS_ENABLED(CONFIG_DMA_CMA)) {
>> struct page *page;
>>
>> size = PAGE_ALIGN(size);
>
> I think that's the wrong condition here. You want to use CMA if
> (flags & __GFP_WAIT). CMA does not support atomic allocations so it can
> fall back to swiotlb_alloc_coherent().
>
>> @@ -73,50 +124,56 @@ static void __dma_free_coherent(struct device *dev, size_t size,
>> void *vaddr, dma_addr_t dma_handle,
>> struct dma_attrs *attrs)
>> {
>> + bool freed;
>> + phys_addr_t paddr = dma_to_phys(dev, dma_handle);
>> +
>> if (dev == NULL) {
>> WARN_ONCE(1, "Use an actual device structure for DMA allocation\n");
>> return;
>> }
>>
>> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
>> - phys_addr_t paddr = dma_to_phys(dev, dma_handle);
>>
>> - dma_release_from_contiguous(dev,
>> + freed = dma_release_from_contiguous(dev,
>> phys_to_page(paddr),
>> size >> PAGE_SHIFT);
>> - } else {
>> + if (!freed)
>> swiotlb_free_coherent(dev, size, vaddr, dma_handle);
>> - }
>> }
>
> Is __dma_free_coherent() ever called in atomic context? If yes, the
> dma_release_from_contiguous() may not like it since it tries to acquire
> a mutex. But since we don't have the gfp flags here, we don't have an
> easy way to know what to call.
>
> So the initial idea of always calling __alloc_from_pool() for both
> coherent/non-coherent cases would work better (but still with a single
> shared pool, see below).
>

We should be okay:

__dma_free_coherent -> dma_release_from_contiguous -> cma_release, which
bounds checks the CMA region before taking any mutexes, unless I missed
something.

The existing behavior on arm is to not allow non-atomic allocations to be
freed in atomic context when CMA is enabled, so we'd be giving arm64 more
leeway there. Is being able to free non-atomic allocations in atomic
context really necessary?

>> static void *__dma_alloc_noncoherent(struct device *dev, size_t size,
>> dma_addr_t *dma_handle, gfp_t flags,
>> struct dma_attrs *attrs)
>> {
>> - struct page *page, **map;
>> + struct page *page;
>> void *ptr, *coherent_ptr;
>> - int order, i;
>>
>> size = PAGE_ALIGN(size);
>> - order = get_order(size);
>> +
>> + if (!(flags & __GFP_WAIT)) {
>> + struct page *page = NULL;
>> + void *addr = __alloc_from_pool(size, &page);
>> +
>> + if (addr)
>> + *dma_handle = phys_to_dma(dev, page_to_phys(page));
>> +
>> + return addr;
>> +
>> + }
>
> If we do the above for the __dma_alloc_coherent() case, we could use the
> same pool but instead of returning addr it could just return
> page_address(page). The downside of sharing the pool is that you need
> cache flushing for every allocation (which we already do for the
> non-atomic case).
>
>> @@ -332,6 +391,67 @@ static struct notifier_block amba_bus_nb = {
>>
>> extern int swiotlb_late_init_with_default_size(size_t default_size);
>>
>> +static int __init atomic_pool_init(void)
>> +{
>> + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
>> + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
>> + struct page *page;
>> + void *addr;
>> +
>> +
>> + if (dev_get_cma_area(NULL))
>
> Is it worth using this condition for other places where we check
> IS_ENABLED(CONFIG_DMA_CMA) (maybe as a separate patch).
>

Yes, it would be good to match arm in that respect.

>> + page = dma_alloc_from_contiguous(NULL, nr_pages,
>> + get_order(atomic_pool_size));
>> + else
>> + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
>
> One problem here is that the atomic pool wouldn't be able to honour
> GFP_DMA (in the latest kernel, CMA is by default in ZONE_DMA). You
> should probably pass GFP_KERNEL|GFP_DMA here. You could also use the
> swiotlb_alloc_coherent() which, with a NULL dev, assumes 32-bit DMA mask
> but it still expects GFP_DMA to be passed.
>

I think I missed updating this to GFP_DMA. The only advantage I would see
to using swiotlb_alloc_coherent vs. alloc_pages directly would be to
allow the fallback to using a bounce buffer if __get_free_pages failed.
I'll keep this as alloc_pages for now; it can be changed later if there
is a particular need for swiotlb behavior.

>> + if (page) {
>> + int ret;
>> +
>> + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
>> + if (!atomic_pool)
>> + goto free_page;
>> +
>> + addr = dma_common_contiguous_remap(page, atomic_pool_size,
>> + VM_USERMAP, prot, atomic_pool_init);
>> +
>> + if (!addr)
>> + goto destroy_genpool;
>> +
>> + memset(addr, 0, atomic_pool_size);
>> + __dma_flush_range(addr, addr + atomic_pool_size);
>
> If you add the flushing in the __dma_alloc_noncoherent(), it won't be
> needed here (of course, more efficient here but it would not work if we
> share the pool).
>
>> +postcore_initcall(atomic_pool_init);
>
> Why not arch_initcall? Or even better, we could have a common DMA init
> function that calls swiotlb_late_init() and atomic_pool_init() (in this
> order if you decide to use swiotlb allocation above).
>

Good point. I'll combine the two.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-22 15:50:42

by Catalin Marinas

Subject: Re: [PATCHv4 2/5] lib/genalloc.c: Add genpool range check function

On Mon, Jul 21, 2014 at 08:51:04PM +0100, Laura Abbott wrote:
> On 7/9/2014 3:33 PM, Olof Johansson wrote:
> > On Wed, Jul 2, 2014 at 11:03 AM, Laura Abbott <[email protected]> wrote:
> >>
> >> After allocating an address from a particular genpool,
> >> there is no good way to verify if that address actually
> >> belongs to a genpool. Introduce addr_in_gen_pool which
> >> will return if an address plus size falls completely
> >> within the genpool range.
> >>
> >> Signed-off-by: Laura Abbott <[email protected]>
> >
> > Reviewed-by: Olof Johansson <[email protected]>
> >
> > What's the merge path for this code? Part of the arm64 code that needs
> > it, I presume?
>
> My plan was to have the entire series go through the arm64 tree unless
> someone has a better idea.

It's fine by me. But since it touches core arch/arm code, I would like
to see an Ack from Russell.

--
Catalin

2014-07-22 15:56:42

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Mon, Jul 21, 2014 at 11:36:49PM +0100, Laura Abbott wrote:
> On 7/18/2014 6:43 AM, Catalin Marinas wrote:
> > On Wed, Jul 02, 2014 at 07:03:38PM +0100, Laura Abbott wrote:
> >> @@ -73,50 +124,56 @@ static void __dma_free_coherent(struct device *dev, size_t size,
> >> void *vaddr, dma_addr_t dma_handle,
> >> struct dma_attrs *attrs)
> >> {
> >> + bool freed;
> >> + phys_addr_t paddr = dma_to_phys(dev, dma_handle);
> >> +
> >> if (dev == NULL) {
> >> WARN_ONCE(1, "Use an actual device structure for DMA allocation\n");
> >> return;
> >> }
> >>
> >> - if (IS_ENABLED(CONFIG_DMA_CMA)) {
> >> - phys_addr_t paddr = dma_to_phys(dev, dma_handle);
> >>
> >> - dma_release_from_contiguous(dev,
> >> + freed = dma_release_from_contiguous(dev,
> >> phys_to_page(paddr),
> >> size >> PAGE_SHIFT);
> >> - } else {
> >> + if (!freed)
> >> swiotlb_free_coherent(dev, size, vaddr, dma_handle);
> >> - }
> >> }
> >
> > Is __dma_free_coherent() ever called in atomic context? If yes, the
> > dma_release_from_contiguous() may not like it since it tries to acquire
> > a mutex. But since we don't have the gfp flags here, we don't have an
> > easy way to know what to call.
> >
> > So the initial idea of always calling __alloc_from_pool() for both
> > coherent/non-coherent cases would work better (but still with a single
> > shared pool, see below).
>
> We should be okay
>
> __dma_free_coherent -> dma_release_from_contiguous -> cma_release, which
> bounds-checks the CMA region before taking any mutexes, unless I missed
> something.

Ah, good point. I missed the pfn range check in
dma_release_from_contiguous.
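For reference, the shape of that early-out is roughly the following (an
illustrative sketch rather than the exact upstream code):

    pfn = page_to_pfn(pages);
    if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
            return false;           /* not a CMA page: no mutex taken */

    mutex_lock(&cma_mutex);         /* only reached for genuine CMA pages */
    bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
    free_contig_range(pfn, count);
    mutex_unlock(&cma_mutex);

    return true;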

> The existing behavior on arm is to not allow non-atomic allocations to be
> freed in atomic context when CMA is enabled, so we'd be giving arm64 more
> leeway there. Is being able to free non-atomic allocations in atomic
> context really necessary?

No. I was worried that an atomic coherent allocation (falling back to
swiotlb) would trigger some CMA mutex in atomic context on the freeing
path. But you are right, it shouldn't happen.

> >> + page = dma_alloc_from_contiguous(NULL, nr_pages,
> >> + get_order(atomic_pool_size));
> >> + else
> >> + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
> >
> > One problem here is that the atomic pool wouldn't be able to honour
> > GFP_DMA (in the latest kernel, CMA is by default in ZONE_DMA). You
> > should probably pass GFP_KERNEL|GFP_DMA here. You could also use the
> > swiotlb_alloc_coherent() which, with a NULL dev, assumes 32-bit DMA mask
> > but it still expects GFP_DMA to be passed.
> >
>
> I think I missed updating this to GFP_DMA. The only advantage I would see
> to using swiotlb_alloc_coherent vs. alloc_pages directly would be to
> allow the fallback to using a bounce buffer if __get_free_pages failed.
> I'll keep this as alloc_pages for now; it can be changed later if there
> is a particular need for swiotlb behavior.

That's fine. Since we don't have a device at this point, I don't see how
swiotlb could fall back to the bounce buffer.

Thanks.

--
Catalin

2014-07-22 16:04:43

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCHv4 3/5] common: dma-mapping: Introduce common remapping functions

On Mon, Jul 21, 2014 at 08:33:48PM +0100, Laura Abbott wrote:
> On 7/18/2014 6:53 AM, Catalin Marinas wrote:
> > On Wed, Jul 02, 2014 at 07:03:36PM +0100, Laura Abbott wrote:
> >> +void *dma_common_pages_remap(struct page **pages, size_t size,
> >> + unsigned long vm_flags, pgprot_t prot,
> >> + const void *caller)
> >> +{
> >> + struct vm_struct *area;
> >> +
> >> + area = get_vm_area_caller(size, vm_flags, caller);
> >> + if (!area)
> >> + return NULL;
> >> +
> >> + if (map_vm_area(area, prot, &pages)) {
> >> + vunmap(area->addr);
> >> + return NULL;
> >> + }
> >> +
> >> + return area->addr;
> >> +}
> >
> > Why not just replace this function with vmap()? It is nearly identical.
>
> With this version, the caller stored and printed via /proc/vmallocinfo
> is the actual caller of the DMA API, whereas if we just call vmap we
> don't get any useful caller information. Going to vmap would change
> the existing behavior on ARM, so it seems unwise to switch.

OK.

> Another option is to move this into vmalloc.c and add vmap_caller.

Maybe as a subsequent clean-up (once this series gets merged).
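If it does move, vmap_caller() would presumably be little more than this
(hypothetical sketch; no such helper exists today):

    void *vmap_caller(struct page **pages, unsigned int count,
                      unsigned long flags, pgprot_t prot,
                      const void *caller)
    {
            struct vm_struct *area;

            area = get_vm_area_caller(count << PAGE_SHIFT, flags, caller);
            if (!area)
                    return NULL;

            if (map_vm_area(area, prot, &pages)) {
                    vunmap(area->addr);
                    return NULL;
            }

            return area->addr;
    }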

--
Catalin

2014-07-22 18:07:32

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Wednesday 02 July 2014, Laura Abbott wrote:
> + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
> + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
> + struct page *page;
> + void *addr;
> +
> +
> + if (dev_get_cma_area(NULL))
> + page = dma_alloc_from_contiguous(NULL, nr_pages,
> + get_order(atomic_pool_size));
> + else
> + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
> +
> +
> + if (page) {
> + int ret;
> +
> + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> + if (!atomic_pool)
> + goto free_page;
> +
> + addr = dma_common_contiguous_remap(page, atomic_pool_size,
> + VM_USERMAP, prot, atomic_pool_init);
> +

I just stumbled over this thread and noticed the code here: When you do
alloc_pages() above, you actually get pages that are already mapped into
the linear kernel mapping as cacheable pages. Your new
dma_common_contiguous_remap tries to map them as noncacheable. This
seems broken because it allows the CPU to treat both mappings as
cacheable, and that won't be coherent with device DMA.

> + if (!addr)
> + goto destroy_genpool;
> +
> + memset(addr, 0, atomic_pool_size);
> + __dma_flush_range(addr, addr + atomic_pool_size);

It also seems weird to flush the cache on a virtual address of
an uncacheable mapping. Is that well-defined? In the CMA case, the
original mapping should already be uncached here, so you don't need
to flush it. In the alloc_pages() case, I think you need to unmap
the pages from the linear mapping instead.

Arnd

2014-07-22 21:04:22

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Tue, Jul 22, 2014 at 07:06:44PM +0100, Arnd Bergmann wrote:
> On Wednesday 02 July 2014, Laura Abbott wrote:
> > + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
> > + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
> > + struct page *page;
> > + void *addr;
> > +
> > +
> > + if (dev_get_cma_area(NULL))
> > + page = dma_alloc_from_contiguous(NULL, nr_pages,
> > + get_order(atomic_pool_size));
> > + else
> > + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
> > +
> > +
> > + if (page) {
> > + int ret;
> > +
> > + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> > + if (!atomic_pool)
> > + goto free_page;
> > +
> > + addr = dma_common_contiguous_remap(page, atomic_pool_size,
> > + VM_USERMAP, prot, atomic_pool_init);
> > +
>
> I just stumbled over this thread and noticed the code here: When you do
> alloc_pages() above, you actually get pages that are already mapped into
> the linear kernel mapping as cacheable pages. Your new
> dma_common_contiguous_remap tries to map them as noncacheable. This
> seems broken because it allows the CPU to treat both mappings as
> cacheable, and that won't be coherent with device DMA.

It does *not* allow the CPU to treat both as cacheable. It treats the
non-cacheable mapping as non-cacheable (and the cacheable one as
cacheable). The only requirements the ARM ARM makes in this situation are
(B2.9 point 5 in the ARMv8 ARM):

- Before writing to a location not using the Write-Back attribute,
software must invalidate, or clean, a location from the caches if any
agent might have written to the location with the Write-Back
attribute. This avoids the possibility of overwriting the location
with stale data.
- After writing to a location with the Write-Back attribute, software
must clean the location from the caches, to make the write visible to
external memory.
- Before reading the location with a cacheable attribute, software must
invalidate the location from the caches, to ensure that any value held
in the caches reflects the last value made visible in external memory.

So as long as the CPU accesses such memory only via the non-cacheable
mapping, the only requirement is to flush the cache so that there are no
dirty lines that could be evicted.

(if the mismatched attributes were for example Normal vs Device, the
Device guarantees would be lost but in the cacheable vs non-cacheable
it's not too bad; same for ARMv7).

> > + if (!addr)
> > + goto destroy_genpool;
> > +
> > + memset(addr, 0, atomic_pool_size);
> > + __dma_flush_range(addr, addr + atomic_pool_size);
>
> It also seems weird to flush the cache on a virtual address of
> an uncacheable mapping. Is that well-defined?

Yes. According to D5.8.1 (Data and unified caches), "if cache
maintenance is performed on a memory location, the effect of that cache
maintenance is visible to all aliases of that physical memory location.
These properties are consistent with implementing all caches that can
handle data accesses as Physically-indexed, physically-tagged (PIPT)
caches".

> In the CMA case, the
> original mapping should already be uncached here, so you don't need
> to flush it.

I don't think it is non-cacheable already, at least not for arm64 (CMA
can be used on coherent architectures as well).

> In the alloc_pages() case, I think you need to unmap
> the pages from the linear mapping instead.

Even if unmapped, it would not remove dirty cache lines (which are
associated with physical addresses anyway). But we don't need to worry
about unmapping anyway, see above (that's unless we find some
architecture implementation where having such cacheable/non-cacheable
aliases is not efficient enough; the efficiency is not guaranteed by the
ARM ARM, just the correct behaviour).

--
Catalin

2014-07-22 23:51:09

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On 7/22/2014 2:03 PM, Catalin Marinas wrote:
> On Tue, Jul 22, 2014 at 07:06:44PM +0100, Arnd Bergmann wrote:
[...]
>>> + if (!addr)
>>> + goto destroy_genpool;
>>> +
>>> + memset(addr, 0, atomic_pool_size);
>>> + __dma_flush_range(addr, addr + atomic_pool_size);
>>
>> It also seems weird to flush the cache on a virtual address of
>> an uncacheable mapping. Is that well-defined?
>
> Yes. According to D5.8.1 (Data and unified caches), "if cache
> maintenance is performed on a memory location, the effect of that cache
> maintenance is visible to all aliases of that physical memory location.
> These properties are consistent with implementing all caches that can
> handle data accesses as Physically-indexed, physically-tagged (PIPT)
> caches".
>

This was actually unintentional on my part. I'm going to clean this up
to flush via the existing cached mapping to make it clearer what's going
on.
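i.e. something like this (sketch of the planned cleanup, flushing through
the cacheable linear alias before handing the region to the pool):

    void *page_addr = page_address(page);

    /* zero and flush via the existing cacheable mapping */
    memset(page_addr, 0, atomic_pool_size);
    __dma_flush_range(page_addr, page_addr + atomic_pool_size);

    addr = dma_common_contiguous_remap(page, atomic_pool_size,
                                       VM_USERMAP, prot, atomic_pool_init);
    if (!addr)
            goto destroy_genpool;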

>> In the CMA case, the
>> original mapping should already be uncached here, so you don't need
>> to flush it.
>
> I don't think it is non-cacheable already, at least not for arm64 (CMA
> can be used on coherent architectures as well).
>

Memory allocated via dma_alloc_from_contiguous is not guaranteed to be
uncached. On arm, we allocate the page of memory and then remap it as
appropriate.
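Roughly the arm pattern is (a from-memory sketch; helper names approximate):

    page = dma_alloc_from_contiguous(dev, count, order);
    if (!page)
            return NULL;

    __dma_clear_buffer(page, size);         /* zero + cache maintenance */

    /* the linear map stays cacheable; remap with the attributes we need */
    if (PageHighMem(page)) {
            ptr = __dma_alloc_remap(page, size, GFP_KERNEL, prot, caller);
    } else {
            __dma_remap(page, size, prot);
            ptr = page_address(page);
    }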

>> In the alloc_pages() case, I think you need to unmap
>> the pages from the linear mapping instead.
>
> Even if unmapped, it would not remove dirty cache lines (which are
> associated with physical addresses anyway). But we don't need to worry
> about unmapping anyway, see above (that's unless we find some
> architecture implementation where having such cacheable/non-cacheable
> aliases is not efficient enough; the efficiency is not guaranteed by the
> ARM ARM, just the correct behaviour).
>

Let's hope that never happens.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-07-23 11:13:37

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] arm64: Add atomic pool for non-coherent and CMA allocations.

On Tuesday 22 July 2014 22:03:52 Catalin Marinas wrote:
> On Tue, Jul 22, 2014 at 07:06:44PM +0100, Arnd Bergmann wrote:
> > On Wednesday 02 July 2014, Laura Abbott wrote:
> > > + pgprot_t prot = __pgprot(PROT_NORMAL_NC);
> > > + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
> > > + struct page *page;
> > > + void *addr;
> > > +
> > > +
> > > + if (dev_get_cma_area(NULL))
> > > + page = dma_alloc_from_contiguous(NULL, nr_pages,
> > > + get_order(atomic_pool_size));
> > > + else
> > > + page = alloc_pages(GFP_KERNEL, get_order(atomic_pool_size));
> > > +
> > > +
> > > + if (page) {
> > > + int ret;
> > > +
> > > + atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> > > + if (!atomic_pool)
> > > + goto free_page;
> > > +
> > > + addr = dma_common_contiguous_remap(page, atomic_pool_size,
> > > + VM_USERMAP, prot, atomic_pool_init);
> > > +
> >
> > I just stumbled over this thread and noticed the code here: When you do
> > alloc_pages() above, you actually get pages that are already mapped into
> > the linear kernel mapping as cacheable pages. Your new
> > dma_common_contiguous_remap tries to map them as noncacheable. This
> > seems broken because it allows the CPU to treat both mappings as
> > cacheable, and that won't be coherent with device DMA.
>
> It does *not* allow the CPU to treat both as cacheable. It treats the
> non-cacheable mapping as non-cacheable (and the cacheable one as
> cacheable). The only requirements the ARM ARM makes in this situation are
> (B2.9 point 5 in the ARMv8 ARM):
>
> - Before writing to a location not using the Write-Back attribute,
> software must invalidate, or clean, a location from the caches if any
> agent might have written to the location with the Write-Back
> attribute. This avoids the possibility of overwriting the location
> with stale data.
> - After writing to a location with the Write-Back attribute, software
> must clean the location from the caches, to make the write visible to
> external memory.
> - Before reading the location with a cacheable attribute, software must
> invalidate the location from the caches, to ensure that any value held
> in the caches reflects the last value made visible in external memory.
>
> So as long as the CPU accesses such memory only via the non-cacheable
> mapping, the only requirement is to flush the cache so that there are no
> dirty lines that could be evicted.

Ok, thanks for the explanation.

> (if the mismatched attributes were for example Normal vs Device, the
> Device guarantees would be lost but in the cacheable vs non-cacheable
> it's not too bad; same for ARMv7).

Right, that's probably what I misremembered.

> > > + if (!addr)
> > > + goto destroy_genpool;
> > > +
> > > + memset(addr, 0, atomic_pool_size);
> > > + __dma_flush_range(addr, addr + atomic_pool_size);
> >
> > It also seems weird to flush the cache on a virtual address of
> > an uncacheable mapping. Is that well-defined?
>
> Yes. According to D5.8.1 (Data and unified caches), "if cache
> maintenance is performed on a memory location, the effect of that cache
> maintenance is visible to all aliases of that physical memory location.
> These properties are consistent with implementing all caches that can
> handle data accesses as Physically-indexed, physically-tagged (PIPT)
> caches".

Interesting.

> > In the CMA case, the
> > original mapping should already be uncached here, so you don't need
> > to flush it.
>
> I don't think it is non-cacheable already, at least not for arm64 (CMA
> can be used on coherent architectures as well).

Ok, I see it now.

Sorry for all the confusion on my part.

Arnd