2020-10-03 04:04:42

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

Hey All,
So this is another revision of my patch series to performance
optimizations to the dma-buf system heap.

Unfortunately, in working these up, I realized the heap-helpers
infrastructure we tried to add to miniimize code duplication is
not as generic as we intended. For some heaps it makes sense to
deal with page lists, for other heaps it makes more sense to
track things with sgtables.

So this series reworks the system heap to use sgtables, and then
consolidates the pagelist method from the heap-helpers into the
CMA heap. After which the heap-helpers logic is removed (as it
is unused). I'd still like to find a better way to avoid some of
the logic duplication in implementing the entire dma_buf_ops
handlers per heap. But unfortunately that code is tied somewhat
to how the buffer's memory is tracked.

After this, the series introduces an optimization that
Ørjan Eide implemented for ION that avoids calling sync on
attachments that don't have a mapping.

Next, an optimization to use larger order pages for the system
heap. This change brings us closer to the current performance
of the ION code.

Unfortunately, after submitting the last round, I realized that
part of the reason the page-pooling patch I had included was
providing such great performance numbers, was because the
network page-pool implementation doesn't zero pages that it
pulls from the cache. This is very inappropriate for buffers we
pass to userland and was what gave it an unfair advantage
(almost constant time performance) relative to ION's allocation
performance numbers. I added some patches to zero the buffers
manually similar to how ION does it, but I found this resulted
in basically no performance improvement from the standard page
allocator. Thus I've dropped that patch in this series for now.

Unfortunately this means we still have a performance delta from
the ION system heap as measured by my microbenchmark, and this
delta comes from ION system_heap's use of deferred freeing of
pages. So less work is done in the measured interval of the
microbenchmark. I'll be looking at adding similar code
eventually but I don't want to hold the rest of the patches up
on this, as it is still a good improvement over the current
code.

I've updated the chart I shared earlier with current numbers
(including with the unsubmitted net pagepool implementation, and
with a different unsubmitted pagepool implementation borrowed
from ION) here:
https://docs.google.com/spreadsheets/d/1-1C8ZQpmkl_0DISkI6z4xelE08MlNAN7oEu34AnO4Ao/edit?usp=sharing

I did add to this series a reworked version of my uncached
system heap implementation I was submitting a few weeks back.
Since it duplicated a lot of the now reworked system heap code,
I realized it would be much simpler to add the functionality to
the system_heap implementaiton itself.

While not improving the core allocation performance, the
uncached heap allocations do result in *much* improved
performance on HiKey960 as it avoids a lot of flushing and
invalidating buffers that the cpu doesn't touch often.

Feedback on these would be great!

thanks
-john


New in v3:
* Dropped page-pool patches as after correcting the code to
zero buffers, they provided no net performance gain.
* Added system-uncached implementation ontop of reworked
system-heap.
* Use the new sgtable mapping functions, in the system and cma
code as Suggested-by: Daniel Mentz <[email protected]>
* Cleanup: Use page_size() rather then open-coding it



Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]

John Stultz (7):
dma-buf: system_heap: Rework system heap to use sgtables instead of
pagelists
dma-buf: heaps: Move heap-helper logic into the cma_heap
implementation
dma-buf: heaps: Remove heap-helpers code
dma-buf: heaps: Skip sync if not mapped
dma-buf: system_heap: Allocate higher order pages if available
dma-buf: dma-heap: Keep track of the heap device struct
dma-buf: system_heap: Add a system-uncached heap re-using the system
heap

drivers/dma-buf/dma-heap.c | 33 +-
drivers/dma-buf/heaps/Makefile | 1 -
drivers/dma-buf/heaps/cma_heap.c | 327 +++++++++++++++---
drivers/dma-buf/heaps/heap-helpers.c | 271 ---------------
drivers/dma-buf/heaps/heap-helpers.h | 53 ---
drivers/dma-buf/heaps/system_heap.c | 480 ++++++++++++++++++++++++---
include/linux/dma-heap.h | 9 +
7 files changed, 741 insertions(+), 433 deletions(-)
delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

--
2.17.1


2020-10-03 04:04:52

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 6/7] dma-buf: dma-heap: Keep track of the heap device struct

Keep track of the heap device struct.

This will be useful for special DMA allocations
and actions.

Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
drivers/dma-buf/dma-heap.c | 33 +++++++++++++++++++++++++--------
include/linux/dma-heap.h | 9 +++++++++
2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index afd22c9dbdcf..72c746755d89 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -30,6 +30,7 @@
* @heap_devt heap device node
* @list list head connecting to list of heaps
* @heap_cdev heap char device
+ * @heap_dev heap device struct
*
* Represents a heap of memory from which buffers can be made.
*/
@@ -40,6 +41,7 @@ struct dma_heap {
dev_t heap_devt;
struct list_head list;
struct cdev heap_cdev;
+ struct device *heap_dev;
};

static LIST_HEAD(heap_list);
@@ -190,10 +192,21 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
return heap->priv;
}

+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap)
+{
+ return heap->heap_dev;
+}
+
struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
{
struct dma_heap *heap, *h, *err_ret;
- struct device *dev_ret;
unsigned int minor;
int ret;

@@ -247,16 +260,20 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
goto err1;
}

- dev_ret = device_create(dma_heap_class,
- NULL,
- heap->heap_devt,
- NULL,
- heap->name);
- if (IS_ERR(dev_ret)) {
+ heap->heap_dev = device_create(dma_heap_class,
+ NULL,
+ heap->heap_devt,
+ NULL,
+ heap->name);
+ if (IS_ERR(heap->heap_dev)) {
pr_err("dma_heap: Unable to create device\n");
- err_ret = ERR_CAST(dev_ret);
+ err_ret = ERR_CAST(heap->heap_dev);
goto err2;
}
+
+ /* Make sure it doesn't disappear on us */
+ heap->heap_dev = get_device(heap->heap_dev);
+
/* Add heap to the list */
mutex_lock(&heap_list_lock);
list_add(&heap->list, &heap_list);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 454e354d1ffb..82857e096910 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -50,6 +50,15 @@ struct dma_heap_export_info {
*/
void *dma_heap_get_drvdata(struct dma_heap *heap);

+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap);
+
/**
* dma_heap_add - adds a heap to dmabuf heaps
* @exp_info: information needed to register this heap
--
2.17.1

2020-10-03 04:04:59

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 5/7] dma-buf: system_heap: Allocate higher order pages if available

While the system heap can return non-contiguous pages,
try to allocate larger order pages if possible.

This will allow slight performance gains and make implementing
page pooling easier.

Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
v3:
* Use page_size() rather then opencoding it
---
drivers/dma-buf/heaps/system_heap.c | 83 ++++++++++++++++++++++-------
1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ef8d47e5a7ff..2b8d4b6abacb 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -40,6 +40,14 @@ struct dma_heap_attachment {
bool mapped;
};

+#define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
+ | __GFP_NORETRY) & ~__GFP_RECLAIM) \
+ | __GFP_COMP)
+#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
+static gfp_t order_flags[] = {HIGH_ORDER_GFP, LOW_ORDER_GFP, LOW_ORDER_GFP};
+static const unsigned int orders[] = {8, 4, 0};
+#define NUM_ORDERS ARRAY_SIZE(orders)
+
static struct sg_table *dup_sg_table(struct sg_table *table)
{
struct sg_table *new_table;
@@ -270,8 +278,11 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
int i;

table = &buffer->sg_table;
- for_each_sgtable_sg(table, sg, i)
- __free_page(sg_page(sg));
+ for_each_sg(table->sgl, sg, table->nents, i) {
+ struct page *page = sg_page(sg);
+
+ __free_pages(page, compound_order(page));
+ }
sg_free_table(table);
kfree(buffer);
}
@@ -289,6 +300,26 @@ static const struct dma_buf_ops system_heap_buf_ops = {
.release = system_heap_dma_buf_release,
};

+static struct page *alloc_largest_available(unsigned long size,
+ unsigned int max_order)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < NUM_ORDERS; i++) {
+ if (size < (PAGE_SIZE << orders[i]))
+ continue;
+ if (max_order < orders[i])
+ continue;
+
+ page = alloc_pages(order_flags[i], orders[i]);
+ if (!page)
+ continue;
+ return page;
+ }
+ return NULL;
+}
+
static int system_heap_allocate(struct dma_heap *heap,
unsigned long len,
unsigned long fd_flags,
@@ -296,11 +327,13 @@ static int system_heap_allocate(struct dma_heap *heap,
{
struct system_heap_buffer *buffer;
DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ unsigned long size_remaining = len;
+ unsigned int max_order = orders[0];
struct dma_buf *dmabuf;
struct sg_table *table;
struct scatterlist *sg;
- pgoff_t pagecount;
- pgoff_t pg;
+ struct list_head pages;
+ struct page *page, *tmp_page;
int i, ret = -ENOMEM;

buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
@@ -312,25 +345,35 @@ static int system_heap_allocate(struct dma_heap *heap,
buffer->heap = heap;
buffer->len = len;

- table = &buffer->sg_table;
- pagecount = len / PAGE_SIZE;
- if (sg_alloc_table(table, pagecount, GFP_KERNEL))
- goto free_buffer;
-
- sg = table->sgl;
- for (pg = 0; pg < pagecount; pg++) {
- struct page *page;
+ INIT_LIST_HEAD(&pages);
+ i = 0;
+ while (size_remaining > 0) {
/*
* Avoid trying to allocate memory if the process
* has been killed by SIGKILL
*/
if (fatal_signal_pending(current))
- goto free_pages;
- page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ goto free_buffer;
+
+ page = alloc_largest_available(size_remaining, max_order);
if (!page)
- goto free_pages;
+ goto free_buffer;
+
+ list_add_tail(&page->lru, &pages);
+ size_remaining -= page_size(page);
+ max_order = compound_order(page);
+ i++;
+ }
+
+ table = &buffer->sg_table;
+ if (sg_alloc_table(table, i, GFP_KERNEL))
+ goto free_buffer;
+
+ sg = table->sgl;
+ list_for_each_entry_safe(page, tmp_page, &pages, lru) {
sg_set_page(sg, page, page_size(page), 0);
sg = sg_next(sg);
+ list_del(&page->lru);
}

/* create the dmabuf */
@@ -350,14 +393,18 @@ static int system_heap_allocate(struct dma_heap *heap,
/* just return, as put will call release and that will free */
return ret;
}
-
return ret;

free_pages:
- for_each_sgtable_sg(table, sg, i)
- __free_page(sg_page(sg));
+ for_each_sgtable_sg(table, sg, i) {
+ struct page *p = sg_page(sg);
+
+ __free_pages(p, compound_order(p));
+ }
sg_free_table(table);
free_buffer:
+ list_for_each_entry_safe(page, tmp_page, &pages, lru)
+ __free_pages(page, compound_order(page));
kfree(buffer);

return ret;
--
2.17.1

2020-10-03 04:06:30

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we do not yet have such a flag, and by default
the current heaps (system and cma) produce CPU cachable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

There has been a suggestion to make this functionality a flag
(DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
ION used the ION_FLAG_CACHED. But I want to make sure an
_UNCACHED flag would truely be a generic attribute across all
heaps. So far that has been unclear, so having it as a separate
heap seemes better for now. (But I'm open to discussion on this
point!)

This is a rework of earlier efforts to add a uncached system heap,
done utilizing the exisitng system heap, adding just a bit of
logic to handle the uncached case.

Feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
- https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
- Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
- https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
drivers/dma-buf/heaps/system_heap.c | 87 ++++++++++++++++++++++++++---
1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 2b8d4b6abacb..952f1fd9dacf 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
#include <linux/vmalloc.h>

struct dma_heap *sys_heap;
+struct dma_heap *sys_uncached_heap;

struct system_heap_buffer {
struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
struct sg_table sg_table;
int vmap_cnt;
void *vaddr;
+
+ bool uncached;
};

struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
struct sg_table *table;
struct list_head list;
bool mapped;
+
+ bool uncached;
};

#define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
@@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->dev = attachment->dev;
INIT_LIST_HEAD(&a->list);
a->mapped = false;
-
+ a->uncached = buffer->uncached;
attachment->priv = a;

mutex_lock(&buffer->lock);
@@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
{
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = a->table;
+ int attr = 0;
int ret;

- ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+ if (a->uncached)
+ attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+ ret = dma_map_sgtable(attachment->dev, table, direction, attr);
if (ret)
return ERR_PTR(ret);

@@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
enum dma_data_direction direction)
{
struct dma_heap_attachment *a = attachment->priv;
+ int attr = 0;

+ if (a->uncached)
+ attr = DMA_ATTR_SKIP_CPU_SYNC;
a->mapped = false;
- dma_unmap_sgtable(attachment->dev, table, direction, 0);
+ dma_unmap_sgtable(attachment->dev, table, direction, attr);
}

static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -150,6 +162,9 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
struct system_heap_buffer *buffer = dmabuf->priv;
struct dma_heap_attachment *a;

+ if (buffer->uncached)
+ return 0;
+
mutex_lock(&buffer->lock);

if (buffer->vmap_cnt)
@@ -171,6 +186,9 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
struct system_heap_buffer *buffer = dmabuf->priv;
struct dma_heap_attachment *a;

+ if (buffer->uncached)
+ return 0;
+
mutex_lock(&buffer->lock);

if (buffer->vmap_cnt)
@@ -194,6 +212,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
struct sg_page_iter piter;
int ret;

+ if (buffer->uncached)
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
struct page *page = sg_page_iter_page(&piter);

@@ -215,8 +236,12 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
struct page **pages = vmalloc(sizeof(struct page *) * npages);
struct page **tmp = pages;
struct sg_page_iter piter;
+ pgprot_t pgprot = PAGE_KERNEL;
void *vaddr;

+ if (buffer->uncached)
+ pgprot = pgprot_writecombine(PAGE_KERNEL);
+
if (!pages)
return ERR_PTR(-ENOMEM);

@@ -225,7 +250,7 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
*tmp++ = sg_page_iter_page(&piter);
}

- vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+ vaddr = vmap(pages, npages, VM_MAP, pgprot);
vfree(pages);

if (!vaddr)
@@ -278,6 +303,10 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
int i;

table = &buffer->sg_table;
+ /* Unmap the uncached buffers from the heap device (pairs with the map on allocate) */
+ if (buffer->uncached)
+ dma_unmap_sgtable(dma_heap_get_dev(buffer->heap), table, DMA_BIDIRECTIONAL, 0);
+
for_each_sg(table->sgl, sg, table->nents, i) {
struct page *page = sg_page(sg);

@@ -320,10 +349,11 @@ static struct page *alloc_largest_available(unsigned long size,
return NULL;
}

-static int system_heap_allocate(struct dma_heap *heap,
- unsigned long len,
- unsigned long fd_flags,
- unsigned long heap_flags)
+static int system_heap_do_allocate(struct dma_heap *heap,
+ unsigned long len,
+ unsigned long fd_flags,
+ unsigned long heap_flags,
+ bool uncached)
{
struct system_heap_buffer *buffer;
DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
@@ -344,6 +374,7 @@ static int system_heap_allocate(struct dma_heap *heap,
mutex_init(&buffer->lock);
buffer->heap = heap;
buffer->len = len;
+ buffer->uncached = uncached;

INIT_LIST_HEAD(&pages);
i = 0;
@@ -393,6 +424,16 @@ static int system_heap_allocate(struct dma_heap *heap,
/* just return, as put will call release and that will free */
return ret;
}
+
+ /*
+ * For uncached buffers, we need to initially flush cpu cache, since
+ * the __GFP_ZERO on the allocation means the zeroing was done by the
+ * cpu and thus it is likely cached. Map (and implicitly flush) it out
+ * now so we don't get corruption later on.
+ */
+ if (buffer->uncached)
+ dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+
return ret;

free_pages:
@@ -410,10 +451,30 @@ static int system_heap_allocate(struct dma_heap *heap,
return ret;
}

+static int system_heap_allocate(struct dma_heap *heap,
+ unsigned long len,
+ unsigned long fd_flags,
+ unsigned long heap_flags)
+{
+ return system_heap_do_allocate(heap, len, fd_flags, heap_flags, false);
+}
+
static const struct dma_heap_ops system_heap_ops = {
.allocate = system_heap_allocate,
};

+static int system_uncached_heap_allocate(struct dma_heap *heap,
+ unsigned long len,
+ unsigned long fd_flags,
+ unsigned long heap_flags)
+{
+ return system_heap_do_allocate(heap, len, fd_flags, heap_flags, true);
+}
+
+static const struct dma_heap_ops system_uncached_heap_ops = {
+ .allocate = system_uncached_heap_allocate,
+};
+
static int system_heap_create(void)
{
struct dma_heap_export_info exp_info;
@@ -426,6 +487,16 @@ static int system_heap_create(void)
if (IS_ERR(sys_heap))
return PTR_ERR(sys_heap);

+ exp_info.name = "system-uncached";
+ exp_info.ops = &system_uncached_heap_ops;
+ exp_info.priv = NULL;
+
+ sys_uncached_heap = dma_heap_add(&exp_info);
+ if (IS_ERR(sys_uncached_heap))
+ return PTR_ERR(sys_heap);
+
+ dma_coerce_mask_and_coherent(dma_heap_get_dev(sys_uncached_heap), DMA_BIT_MASK(64));
+
return 0;
}
module_init(system_heap_create);
--
2.17.1

2020-10-03 04:06:36

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 4/7] dma-buf: heaps: Skip sync if not mapped

This patch is basically a port of Ørjan Eide's similar patch for ION
https://lore.kernel.org/lkml/[email protected]/

Only sync the sg-list of dma-buf heap attachment when the attachment
is actually mapped on the device.

dma-bufs may be synced at any time. It can be reached from user space
via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when
syncs may be attempted, and dma_buf_end_cpu_access() and
dma_buf_begin_cpu_access() may not be paired.

Since the sg_list's dma_address isn't set up until the buffer is used
on the device, and dma_map_sg() is called on it, the dma_address will be
NULL if sync is attempted on the dma-buf before it's mapped on a device.

Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops
into the dma_direct code")) this was a problem as the dma-api (at least
the swiotlb_dma_ops on arm64) would use the potentially invalid
dma_address. How that failed depended on how the device handled physical
address 0. If 0 was a valid address to physical ram, that page would get
flushed a lot, while the actual pages in the buffer would not get synced
correctly. While if 0 is an invalid physical address it may cause a
fault and trigger a crash.

In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct:
merge swiotlb_dma_ops into the dma_direct code"), as this moved the
dma-api to use the page pointer in the sg_list, and (for Ion buffers at
least) this will always be valid if the sg_list exists at all.

But, this issue is re-introduced in v5.3 with
commit 449fa54d6815 ("dma-direct: correct the physical addr in
dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old
behaviour and picks the dma_address that may be invalid.

dma-buf core doesn't ensure that the buffer is mapped on the device, and
thus have a valid sg_list, before calling the exporter's
begin_cpu_access.

Logic and commit message originally by: Ørjan Eide <[email protected]>

Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
drivers/dma-buf/heaps/cma_heap.c | 10 ++++++++++
drivers/dma-buf/heaps/system_heap.c | 10 ++++++++++
2 files changed, 20 insertions(+)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 366963b94c72..fece9d7739ae 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -44,6 +44,7 @@ struct dma_heap_attachment {
struct device *dev;
struct sg_table table;
struct list_head list;
+ bool mapped;
};

static int cma_heap_attach(struct dma_buf *dmabuf,
@@ -68,6 +69,7 @@ static int cma_heap_attach(struct dma_buf *dmabuf,

a->dev = attachment->dev;
INIT_LIST_HEAD(&a->list);
+ a->mapped = false;

attachment->priv = a;

@@ -102,6 +104,7 @@ static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment *attachme
ret = dma_map_sgtable(attachment->dev, table, direction, 0);
if (ret)
return ERR_PTR(-ENOMEM);
+ a->mapped = true;
return table;
}

@@ -109,6 +112,9 @@ static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
struct sg_table *table,
enum dma_data_direction direction)
{
+ struct dma_heap_attachment *a = attachment->priv;
+
+ a->mapped = false;
dma_unmap_sgtable(attachment->dev, table, direction, 0);
}

@@ -123,6 +129,8 @@ static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,

mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
+ if (!a->mapped)
+ continue;
dma_sync_sgtable_for_cpu(a->dev, &a->table, direction);
}
mutex_unlock(&buffer->lock);
@@ -141,6 +149,8 @@ static int cma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,

mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
+ if (!a->mapped)
+ continue;
dma_sync_sgtable_for_device(a->dev, &a->table, direction);
}
mutex_unlock(&buffer->lock);
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 00ed107b3b76..ef8d47e5a7ff 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -37,6 +37,7 @@ struct dma_heap_attachment {
struct device *dev;
struct sg_table *table;
struct list_head list;
+ bool mapped;
};

static struct sg_table *dup_sg_table(struct sg_table *table)
@@ -84,6 +85,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->table = table;
a->dev = attachment->dev;
INIT_LIST_HEAD(&a->list);
+ a->mapped = false;

attachment->priv = a;

@@ -120,6 +122,7 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
if (ret)
return ERR_PTR(ret);

+ a->mapped = true;
return table;
}

@@ -127,6 +130,9 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
struct sg_table *table,
enum dma_data_direction direction)
{
+ struct dma_heap_attachment *a = attachment->priv;
+
+ a->mapped = false;
dma_unmap_sgtable(attachment->dev, table, direction, 0);
}

@@ -142,6 +148,8 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);

list_for_each_entry(a, &buffer->attachments, list) {
+ if (!a->mapped)
+ continue;
dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
}
mutex_unlock(&buffer->lock);
@@ -161,6 +169,8 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
flush_kernel_vmap_range(buffer->vaddr, buffer->len);

list_for_each_entry(a, &buffer->attachments, list) {
+ if (!a->mapped)
+ continue;
dma_sync_sgtable_for_device(a->dev, a->table, direction);
}
mutex_unlock(&buffer->lock);
--
2.17.1

2020-10-03 04:07:40

by John Stultz

[permalink] [raw]
Subject: [PATCH v3 3/7] dma-buf: heaps: Remove heap-helpers code

The heap-helpers code was not as generic as initially hoped
and it is now not being used, so remove it from the tree.

Cc: Sumit Semwal <[email protected]>
Cc: Liam Mark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Brian Starkey <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: Chris Goldsworthy <[email protected]>
Cc: Ørjan Eide <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ezequiel Garcia <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: James Jones <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
drivers/dma-buf/heaps/Makefile | 1 -
drivers/dma-buf/heaps/heap-helpers.c | 271 ---------------------------
drivers/dma-buf/heaps/heap-helpers.h | 53 ------
3 files changed, 325 deletions(-)
delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cdec3da0..974467791032 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += heap-helpers.o
obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
diff --git a/drivers/dma-buf/heaps/heap-helpers.c b/drivers/dma-buf/heaps/heap-helpers.c
deleted file mode 100644
index 9f964ca3f59c..000000000000
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ /dev/null
@@ -1,271 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/device.h>
-#include <linux/dma-buf.h>
-#include <linux/err.h>
-#include <linux/highmem.h>
-#include <linux/idr.h>
-#include <linux/list.h>
-#include <linux/slab.h>
-#include <linux/uaccess.h>
-#include <linux/vmalloc.h>
-#include <uapi/linux/dma-heap.h>
-
-#include "heap-helpers.h"
-
-void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
- void (*free)(struct heap_helper_buffer *))
-{
- buffer->priv_virt = NULL;
- mutex_init(&buffer->lock);
- buffer->vmap_cnt = 0;
- buffer->vaddr = NULL;
- buffer->pagecount = 0;
- buffer->pages = NULL;
- INIT_LIST_HEAD(&buffer->attachments);
- buffer->free = free;
-}
-
-struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
- int fd_flags)
-{
- DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
-
- exp_info.ops = &heap_helper_ops;
- exp_info.size = buffer->size;
- exp_info.flags = fd_flags;
- exp_info.priv = buffer;
-
- return dma_buf_export(&exp_info);
-}
-
-static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
-{
- void *vaddr;
-
- vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
- if (!vaddr)
- return ERR_PTR(-ENOMEM);
-
- return vaddr;
-}
-
-static void dma_heap_buffer_destroy(struct heap_helper_buffer *buffer)
-{
- if (buffer->vmap_cnt > 0) {
- WARN(1, "%s: buffer still mapped in the kernel\n", __func__);
- vunmap(buffer->vaddr);
- }
-
- buffer->free(buffer);
-}
-
-static void *dma_heap_buffer_vmap_get(struct heap_helper_buffer *buffer)
-{
- void *vaddr;
-
- if (buffer->vmap_cnt) {
- buffer->vmap_cnt++;
- return buffer->vaddr;
- }
- vaddr = dma_heap_map_kernel(buffer);
- if (IS_ERR(vaddr))
- return vaddr;
- buffer->vaddr = vaddr;
- buffer->vmap_cnt++;
- return vaddr;
-}
-
-static void dma_heap_buffer_vmap_put(struct heap_helper_buffer *buffer)
-{
- if (!--buffer->vmap_cnt) {
- vunmap(buffer->vaddr);
- buffer->vaddr = NULL;
- }
-}
-
-struct dma_heaps_attachment {
- struct device *dev;
- struct sg_table table;
- struct list_head list;
-};
-
-static int dma_heap_attach(struct dma_buf *dmabuf,
- struct dma_buf_attachment *attachment)
-{
- struct dma_heaps_attachment *a;
- struct heap_helper_buffer *buffer = dmabuf->priv;
- int ret;
-
- a = kzalloc(sizeof(*a), GFP_KERNEL);
- if (!a)
- return -ENOMEM;
-
- ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
- buffer->pagecount, 0,
- buffer->pagecount << PAGE_SHIFT,
- GFP_KERNEL);
- if (ret) {
- kfree(a);
- return ret;
- }
-
- a->dev = attachment->dev;
- INIT_LIST_HEAD(&a->list);
-
- attachment->priv = a;
-
- mutex_lock(&buffer->lock);
- list_add(&a->list, &buffer->attachments);
- mutex_unlock(&buffer->lock);
-
- return 0;
-}
-
-static void dma_heap_detach(struct dma_buf *dmabuf,
- struct dma_buf_attachment *attachment)
-{
- struct dma_heaps_attachment *a = attachment->priv;
- struct heap_helper_buffer *buffer = dmabuf->priv;
-
- mutex_lock(&buffer->lock);
- list_del(&a->list);
- mutex_unlock(&buffer->lock);
-
- sg_free_table(&a->table);
- kfree(a);
-}
-
-static
-struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
- enum dma_data_direction direction)
-{
- struct dma_heaps_attachment *a = attachment->priv;
- struct sg_table *table;
-
- table = &a->table;
-
- if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
- direction))
- table = ERR_PTR(-ENOMEM);
- return table;
-}
-
-static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
- struct sg_table *table,
- enum dma_data_direction direction)
-{
- dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
-}
-
-static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
-{
- struct vm_area_struct *vma = vmf->vma;
- struct heap_helper_buffer *buffer = vma->vm_private_data;
-
- if (vmf->pgoff > buffer->pagecount)
- return VM_FAULT_SIGBUS;
-
- vmf->page = buffer->pages[vmf->pgoff];
- get_page(vmf->page);
-
- return 0;
-}
-
-static const struct vm_operations_struct dma_heap_vm_ops = {
- .fault = dma_heap_vm_fault,
-};
-
-static int dma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
-
- if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
- return -EINVAL;
-
- vma->vm_ops = &dma_heap_vm_ops;
- vma->vm_private_data = buffer;
-
- return 0;
-}
-
-static void dma_heap_dma_buf_release(struct dma_buf *dmabuf)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
-
- dma_heap_buffer_destroy(buffer);
-}
-
-static int dma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
- enum dma_data_direction direction)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
- struct dma_heaps_attachment *a;
- int ret = 0;
-
- mutex_lock(&buffer->lock);
-
- if (buffer->vmap_cnt)
- invalidate_kernel_vmap_range(buffer->vaddr, buffer->size);
-
- list_for_each_entry(a, &buffer->attachments, list) {
- dma_sync_sg_for_cpu(a->dev, a->table.sgl, a->table.nents,
- direction);
- }
- mutex_unlock(&buffer->lock);
-
- return ret;
-}
-
-static int dma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
- enum dma_data_direction direction)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
- struct dma_heaps_attachment *a;
-
- mutex_lock(&buffer->lock);
-
- if (buffer->vmap_cnt)
- flush_kernel_vmap_range(buffer->vaddr, buffer->size);
-
- list_for_each_entry(a, &buffer->attachments, list) {
- dma_sync_sg_for_device(a->dev, a->table.sgl, a->table.nents,
- direction);
- }
- mutex_unlock(&buffer->lock);
-
- return 0;
-}
-
-static void *dma_heap_dma_buf_vmap(struct dma_buf *dmabuf)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
- void *vaddr;
-
- mutex_lock(&buffer->lock);
- vaddr = dma_heap_buffer_vmap_get(buffer);
- mutex_unlock(&buffer->lock);
-
- return vaddr;
-}
-
-static void dma_heap_dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
-{
- struct heap_helper_buffer *buffer = dmabuf->priv;
-
- mutex_lock(&buffer->lock);
- dma_heap_buffer_vmap_put(buffer);
- mutex_unlock(&buffer->lock);
-}
-
-const struct dma_buf_ops heap_helper_ops = {
- .map_dma_buf = dma_heap_map_dma_buf,
- .unmap_dma_buf = dma_heap_unmap_dma_buf,
- .mmap = dma_heap_mmap,
- .release = dma_heap_dma_buf_release,
- .attach = dma_heap_attach,
- .detach = dma_heap_detach,
- .begin_cpu_access = dma_heap_dma_buf_begin_cpu_access,
- .end_cpu_access = dma_heap_dma_buf_end_cpu_access,
- .vmap = dma_heap_dma_buf_vmap,
- .vunmap = dma_heap_dma_buf_vunmap,
-};
diff --git a/drivers/dma-buf/heaps/heap-helpers.h b/drivers/dma-buf/heaps/heap-helpers.h
deleted file mode 100644
index 805d2df88024..000000000000
--- a/drivers/dma-buf/heaps/heap-helpers.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * DMABUF Heaps helper code
- *
- * Copyright (C) 2011 Google, Inc.
- * Copyright (C) 2019 Linaro Ltd.
- */
-
-#ifndef _HEAP_HELPERS_H
-#define _HEAP_HELPERS_H
-
-#include <linux/dma-heap.h>
-#include <linux/list.h>
-
-/**
- * struct heap_helper_buffer - helper buffer metadata
- * @heap: back pointer to the heap the buffer came from
- * @dmabuf: backing dma-buf for this buffer
- * @size: size of the buffer
- * @priv_virt pointer to heap specific private value
- * @lock mutext to protect the data in this structure
- * @vmap_cnt count of vmap references on the buffer
- * @vaddr vmap'ed virtual address
- * @pagecount number of pages in the buffer
- * @pages list of page pointers
- * @attachments list of device attachments
- *
- * @free heap callback to free the buffer
- */
-struct heap_helper_buffer {
- struct dma_heap *heap;
- struct dma_buf *dmabuf;
- size_t size;
-
- void *priv_virt;
- struct mutex lock;
- int vmap_cnt;
- void *vaddr;
- pgoff_t pagecount;
- struct page **pages;
- struct list_head attachments;
-
- void (*free)(struct heap_helper_buffer *buffer);
-};
-
-void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
- void (*free)(struct heap_helper_buffer *));
-
-struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
- int fd_flags);
-
-extern const struct dma_buf_ops heap_helper_ops;
-#endif /* _HEAP_HELPERS_H */
--
2.17.1

2020-10-06 00:29:44

by kernel test robot

[permalink] [raw]
Subject: [RFC PATCH] dma-buf: system_heap: sys_uncached_heap can be static


Signed-off-by: kernel test robot <[email protected]>
---
system_heap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 952f1fd9dacfda..fa17959d81795d 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,7 +22,7 @@
#include <linux/vmalloc.h>

struct dma_heap *sys_heap;
-struct dma_heap *sys_uncached_heap;
+static struct dma_heap *sys_uncached_heap;

struct system_heap_buffer {
struct dma_heap *heap;

2020-10-06 00:51:04

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

Hi John,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on tegra-drm/drm/tegra/for-next linus/master v5.9-rc8]
[cannot apply to next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/John-Stultz/dma-buf-Performance-improvements-for-system-heap-a-system-uncached-implementation/20201003-120520
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
config: i386-randconfig-s032-20201005 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-201-g24bdaac6-dirty
# https://github.com/0day-ci/linux/commit/553f4e0fafc5b3b4542ea0c7bc21a70fce504c7f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review John-Stultz/dma-buf-Performance-improvements-for-system-heap-a-system-uncached-implementation/20201003-120520
git checkout 553f4e0fafc5b3b4542ea0c7bc21a70fce504c7f
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

echo
echo "sparse warnings: (new ones prefixed by >>)"
echo
>> drivers/dma-buf/heaps/system_heap.c:25:17: sparse: sparse: symbol 'sys_uncached_heap' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (1.84 kB)
.config.gz (35.33 kB)
Download all attachments

2020-10-07 07:46:38

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

Hi John,

url: https://github.com/0day-ci/linux/commits/John-Stultz/dma-buf-Performance-improvements-for-system-heap-a-system-uncached-implementation/20201003-120520
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
config: i386-randconfig-m021-20201002 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>

smatch warnings:
drivers/dma-buf/heaps/system_heap.c:496 system_heap_create() warn: passing zero to 'PTR_ERR'

vim +/PTR_ERR +496 drivers/dma-buf/heaps/system_heap.c

efa04fefebbd724 John Stultz 2019-12-03 478 static int system_heap_create(void)
efa04fefebbd724 John Stultz 2019-12-03 479 {
efa04fefebbd724 John Stultz 2019-12-03 480 struct dma_heap_export_info exp_info;
efa04fefebbd724 John Stultz 2019-12-03 481
263e38f82cbb35b Andrew F. Davis 2019-12-16 482 exp_info.name = "system";
efa04fefebbd724 John Stultz 2019-12-03 483 exp_info.ops = &system_heap_ops;
efa04fefebbd724 John Stultz 2019-12-03 484 exp_info.priv = NULL;
efa04fefebbd724 John Stultz 2019-12-03 485
efa04fefebbd724 John Stultz 2019-12-03 486 sys_heap = dma_heap_add(&exp_info);
efa04fefebbd724 John Stultz 2019-12-03 487 if (IS_ERR(sys_heap))
a2e10cdd2e4d12a John Stultz 2020-10-03 488 return PTR_ERR(sys_heap);
efa04fefebbd724 John Stultz 2019-12-03 489
553f4e0fafc5b3b John Stultz 2020-10-03 490 exp_info.name = "system-uncached";
553f4e0fafc5b3b John Stultz 2020-10-03 491 exp_info.ops = &system_uncached_heap_ops;
553f4e0fafc5b3b John Stultz 2020-10-03 492 exp_info.priv = NULL;
553f4e0fafc5b3b John Stultz 2020-10-03 493
553f4e0fafc5b3b John Stultz 2020-10-03 494 sys_uncached_heap = dma_heap_add(&exp_info);
553f4e0fafc5b3b John Stultz 2020-10-03 495 if (IS_ERR(sys_uncached_heap))
553f4e0fafc5b3b John Stultz 2020-10-03 @496 return PTR_ERR(sys_heap);
^^^^^^^^^^^^^^^^^
This should be return PTR_ERR(sys_uncached_heap);


---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (2.35 kB)
.config.gz (31.79 kB)
Download all attachments

2020-10-08 06:07:50

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Mon, Oct 5, 2020 at 6:45 AM Christoph Hellwig <[email protected]> wrote:
>
> How is this going to deal with VIVT caches?

Hrm. That's a good question. I'm not sure I totally have my head
around it but, I guess we could make sure to call
invalidate_kernel_vmap_range() in begin_cpu_access() and
flush_kernel_vmap_range() in end_cpu_access() rather then exiting out
early as we do now?

Unless you have better guidance?

Worse case we could check CONFIG_CPU_CACHE_VIVT and not register the
system-uncached heap.

thanks
-john

2020-10-08 06:17:17

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Wed, Oct 7, 2020 at 12:44 AM Dan Carpenter <[email protected]> wrote:
>
> Hi John,
>
> url: https://github.com/0day-ci/linux/commits/John-Stultz/dma-buf-Performance-improvements-for-system-heap-a-system-uncached-implementation/20201003-120520
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
> config: i386-randconfig-m021-20201002 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
> Reported-by: Dan Carpenter <[email protected]>
>
> smatch warnings:
> drivers/dma-buf/heaps/system_heap.c:496 system_heap_create() warn: passing zero to 'PTR_ERR'
>
> vim +/PTR_ERR +496 drivers/dma-buf/heaps/system_heap.c
>
> efa04fefebbd724 John Stultz 2019-12-03 478 static int system_heap_create(void)
> efa04fefebbd724 John Stultz 2019-12-03 479 {
> efa04fefebbd724 John Stultz 2019-12-03 480 struct dma_heap_export_info exp_info;
> efa04fefebbd724 John Stultz 2019-12-03 481
> 263e38f82cbb35b Andrew F. Davis 2019-12-16 482 exp_info.name = "system";
> efa04fefebbd724 John Stultz 2019-12-03 483 exp_info.ops = &system_heap_ops;
> efa04fefebbd724 John Stultz 2019-12-03 484 exp_info.priv = NULL;
> efa04fefebbd724 John Stultz 2019-12-03 485
> efa04fefebbd724 John Stultz 2019-12-03 486 sys_heap = dma_heap_add(&exp_info);
> efa04fefebbd724 John Stultz 2019-12-03 487 if (IS_ERR(sys_heap))
> a2e10cdd2e4d12a John Stultz 2020-10-03 488 return PTR_ERR(sys_heap);
> efa04fefebbd724 John Stultz 2019-12-03 489
> 553f4e0fafc5b3b John Stultz 2020-10-03 490 exp_info.name = "system-uncached";
> 553f4e0fafc5b3b John Stultz 2020-10-03 491 exp_info.ops = &system_uncached_heap_ops;
> 553f4e0fafc5b3b John Stultz 2020-10-03 492 exp_info.priv = NULL;
> 553f4e0fafc5b3b John Stultz 2020-10-03 493
> 553f4e0fafc5b3b John Stultz 2020-10-03 494 sys_uncached_heap = dma_heap_add(&exp_info);
> 553f4e0fafc5b3b John Stultz 2020-10-03 495 if (IS_ERR(sys_uncached_heap))
> 553f4e0fafc5b3b John Stultz 2020-10-03 @496 return PTR_ERR(sys_heap);
> ^^^^^^^^^^^^^^^^^
> This should be return PTR_ERR(sys_uncached_heap);

Oh nice! Very impressed that the tool caught that!
Thanks so much for the report! I'll fix it up here shortly.
-john

2020-10-08 11:55:51

by Brian Starkey

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Sat, Oct 03, 2020 at 04:02:57AM +0000, John Stultz wrote:
> This adds a heap that allocates non-contiguous buffers that are
> marked as writecombined, so they are not cached by the CPU.
>
> This is useful, as most graphics buffers are usually not touched
> by the CPU or only written into once by the CPU. So when mapping
> the buffer over and over between devices, we can skip the CPU
> syncing, which saves a lot of cache management overhead, greatly
> improving performance.
>
> For folk using ION, there was a ION_FLAG_CACHED flag, which
> signaled if the returned buffer should be CPU cacheable or not.
> With DMA-BUF heaps, we do not yet have such a flag, and by default
> the current heaps (system and cma) produce CPU cachable buffers.
> So for folks transitioning from ION to DMA-BUF Heaps, this fills
> in some of that missing functionality.
>
> There has been a suggestion to make this functionality a flag
> (DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
> ION used the ION_FLAG_CACHED. But I want to make sure an
> _UNCACHED flag would truely be a generic attribute across all
> heaps. So far that has been unclear, so having it as a separate
> heap seemes better for now. (But I'm open to discussion on this
> point!)
>
> This is a rework of earlier efforts to add a uncached system heap,
> done utilizing the exisitng system heap, adding just a bit of
> logic to handle the uncached case.
>
> Feedback would be very welcome!
>
> Many thanks to Liam Mark for his help to get this working.
>
> Pending opensource users of this code include:
> * AOSP HiKey960 gralloc:
> - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> - Visibly improves performance over the system heap
> * AOSP Codec2 (possibly, needs more review):
> - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
>
> Cc: Sumit Semwal <[email protected]>
> Cc: Liam Mark <[email protected]>
> Cc: Laura Abbott <[email protected]>
> Cc: Brian Starkey <[email protected]>
> Cc: Hridya Valsaraju <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Sandeep Patil <[email protected]>
> Cc: Daniel Mentz <[email protected]>
> Cc: Chris Goldsworthy <[email protected]>
> Cc: �rjan Eide <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Ezequiel Garcia <[email protected]>
> Cc: Simon Ser <[email protected]>
> Cc: James Jones <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: John Stultz <[email protected]>
> ---
> drivers/dma-buf/heaps/system_heap.c | 87 ++++++++++++++++++++++++++---
> 1 file changed, 79 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 2b8d4b6abacb..952f1fd9dacf 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -22,6 +22,7 @@
> #include <linux/vmalloc.h>
>
> struct dma_heap *sys_heap;
> +struct dma_heap *sys_uncached_heap;
>
> struct system_heap_buffer {
> struct dma_heap *heap;
> @@ -31,6 +32,8 @@ struct system_heap_buffer {
> struct sg_table sg_table;
> int vmap_cnt;
> void *vaddr;
> +
> + bool uncached;
> };
>
> struct dma_heap_attachment {
> @@ -38,6 +41,8 @@ struct dma_heap_attachment {
> struct sg_table *table;
> struct list_head list;
> bool mapped;
> +
> + bool uncached;
> };
>
> #define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> @@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
> a->dev = attachment->dev;
> INIT_LIST_HEAD(&a->list);
> a->mapped = false;
> -
> + a->uncached = buffer->uncached;
> attachment->priv = a;
>
> mutex_lock(&buffer->lock);
> @@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
> {
> struct dma_heap_attachment *a = attachment->priv;
> struct sg_table *table = a->table;
> + int attr = 0;
> int ret;
>
> - ret = dma_map_sgtable(attachment->dev, table, direction, 0);
> + if (a->uncached)
> + attr = DMA_ATTR_SKIP_CPU_SYNC;
> +
> + ret = dma_map_sgtable(attachment->dev, table, direction, attr);
> if (ret)
> return ERR_PTR(ret);
>
> @@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
> enum dma_data_direction direction)
> {
> struct dma_heap_attachment *a = attachment->priv;
> + int attr = 0;
>
> + if (a->uncached)
> + attr = DMA_ATTR_SKIP_CPU_SYNC;
> a->mapped = false;
> - dma_unmap_sgtable(attachment->dev, table, direction, 0);
> + dma_unmap_sgtable(attachment->dev, table, direction, attr);
> }
>
> static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> @@ -150,6 +162,9 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> struct system_heap_buffer *buffer = dmabuf->priv;
> struct dma_heap_attachment *a;
>
> + if (buffer->uncached)
> + return 0;
> +
> mutex_lock(&buffer->lock);
>
> if (buffer->vmap_cnt)
> @@ -171,6 +186,9 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> struct system_heap_buffer *buffer = dmabuf->priv;
> struct dma_heap_attachment *a;
>
> + if (buffer->uncached)
> + return 0;
> +
> mutex_lock(&buffer->lock);
>
> if (buffer->vmap_cnt)
> @@ -194,6 +212,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> struct sg_page_iter piter;
> int ret;
>
> + if (buffer->uncached)
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
> struct page *page = sg_page_iter_page(&piter);
>
> @@ -215,8 +236,12 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
> struct page **pages = vmalloc(sizeof(struct page *) * npages);
> struct page **tmp = pages;
> struct sg_page_iter piter;
> + pgprot_t pgprot = PAGE_KERNEL;
> void *vaddr;
>
> + if (buffer->uncached)
> + pgprot = pgprot_writecombine(PAGE_KERNEL);

I think this should go after the 'if (!pages)' check. Best to get the
allocation failure check as close to the allocation as possible IMO.

> +
> if (!pages)
> return ERR_PTR(-ENOMEM);
>
> @@ -225,7 +250,7 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
> *tmp++ = sg_page_iter_page(&piter);
> }
>
> - vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
> + vaddr = vmap(pages, npages, VM_MAP, pgprot);
> vfree(pages);
>
> if (!vaddr)
> @@ -278,6 +303,10 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
> int i;
>
> table = &buffer->sg_table;
> + /* Unmap the uncached buffers from the heap device (pairs with the map on allocate) */
> + if (buffer->uncached)
> + dma_unmap_sgtable(dma_heap_get_dev(buffer->heap), table, DMA_BIDIRECTIONAL, 0);
> +
> for_each_sg(table->sgl, sg, table->nents, i) {
> struct page *page = sg_page(sg);
>
> @@ -320,10 +349,11 @@ static struct page *alloc_largest_available(unsigned long size,
> return NULL;
> }
>
> -static int system_heap_allocate(struct dma_heap *heap,
> - unsigned long len,
> - unsigned long fd_flags,
> - unsigned long heap_flags)
> +static int system_heap_do_allocate(struct dma_heap *heap,
> + unsigned long len,
> + unsigned long fd_flags,
> + unsigned long heap_flags,
> + bool uncached)
> {
> struct system_heap_buffer *buffer;
> DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> @@ -344,6 +374,7 @@ static int system_heap_allocate(struct dma_heap *heap,
> mutex_init(&buffer->lock);
> buffer->heap = heap;
> buffer->len = len;
> + buffer->uncached = uncached;
>
> INIT_LIST_HEAD(&pages);
> i = 0;
> @@ -393,6 +424,16 @@ static int system_heap_allocate(struct dma_heap *heap,
> /* just return, as put will call release and that will free */
> return ret;
> }
> +
> + /*
> + * For uncached buffers, we need to initially flush cpu cache, since
> + * the __GFP_ZERO on the allocation means the zeroing was done by the
> + * cpu and thus it is likely cached. Map (and implicitly flush) it out
> + * now so we don't get corruption later on.
> + */
> + if (buffer->uncached)
> + dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);

Do we have to keep this mapping around for the entire lifetime of the
buffer?

Also, this problem (and solution) keeps lingering around. It really
feels like there should be a better way to solve "clean the linear
mapping all the way to DRAM", but I don't know what that should be.

> +
> return ret;
>
> free_pages:
> @@ -410,10 +451,30 @@ static int system_heap_allocate(struct dma_heap *heap,
> return ret;
> }
>
> +static int system_heap_allocate(struct dma_heap *heap,
> + unsigned long len,
> + unsigned long fd_flags,
> + unsigned long heap_flags)
> +{
> + return system_heap_do_allocate(heap, len, fd_flags, heap_flags, false);
> +}
> +
> static const struct dma_heap_ops system_heap_ops = {
> .allocate = system_heap_allocate,
> };
>
> +static int system_uncached_heap_allocate(struct dma_heap *heap,
> + unsigned long len,
> + unsigned long fd_flags,
> + unsigned long heap_flags)
> +{
> + return system_heap_do_allocate(heap, len, fd_flags, heap_flags, true);
> +}
> +
> +static const struct dma_heap_ops system_uncached_heap_ops = {
> + .allocate = system_uncached_heap_allocate,
> +};
> +
> static int system_heap_create(void)
> {
> struct dma_heap_export_info exp_info;
> @@ -426,6 +487,16 @@ static int system_heap_create(void)
> if (IS_ERR(sys_heap))
> return PTR_ERR(sys_heap);
>
> + exp_info.name = "system-uncached";
> + exp_info.ops = &system_uncached_heap_ops;
> + exp_info.priv = NULL;
> +
> + sys_uncached_heap = dma_heap_add(&exp_info);
> + if (IS_ERR(sys_uncached_heap))
> + return PTR_ERR(sys_heap);
> +

In principle, there's a race here between the heap getting registered
to sysfs and the dma_mask getting updated.

I don't think it would cause a problem in practice, but with the API
as it is, there's no way to avoid it - so I wonder if the
dma_heap_get_dev() accessor isn't the right API design.

Cheers,
-Brian

> + dma_coerce_mask_and_coherent(dma_heap_get_dev(sys_uncached_heap), DMA_BIT_MASK(64));
> +
> return 0;
> }
> module_init(system_heap_create);
> --
> 2.17.1
>

2020-10-08 15:49:16

by Brian Starkey

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

Hi John,

On Sat, Oct 03, 2020 at 04:02:50AM +0000, John Stultz wrote:
> Hey All,

...

>
> I did add to this series a reworked version of my uncached
> system heap implementation I was submitting a few weeks back.
> Since it duplicated a lot of the now reworked system heap code,
> I realized it would be much simpler to add the functionality to
> the system_heap implementaiton itself.

That looks like a neat approach to me. Referencing your previous
thread, I like the separate heap (as you have done), rather than a
generic "cached"/"noncached" flag on all heaps.

>
> While not improving the core allocation performance, the
> uncached heap allocations do result in *much* improved
> performance on HiKey960 as it avoids a lot of flushing and
> invalidating buffers that the cpu doesn't touch often.
>
> Feedback on these would be great!

Minor nit: s/detatch/detach/ on both heaps, but other than that
you can add my r-b to patches 1-5.

As you've said, it does feel like there's some room for
de-duplication, but that will be easier to work out once the
implementations settle.

I've a couple of comments for the uncached heap, but I'm not confident
I understand the implications of having the non-cached alias enough to
say if it looks OK or not.

Cheers!
-Brian

2020-10-16 20:00:02

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

On Thu, Oct 8, 2020 at 4:36 AM Brian Starkey <[email protected]> wrote:
>
> Hi John,
>
> On Sat, Oct 03, 2020 at 04:02:50AM +0000, John Stultz wrote:
> > Hey All,
>
> ...
>
> >
> > I did add to this series a reworked version of my uncached
> > system heap implementation I was submitting a few weeks back.
> > Since it duplicated a lot of the now reworked system heap code,
> > I realized it would be much simpler to add the functionality to
> > the system_heap implementaiton itself.
>
> That looks like a neat approach to me. Referencing your previous
> thread, I like the separate heap (as you have done), rather than a
> generic "cached"/"noncached" flag on all heaps.
>

Sounds good! I really appreciate the feedback on this.

> > While not improving the core allocation performance, the
> > uncached heap allocations do result in *much* improved
> > performance on HiKey960 as it avoids a lot of flushing and
> > invalidating buffers that the cpu doesn't touch often.
> >
> > Feedback on these would be great!
>
> Minor nit: s/detatch/detach/ on both heaps, but other than that
> you can add my r-b to patches 1-5.

Doh! Thanks for the spelling catch! Thanks again!

> As you've said, it does feel like there's some room for
> de-duplication, but that will be easier to work out once the
> implementations settle.
>
> I've a couple of comments for the uncached heap, but I'm not confident
> I understand the implications of having the non-cached alias enough to
> say if it looks OK or not.

Thanks so much!
-john

2020-10-16 20:05:44

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Thu, Oct 8, 2020 at 4:51 AM Brian Starkey <[email protected]> wrote:
> On Sat, Oct 03, 2020 at 04:02:57AM +0000, John Stultz wrote:
> > @@ -215,8 +236,12 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
> > struct page **pages = vmalloc(sizeof(struct page *) * npages);
> > struct page **tmp = pages;
> > struct sg_page_iter piter;
> > + pgprot_t pgprot = PAGE_KERNEL;
> > void *vaddr;
> >
> > + if (buffer->uncached)
> > + pgprot = pgprot_writecombine(PAGE_KERNEL);
>
> I think this should go after the 'if (!pages)' check. Best to get the
> allocation failure check as close to the allocation as possible IMO.

Sounds good. Changed in my tree.

> > @@ -393,6 +424,16 @@ static int system_heap_allocate(struct dma_heap *heap,
> > /* just return, as put will call release and that will free */
> > return ret;
> > }
> > +
> > + /*
> > + * For uncached buffers, we need to initially flush cpu cache, since
> > + * the __GFP_ZERO on the allocation means the zeroing was done by the
> > + * cpu and thus it is likely cached. Map (and implicitly flush) it out
> > + * now so we don't get corruption later on.
> > + */
> > + if (buffer->uncached)
> > + dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
>
> Do we have to keep this mapping around for the entire lifetime of the
> buffer?

Yea, I guess we can just map and unmap it right there. It will look a
little absurd, but that sort of aligns with your next point.

> Also, this problem (and solution) keeps lingering around. It really
> feels like there should be a better way to solve "clean the linear
> mapping all the way to DRAM", but I don't know what that should be.

Yea, something better here would be nice...


> > @@ -426,6 +487,16 @@ static int system_heap_create(void)
> > if (IS_ERR(sys_heap))
> > return PTR_ERR(sys_heap);
> >
> > + exp_info.name = "system-uncached";
> > + exp_info.ops = &system_uncached_heap_ops;
> > + exp_info.priv = NULL;
> > +
> > + sys_uncached_heap = dma_heap_add(&exp_info);
> > + if (IS_ERR(sys_uncached_heap))
> > + return PTR_ERR(sys_heap);
> > +
>
> In principle, there's a race here between the heap getting registered
> to sysfs and the dma_mask getting updated.
>
> I don't think it would cause a problem in practice, but with the API
> as it is, there's no way to avoid it - so I wonder if the
> dma_heap_get_dev() accessor isn't the right API design.

Hrm. I guess to address your concern we would need split
dma_heap_add() into something like:
dma_heap_create()
dma_heap_add()

Which breaks the creation of the heap with the registering it to the
subsystem, so some attributes can be tweaked inbetween?

I'll see about taking a stab at this, but I'll probably submit my
updated series sooner with this un-addressed so I can get some further
review.

thanks
-john

2020-10-17 05:37:04

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Fri, Oct 16, 2020 at 12:03 PM John Stultz <[email protected]> wrote:
> On Thu, Oct 8, 2020 at 4:51 AM Brian Starkey <[email protected]> wrote:
> > On Sat, Oct 03, 2020 at 04:02:57AM +0000, John Stultz wrote:
> > > @@ -426,6 +487,16 @@ static int system_heap_create(void)
> > > if (IS_ERR(sys_heap))
> > > return PTR_ERR(sys_heap);
> > >
> > > + exp_info.name = "system-uncached";
> > > + exp_info.ops = &system_uncached_heap_ops;
> > > + exp_info.priv = NULL;
> > > +
> > > + sys_uncached_heap = dma_heap_add(&exp_info);
> > > + if (IS_ERR(sys_uncached_heap))
> > > + return PTR_ERR(sys_heap);
> > > +
> >
> > In principle, there's a race here between the heap getting registered
> > to sysfs and the dma_mask getting updated.
> >
> > I don't think it would cause a problem in practice, but with the API
> > as it is, there's no way to avoid it - so I wonder if the
> > dma_heap_get_dev() accessor isn't the right API design.
>
> Hrm. I guess to address your concern we would need split
> dma_heap_add() into something like:
> dma_heap_create()
> dma_heap_add()
>
> Which breaks the creation of the heap with the registering it to the
> subsystem, so some attributes can be tweaked inbetween?

Looking at this some more, this approach isn't going to work. We
create the device and then we call dma_coerce_mask_and_coherent() on
it, but as soon as the device is created it seems possible for
userland to directly access it. Again, though, as you mentioned this
isn't terribly likely in practice.

The best thing I can think of for now is to have the uncached heap's
allocate pointer initially point to a dummy function that returns
EBUSY and then after we update the dma mask then we can set it to the
real alloc. I'll go with that for now, but let me know if you have
other suggestions.

thanks
-john

2021-01-29 01:27:58

by Daniel Mentz

[permalink] [raw]
Subject: Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

On Fri, Oct 16, 2020 at 12:04 PM John Stultz <[email protected]> wrote:
>
> On Thu, Oct 8, 2020 at 4:51 AM Brian Starkey <[email protected]> wrote:
> > On Sat, Oct 03, 2020 at 04:02:57AM +0000, John Stultz wrote:
> > > @@ -393,6 +424,16 @@ static int system_heap_allocate(struct dma_heap *heap,
> > > /* just return, as put will call release and that will free */
> > > return ret;
> > > }
> > > +
> > > + /*
> > > + * For uncached buffers, we need to initially flush cpu cache, since
> > > + * the __GFP_ZERO on the allocation means the zeroing was done by the
> > > + * cpu and thus it is likely cached. Map (and implicitly flush) it out
> > > + * now so we don't get corruption later on.
> > > + */
> > > + if (buffer->uncached)
> > > + dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
> >
> > Do we have to keep this mapping around for the entire lifetime of the
> > buffer?
>
> Yea, I guess we can just map and unmap it right there. It will look a
> little absurd, but that sort of aligns with your next point.
>
> > Also, this problem (and solution) keeps lingering around. It really
> > feels like there should be a better way to solve "clean the linear
> > mapping all the way to DRAM", but I don't know what that should be.
>
> Yea, something better here would be nice...

In ION, we had a little helper function named
ion_buffer_prep_noncached that called arch_dma_prep_coherent() on all
sg entries like so

for_each_sg(table->sgl, sg, table->orig_nents, i)
arch_dma_prep_coherent(sg_page(sg), sg->length);

Would that help?