2019-01-11 20:59:16

by Andrew Davis

Subject: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

Hello all,

This is a set of (hopefully) non-controversial cleanups for the ION
framework and the current set of heaps. These were found as I started
to familiarize myself with the framework so I can help, in whatever way
I can, in getting all this up to the standards needed for de-staging.

I would like to get some ideas of what is left to work on to get ION
out of staging. Has there been some kind of agreement on what ION should
eventually end up being? To me it looks like it is being whittled down
to its most core functions, and that is looking like a DMA-BUF
user-space front end: simply advertising the available memory backings
in a system and handing out allocations as DMA-BUF handles. If that is
the case then it looks close to ready, to me at least, but I would love
to hear any other opinions and concerns.

Back to this patchset: the last patch may be a bit different from the
others, as it adds an unmapped heap type and a creation helper. I wanted
to get this in to show off another heap type and surface some issues we
may have with the current ION framework. The unmapped heap is used when
the backing memory should not (or cannot) be touched. Currently this
kind of heap is used for firewalled secure memory that can be allocated
like normal heap memory but only used by secure devices (OP-TEE, crypto
HW, etc.). It is basically a copy of the "carveout" heap type, with the
only differences being that it is not mappable to userspace and we do
not clear the memory (as we should not map it either). So should this
really be a new heap type? Or should it be advertised as a carveout heap
with an additional allocation flag? Perhaps we do away with "types"
altogether and just have flags: coherent/non-coherent, mapped/unmapped,
etc.

Maybe more thinking will be needed after all...

Thanks,
Andrew

Andrew F. Davis (14):
staging: android: ion: Add proper header information
staging: android: ion: Remove empty ion_ioctl_dir() function
staging: android: ion: Merge ion-ioctl.c into ion.c
staging: android: ion: Remove leftover comment
staging: android: ion: Remove struct ion_platform_heap
staging: android: ion: Fixup some white-space issues
staging: android: ion: Sync comment docs with struct ion_buffer
staging: android: ion: Remove base from ion_carveout_heap
staging: android: ion: Remove base from ion_chunk_heap
staging: android: ion: Remove unused headers
staging: android: ion: Allow heap name to be null
staging: android: ion: Declare helpers for carveout and chunk heaps
staging: android: ion: Do not sync CPU cache on map/unmap
staging: android: ion: Add UNMAPPED heap type and helper

drivers/staging/android/ion/Kconfig | 10 ++
drivers/staging/android/ion/Makefile | 3 +-
drivers/staging/android/ion/ion-ioctl.c | 98 --------------
drivers/staging/android/ion/ion.c | 93 +++++++++++--
drivers/staging/android/ion/ion.h | 87 ++++++++-----
.../staging/android/ion/ion_carveout_heap.c | 19 +--
drivers/staging/android/ion/ion_chunk_heap.c | 25 ++--
drivers/staging/android/ion/ion_cma_heap.c | 6 +-
drivers/staging/android/ion/ion_heap.c | 8 +-
drivers/staging/android/ion/ion_page_pool.c | 2 +-
drivers/staging/android/ion/ion_system_heap.c | 8 +-
.../staging/android/ion/ion_unmapped_heap.c | 123 ++++++++++++++++++
drivers/staging/android/uapi/ion.h | 3 +
13 files changed, 307 insertions(+), 178 deletions(-)
delete mode 100644 drivers/staging/android/ion/ion-ioctl.c
create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c

--
2.19.1



2019-01-11 20:57:34

by Andrew Davis

Subject: [PATCH 02/14] staging: android: ion: Remove empty ion_ioctl_dir() function

This function contains no real logic and can be replaced with a direct
call to _IOC_DIR().

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion-ioctl.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c
index a8d3cc412fb9..b366f97a5728 100644
--- a/drivers/staging/android/ion/ion-ioctl.c
+++ b/drivers/staging/android/ion/ion-ioctl.c
@@ -31,23 +31,11 @@ static int validate_ioctl_arg(unsigned int cmd, union ion_ioctl_arg *arg)
return 0;
}

-/* fix up the cases where the ioctl direction bits are incorrect */
-static unsigned int ion_ioctl_dir(unsigned int cmd)
-{
- switch (cmd) {
- default:
- return _IOC_DIR(cmd);
- }
-}
-
long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
int ret = 0;
- unsigned int dir;
union ion_ioctl_arg data;

- dir = ion_ioctl_dir(cmd);
-
if (_IOC_SIZE(cmd) > sizeof(data))
return -EINVAL;

@@ -65,7 +53,7 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return ret;
}

- if (!(dir & _IOC_WRITE))
+ if (!(_IOC_DIR(cmd) & _IOC_WRITE))
memset(&data, 0, sizeof(data));

switch (cmd) {
@@ -90,7 +78,7 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -ENOTTY;
}

- if (dir & _IOC_READ) {
+ if (_IOC_DIR(cmd) & _IOC_READ) {
if (copy_to_user((void __user *)arg, &data, _IOC_SIZE(cmd)))
return -EFAULT;
}
--
2.19.1


2019-01-11 20:57:49

by Andrew Davis

Subject: [PATCH 10/14] staging: android: ion: Remove unused headers

Various cleanups have removed the use of some headers in ION; remove
the now-unneeded includes here.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.c | 3 ---
drivers/staging/android/ion/ion_carveout_heap.c | 4 ++--
drivers/staging/android/ion/ion_chunk_heap.c | 3 +--
3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 2d6d8c0994b2..bba5f682bc25 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -5,7 +5,6 @@
* Copyright (C) 2011 Google, Inc.
*/

-#include <linux/anon_inodes.h>
#include <linux/debugfs.h>
#include <linux/device.h>
#include <linux/dma-buf.h>
@@ -14,10 +13,8 @@
#include <linux/file.h>
#include <linux/freezer.h>
#include <linux/fs.h>
-#include <linux/idr.h>
#include <linux/kthread.h>
#include <linux/list.h>
-#include <linux/memblock.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/mm_types.h>
diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index ab9b72adca9c..bb9d614767a2 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -4,7 +4,7 @@
*
* Copyright (C) 2011 Google, Inc.
*/
-#include <linux/spinlock.h>
+
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/genalloc.h>
@@ -12,7 +12,7 @@
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
-#include <linux/vmalloc.h>
+
#include "ion.h"

#define ION_CARVEOUT_ALLOCATE_FAIL -1
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 899380beeee1..3cdde9c1a717 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -8,11 +8,10 @@
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/genalloc.h>
-#include <linux/io.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
-#include <linux/vmalloc.h>
+
#include "ion.h"

struct ion_chunk_heap {
--
2.19.1


2019-01-11 20:57:53

by Andrew Davis

Subject: [PATCH 12/14] staging: android: ion: Declare helpers for carveout and chunk heaps

When enabled, the helper functions for creating carveout and chunk heaps
should have declarations in the ION header.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.h | 33 +++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index e291299fd35f..97b2876b165a 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -308,4 +308,37 @@ void ion_page_pool_free(struct ion_page_pool *pool, struct page *page);
int ion_page_pool_shrink(struct ion_page_pool *pool, gfp_t gfp_mask,
int nr_to_scan);

+#ifdef CONFIG_ION_CARVEOUT_HEAP
+/**
+ * ion_carveout_heap_create
+ * @base: base address of carveout memory
+ * @size: size of carveout memory region
+ *
+ * Creates a carveout ion_heap using the passed in data
+ */
+struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size);
+#else
+static inline struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size)
+{
+ return ERR_PTR(-ENODEV);
+}
+#endif
+
+#ifdef CONFIG_ION_CHUNK_HEAP
+/**
+ * ion_chunk_heap_create
+ * @base: base address of chunk memory
+ * @size: size of chunk memory region
+ * @chunk_size: minimum allocation granularity
+ *
+ * Creates a chunk ion_heap using the passed in data
+ */
+struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t size, size_t chunk_size);
+#else
+static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t size, size_t chunk_size)
+{
+ return ERR_PTR(-ENODEV);
+}
+#endif
+
#endif /* _ION_H */
--
2.19.1


2019-01-11 20:58:04

by Andrew Davis

Subject: [PATCH 07/14] staging: android: ion: Sync comment docs with struct ion_buffer

This struct is no longer documented correctly; fix this.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index 2ef78c951a6b..e291299fd35f 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -23,8 +23,8 @@

/**
* struct ion_buffer - metadata for a particular buffer
- * @ref: reference count
* @node: node in the ion_device buffers tree
+ * @list: element in list of deferred freeable buffers
* @dev: back pointer to the ion_device
* @heap: back pointer to the heap the buffer came from
* @flags: buffer specific flags
@@ -35,7 +35,8 @@
* @lock: protects the buffers cnt fields
* @kmap_cnt: number of times the buffer is mapped to the kernel
* @vaddr: the kernel mapping if kmap_cnt is not zero
- * @sg_table: the sg table for the buffer if dmap_cnt is not zero
+ * @sg_table: the sg table for the buffer
+ * @attachments: list of devices attached to this buffer
*/
struct ion_buffer {
union {
@@ -151,12 +152,16 @@ struct ion_heap {
unsigned long flags;
unsigned int id;
const char *name;
+
+ /* deferred free support */
struct shrinker shrinker;
struct list_head free_list;
size_t free_list_size;
spinlock_t free_lock;
wait_queue_head_t waitqueue;
struct task_struct *task;
+
+ /* heap statistics */
u64 num_of_buffers;
u64 num_of_alloc_bytes;
u64 alloc_bytes_wm;
--
2.19.1


2019-01-11 20:58:20

by Andrew Davis

Subject: [PATCH 05/14] staging: android: ion: Remove struct ion_platform_heap

Now that ION heap registration has been reworked to not depend on board
files or a central heap registration helper, there is no need for this
data structure. Most of its fields are unused.

Some heap creation helpers that use this struct to define a heap are
still available, but they only use two or three of its elements; convert
them to take these values directly from the heap registrar.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.h | 23 -------------------
.../staging/android/ion/ion_carveout_heap.c | 12 ++++------
drivers/staging/android/ion/ion_chunk_heap.c | 17 ++++++--------
3 files changed, 11 insertions(+), 41 deletions(-)

diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index 084f246dcfc9..2ef78c951a6b 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -21,29 +21,6 @@

#include "../uapi/ion.h"

-/**
- * struct ion_platform_heap - defines a heap in the given platform
- * @type: type of the heap from ion_heap_type enum
- * @id: unique identifier for heap. When allocating higher numbers
- * will be allocated from first. At allocation these are passed
- * as a bit mask and therefore can not exceed ION_NUM_HEAP_IDS.
- * @name: used for debug purposes
- * @base: base address of heap in physical memory if applicable
- * @size: size of the heap in bytes if applicable
- * @priv: private info passed from the board file
- *
- * Provided by the board file.
- */
-struct ion_platform_heap {
- enum ion_heap_type type;
- unsigned int id;
- const char *name;
- phys_addr_t base;
- size_t size;
- phys_addr_t align;
- void *priv;
-};
-
/**
* struct ion_buffer - metadata for a particular buffer
* @ref: reference count
diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index 4a9f9c275654..891f5703220b 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -103,17 +103,14 @@ static struct ion_heap_ops carveout_heap_ops = {
.unmap_kernel = ion_heap_unmap_kernel,
};

-struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data)
+struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size)
{
struct ion_carveout_heap *carveout_heap;
int ret;

struct page *page;
- size_t size;
-
- page = pfn_to_page(PFN_DOWN(heap_data->base));
- size = heap_data->size;

+ page = pfn_to_page(PFN_DOWN(base));
ret = ion_heap_pages_zero(page, size, pgprot_writecombine(PAGE_KERNEL));
if (ret)
return ERR_PTR(ret);
@@ -127,9 +124,8 @@ struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data)
kfree(carveout_heap);
return ERR_PTR(-ENOMEM);
}
- carveout_heap->base = heap_data->base;
- gen_pool_add(carveout_heap->pool, carveout_heap->base, heap_data->size,
- -1);
+ carveout_heap->base = base;
+ gen_pool_add(carveout_heap->pool, carveout_heap->base, size, -1);
carveout_heap->heap.ops = &carveout_heap_ops;
carveout_heap->heap.type = ION_HEAP_TYPE_CARVEOUT;
carveout_heap->heap.flags = ION_HEAP_FLAG_DEFER_FREE;
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 5a8917d9beac..c2321a047f0f 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -108,16 +108,13 @@ static struct ion_heap_ops chunk_heap_ops = {
.unmap_kernel = ion_heap_unmap_kernel,
};

-struct ion_heap *ion_chunk_heap_create(struct ion_platform_heap *heap_data)
+struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t size, size_t chunk_size)
{
struct ion_chunk_heap *chunk_heap;
int ret;
struct page *page;
- size_t size;
-
- page = pfn_to_page(PFN_DOWN(heap_data->base));
- size = heap_data->size;

+ page = pfn_to_page(PFN_DOWN(base));
ret = ion_heap_pages_zero(page, size, pgprot_writecombine(PAGE_KERNEL));
if (ret)
return ERR_PTR(ret);
@@ -126,23 +123,23 @@ struct ion_heap *ion_chunk_heap_create(struct ion_platform_heap *heap_data)
if (!chunk_heap)
return ERR_PTR(-ENOMEM);

- chunk_heap->chunk_size = (unsigned long)heap_data->priv;
+ chunk_heap->chunk_size = chunk_size;
chunk_heap->pool = gen_pool_create(get_order(chunk_heap->chunk_size) +
PAGE_SHIFT, -1);
if (!chunk_heap->pool) {
ret = -ENOMEM;
goto error_gen_pool_create;
}
- chunk_heap->base = heap_data->base;
- chunk_heap->size = heap_data->size;
+ chunk_heap->base = base;
+ chunk_heap->size = size;
chunk_heap->allocated = 0;

- gen_pool_add(chunk_heap->pool, chunk_heap->base, heap_data->size, -1);
+ gen_pool_add(chunk_heap->pool, chunk_heap->base, size, -1);
chunk_heap->heap.ops = &chunk_heap_ops;
chunk_heap->heap.type = ION_HEAP_TYPE_CHUNK;
chunk_heap->heap.flags = ION_HEAP_FLAG_DEFER_FREE;
pr_debug("%s: base %pa size %zu\n", __func__,
- &chunk_heap->base, heap_data->size);
+ &chunk_heap->base, size);

return &chunk_heap->heap;

--
2.19.1


2019-01-11 20:58:23

by Andrew Davis

Subject: [PATCH 01/14] staging: android: ion: Add proper header information

The filenames in headers add nothing and are often wrong after moves;
let's drop them here and add a short description of each file's
contents.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.c | 2 +-
drivers/staging/android/ion/ion.h | 2 +-
drivers/staging/android/ion/ion_carveout_heap.c | 2 +-
drivers/staging/android/ion/ion_chunk_heap.c | 2 +-
drivers/staging/android/ion/ion_cma_heap.c | 2 +-
drivers/staging/android/ion/ion_heap.c | 2 +-
drivers/staging/android/ion/ion_page_pool.c | 2 +-
drivers/staging/android/ion/ion_system_heap.c | 2 +-
8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index a0802de8c3a1..de1ca5e26a4a 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion.c
+ * ION Memory Allocator
*
* Copyright (C) 2011 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index 47b594cf1ac9..ff455fdde1a8 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
- * drivers/staging/android/ion/ion.h
+ * ION Memory Allocator kernel interface header
*
* Copyright (C) 2011 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index e129237a0417..4a9f9c275654 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_carveout_heap.c
+ * ION Memory Allocator carveout heap helper
*
* Copyright (C) 2011 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 159d72f5bc42..5a8917d9beac 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_chunk_heap.c
+ * ION memory allocator chunk heap helper
*
* Copyright (C) 2012 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index 3fafd013d80a..7b557dd5e78b 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_cma_heap.c
+ * ION Memory Allocator CMA heap exporter
*
* Copyright (C) Linaro 2012
* Author: <[email protected]> for ST-Ericsson.
diff --git a/drivers/staging/android/ion/ion_heap.c b/drivers/staging/android/ion/ion_heap.c
index 31db510018a9..6ee0ac6e4be4 100644
--- a/drivers/staging/android/ion/ion_heap.c
+++ b/drivers/staging/android/ion/ion_heap.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_heap.c
+ * ION Memory Allocator generic heap helpers
*
* Copyright (C) 2011 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index 0d2a95957ee8..fd4995fb676e 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_mem_pool.c
+ * ION Memory Allocator page pool helpers
*
* Copyright (C) 2011 Google, Inc.
*/
diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index 0383f7548d48..643b32099488 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * drivers/staging/android/ion/ion_system_heap.c
+ * ION Memory Allocator system heap exporter
*
* Copyright (C) 2011 Google, Inc.
*/
--
2.19.1


2019-01-11 20:58:32

by Andrew Davis

Subject: [PATCH 09/14] staging: android: ion: Remove base from ion_chunk_heap

The base address is not used anywhere and is already tracked by the
pool allocator. No need to store this anymore.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion_chunk_heap.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index b82eac1c2d7a..899380beeee1 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -18,7 +18,6 @@
struct ion_chunk_heap {
struct ion_heap heap;
struct gen_pool *pool;
- phys_addr_t base;
unsigned long chunk_size;
unsigned long size;
unsigned long allocated;
@@ -131,16 +130,14 @@ struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t size, size_t chu
ret = -ENOMEM;
goto error_gen_pool_create;
}
- chunk_heap->base = base;
chunk_heap->size = size;
chunk_heap->allocated = 0;

- gen_pool_add(chunk_heap->pool, chunk_heap->base, size, -1);
+ gen_pool_add(chunk_heap->pool, base, size, -1);
chunk_heap->heap.ops = &chunk_heap_ops;
chunk_heap->heap.type = ION_HEAP_TYPE_CHUNK;
chunk_heap->heap.flags = ION_HEAP_FLAG_DEFER_FREE;
- pr_debug("%s: base %pa size %zu\n", __func__,
- &chunk_heap->base, size);
+ pr_debug("%s: base %pa size %zu\n", __func__, &base, size);

return &chunk_heap->heap;

--
2.19.1


2019-01-11 20:58:49

by Andrew Davis

Subject: [PATCH 04/14] staging: android: ion: Remove leftover comment

Since we use the CMA APIs directly there is no device or private heap
data; drop this stale comment.

Fixes: 204f672255c2 ("staging: android: ion: Use CMA APIs directly")
Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion_cma_heap.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index 7b557dd5e78b..bf65e67ef9d8 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -111,10 +111,6 @@ static struct ion_heap *__ion_cma_heap_create(struct cma *cma)
return ERR_PTR(-ENOMEM);

cma_heap->heap.ops = &ion_cma_ops;
- /*
- * get device from private heaps data, later it will be
- * used to make the link with reserved CMA memory
- */
cma_heap->cma = cma;
cma_heap->heap.type = ION_HEAP_TYPE_DMA;
return &cma_heap->heap;
--
2.19.1


2019-01-11 20:59:05

by Andrew Davis

Subject: [PATCH 06/14] staging: android: ion: Fixup some white-space issues

Add white-space for easier reading and remove some where it does not
belong. No functional changes; they just bug me...

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion_carveout_heap.c | 1 +
drivers/staging/android/ion/ion_chunk_heap.c | 2 +-
drivers/staging/android/ion/ion_heap.c | 6 ++++++
drivers/staging/android/ion/ion_system_heap.c | 6 +++++-
4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index 891f5703220b..be80671df9c8 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -44,6 +44,7 @@ static void ion_carveout_free(struct ion_heap *heap, phys_addr_t addr,

if (addr == ION_CARVEOUT_ALLOCATE_FAIL)
return;
+
gen_pool_free(carveout_heap->pool, addr, size);
}

diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index c2321a047f0f..b82eac1c2d7a 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -4,6 +4,7 @@
*
* Copyright (C) 2012 Google, Inc.
*/
+
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/genalloc.h>
@@ -147,4 +148,3 @@ struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t size, size_t chu
kfree(chunk_heap);
return ERR_PTR(ret);
}
-
diff --git a/drivers/staging/android/ion/ion_heap.c b/drivers/staging/android/ion/ion_heap.c
index 6ee0ac6e4be4..473b465724f1 100644
--- a/drivers/staging/android/ion/ion_heap.c
+++ b/drivers/staging/android/ion/ion_heap.c
@@ -14,6 +14,7 @@
#include <uapi/linux/sched/types.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>
+
#include "ion.h"

void *ion_heap_map_kernel(struct ion_heap *heap,
@@ -92,6 +93,7 @@ int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
if (addr >= vma->vm_end)
return 0;
}
+
return 0;
}

@@ -254,6 +256,7 @@ int ion_heap_init_deferred_free(struct ion_heap *heap)
return PTR_ERR_OR_ZERO(heap->task);
}
sched_setscheduler(heap->task, SCHED_IDLE, &param);
+
return 0;
}

@@ -265,8 +268,10 @@ static unsigned long ion_heap_shrink_count(struct shrinker *shrinker,
int total = 0;

total = ion_heap_freelist_size(heap) / PAGE_SIZE;
+
if (heap->ops->shrink)
total += heap->ops->shrink(heap, sc->gfp_mask, 0);
+
return total;
}

@@ -295,6 +300,7 @@ static unsigned long ion_heap_shrink_scan(struct shrinker *shrinker,

if (heap->ops->shrink)
freed += heap->ops->shrink(heap, sc->gfp_mask, to_scan);
+
return freed;
}

diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index 643b32099488..ec526a464db8 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -13,6 +13,7 @@
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
+
#include "ion.h"

#define NUM_ORDERS ARRAY_SIZE(orders)
@@ -236,6 +237,7 @@ static int ion_system_heap_create_pools(struct ion_page_pool **pools)
goto err_create_pool;
pools[i] = pool;
}
+
return 0;

err_create_pool:
@@ -274,6 +276,7 @@ static int ion_system_heap_create(void)
heap->name = "ion_system_heap";

ion_device_add_heap(heap);
+
return 0;
}
device_initcall(ion_system_heap_create);
@@ -355,6 +358,7 @@ static struct ion_heap *__ion_system_contig_heap_create(void)
heap->ops = &kmalloc_ops;
heap->type = ION_HEAP_TYPE_SYSTEM_CONTIG;
heap->name = "ion_system_contig_heap";
+
return heap;
}

@@ -367,7 +371,7 @@ static int ion_system_contig_heap_create(void)
return PTR_ERR(heap);

ion_device_add_heap(heap);
+
return 0;
}
device_initcall(ion_system_contig_heap_create);
-
--
2.19.1


2019-01-11 20:59:05

by Andrew Davis

Subject: [PATCH 03/14] staging: android: ion: Merge ion-ioctl.c into ion.c

The file ion-ioctl.c is now too small and too tightly integrated with
the main ion.c file to justify keeping it separate. Merge it into
ion.c.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/Makefile | 2 +-
drivers/staging/android/ion/ion-ioctl.c | 86 -------------------------
drivers/staging/android/ion/ion.c | 79 ++++++++++++++++++++++-
drivers/staging/android/ion/ion.h | 8 ---
4 files changed, 78 insertions(+), 97 deletions(-)
delete mode 100644 drivers/staging/android/ion/ion-ioctl.c

diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile
index bb30bf8774a0..17f3a7569e3d 100644
--- a/drivers/staging/android/ion/Makefile
+++ b/drivers/staging/android/ion/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_ION) += ion.o ion-ioctl.o ion_heap.o
+obj-$(CONFIG_ION) += ion.o ion_heap.o
obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o
obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c
deleted file mode 100644
index b366f97a5728..000000000000
--- a/drivers/staging/android/ion/ion-ioctl.c
+++ /dev/null
@@ -1,86 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (C) 2011 Google, Inc.
- */
-
-#include <linux/kernel.h>
-#include <linux/file.h>
-#include <linux/fs.h>
-#include <linux/uaccess.h>
-
-#include "ion.h"
-
-union ion_ioctl_arg {
- struct ion_allocation_data allocation;
- struct ion_heap_query query;
-};
-
-static int validate_ioctl_arg(unsigned int cmd, union ion_ioctl_arg *arg)
-{
- switch (cmd) {
- case ION_IOC_HEAP_QUERY:
- if (arg->query.reserved0 ||
- arg->query.reserved1 ||
- arg->query.reserved2)
- return -EINVAL;
- break;
- default:
- break;
- }
-
- return 0;
-}
-
-long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
-{
- int ret = 0;
- union ion_ioctl_arg data;
-
- if (_IOC_SIZE(cmd) > sizeof(data))
- return -EINVAL;
-
- /*
- * The copy_from_user is unconditional here for both read and write
- * to do the validate. If there is no write for the ioctl, the
- * buffer is cleared
- */
- if (copy_from_user(&data, (void __user *)arg, _IOC_SIZE(cmd)))
- return -EFAULT;
-
- ret = validate_ioctl_arg(cmd, &data);
- if (ret) {
- pr_warn_once("%s: ioctl validate failed\n", __func__);
- return ret;
- }
-
- if (!(_IOC_DIR(cmd) & _IOC_WRITE))
- memset(&data, 0, sizeof(data));
-
- switch (cmd) {
- case ION_IOC_ALLOC:
- {
- int fd;
-
- fd = ion_alloc(data.allocation.len,
- data.allocation.heap_id_mask,
- data.allocation.flags);
- if (fd < 0)
- return fd;
-
- data.allocation.fd = fd;
-
- break;
- }
- case ION_IOC_HEAP_QUERY:
- ret = ion_query_heaps(&data.query);
- break;
- default:
- return -ENOTTY;
- }
-
- if (_IOC_DIR(cmd) & _IOC_READ) {
- if (copy_to_user((void __user *)arg, &data, _IOC_SIZE(cmd)))
- return -EFAULT;
- }
- return ret;
-}
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index de1ca5e26a4a..2d6d8c0994b2 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -390,7 +390,7 @@ static const struct dma_buf_ops dma_buf_ops = {
.unmap = ion_dma_buf_kunmap,
};

-int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags)
+static int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
struct ion_device *dev = internal_dev;
struct ion_buffer *buffer = NULL;
@@ -447,7 +447,7 @@ int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags)
return fd;
}

-int ion_query_heaps(struct ion_heap_query *query)
+static int ion_query_heaps(struct ion_heap_query *query)
{
struct ion_device *dev = internal_dev;
struct ion_heap_data __user *buffer = u64_to_user_ptr(query->heaps);
@@ -492,6 +492,81 @@ int ion_query_heaps(struct ion_heap_query *query)
return ret;
}

+union ion_ioctl_arg {
+ struct ion_allocation_data allocation;
+ struct ion_heap_query query;
+};
+
+static int validate_ioctl_arg(unsigned int cmd, union ion_ioctl_arg *arg)
+{
+ switch (cmd) {
+ case ION_IOC_HEAP_QUERY:
+ if (arg->query.reserved0 ||
+ arg->query.reserved1 ||
+ arg->query.reserved2)
+ return -EINVAL;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ int ret = 0;
+ union ion_ioctl_arg data;
+
+ if (_IOC_SIZE(cmd) > sizeof(data))
+ return -EINVAL;
+
+ /*
+ * The copy_from_user is unconditional here for both read and write
+ * to do the validate. If there is no write for the ioctl, the
+ * buffer is cleared
+ */
+ if (copy_from_user(&data, (void __user *)arg, _IOC_SIZE(cmd)))
+ return -EFAULT;
+
+ ret = validate_ioctl_arg(cmd, &data);
+ if (ret) {
+ pr_warn_once("%s: ioctl validate failed\n", __func__);
+ return ret;
+ }
+
+ if (!(_IOC_DIR(cmd) & _IOC_WRITE))
+ memset(&data, 0, sizeof(data));
+
+ switch (cmd) {
+ case ION_IOC_ALLOC:
+ {
+ int fd;
+
+ fd = ion_alloc(data.allocation.len,
+ data.allocation.heap_id_mask,
+ data.allocation.flags);
+ if (fd < 0)
+ return fd;
+
+ data.allocation.fd = fd;
+
+ break;
+ }
+ case ION_IOC_HEAP_QUERY:
+ ret = ion_query_heaps(&data.query);
+ break;
+ default:
+ return -ENOTTY;
+ }
+
+ if (_IOC_DIR(cmd) & _IOC_READ) {
+ if (copy_to_user((void __user *)arg, &data, _IOC_SIZE(cmd)))
+ return -EFAULT;
+ }
+ return ret;
+}
+
static const struct file_operations ion_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = ion_ioctl,
diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index ff455fdde1a8..084f246dcfc9 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -205,10 +205,6 @@ int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
int ion_heap_buffer_zero(struct ion_buffer *buffer);
int ion_heap_pages_zero(struct page *page, size_t size, pgprot_t pgprot);

-int ion_alloc(size_t len,
- unsigned int heap_id_mask,
- unsigned int flags);
-
/**
* ion_heap_init_shrinker
* @heap: the heap
@@ -330,8 +326,4 @@ void ion_page_pool_free(struct ion_page_pool *pool, struct page *page);
int ion_page_pool_shrink(struct ion_page_pool *pool, gfp_t gfp_mask,
int nr_to_scan);

-long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
-
-int ion_query_heaps(struct ion_heap_query *query);
-
#endif /* _ION_H */
--
2.19.1


2019-01-11 20:59:06

by Andrew Davis

Subject: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

Buffers may not be mapped from the CPU so skip cache maintenance here.
Accesses from the CPU to a cached heap should be bracketed with
{begin,end}_cpu_access calls so maintenance should not be needed anyway.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 14e48f6eb734..09cb5a8e2b09 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,

table = a->table;

- if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
- direction))
+ if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
+ direction, DMA_ATTR_SKIP_CPU_SYNC))
return ERR_PTR(-ENOMEM);

return table;
@@ -272,7 +272,8 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
struct sg_table *table,
enum dma_data_direction direction)
{
- dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
+ dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
+ direction, DMA_ATTR_SKIP_CPU_SYNC);
}

static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
--
2.19.1


2019-01-11 20:59:07

by Andrew Davis

Subject: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

The "unmapped" heap is very similar to the carveout heap except
the backing memory is presumed to be unmappable by the host, in
my specific case due to firewalls. This memory can still be
allocated from and used by devices that do have access to the
backing memory.

Based originally on the secure/unmapped heap from Linaro for
the OP-TEE SDP implementation, this was re-written to match
the carveout heap helper code.

Suggested-by: Etienne Carriere <[email protected]>
Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/Kconfig | 10 ++
drivers/staging/android/ion/Makefile | 1 +
drivers/staging/android/ion/ion.h | 16 +++
.../staging/android/ion/ion_unmapped_heap.c | 123 ++++++++++++++++++
drivers/staging/android/uapi/ion.h | 3 +
5 files changed, 153 insertions(+)
create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c

diff --git a/drivers/staging/android/ion/Kconfig b/drivers/staging/android/ion/Kconfig
index 0fdda6f62953..a117b8b91b14 100644
--- a/drivers/staging/android/ion/Kconfig
+++ b/drivers/staging/android/ion/Kconfig
@@ -42,3 +42,13 @@ config ION_CMA_HEAP
Choose this option to enable CMA heaps with Ion. This heap is backed
by the Contiguous Memory Allocator (CMA). If your system has these
regions, you should say Y here.
+
+config ION_UNMAPPED_HEAP
+ bool "ION unmapped heap support"
+ depends on ION
+ help
+ Choose this option to enable UNMAPPED heaps with Ion. This heap is
+ backed by specific memory pools carved out from Linux memory.
+ Unlike carveout heaps, these are assumed not to be mappable by the
+ kernel or by user-space.
+ Unless you know your system has these regions, you should say N here.
diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile
index 17f3a7569e3d..c71a1f3de581 100644
--- a/drivers/staging/android/ion/Makefile
+++ b/drivers/staging/android/ion/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o
obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
+obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index 97b2876b165a..ce74332018ba 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -341,4 +341,20 @@ static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t si
}
#endif

+#ifdef CONFIG_ION_UNMAPPED_HEAP
+/**
+ * ion_unmapped_heap_create
+ * @base: base address of carveout memory
+ * @size: size of carveout memory region
+ *
+ * Creates an unmapped ion_heap using the passed in data
+ */
+struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size);
+#else
+static inline struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
+{
+ return ERR_PTR(-ENODEV);
+}
+#endif
+
#endif /* _ION_H */
diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c b/drivers/staging/android/ion/ion_unmapped_heap.c
new file mode 100644
index 000000000000..7602b659c2ec
--- /dev/null
+++ b/drivers/staging/android/ion/ion_unmapped_heap.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ION Memory Allocator unmapped heap helper
+ *
+ * Copyright (C) 2015-2016 Texas Instruments Incorporated - http://www.ti.com/
+ * Andrew F. Davis <[email protected]>
+ *
+ * ION "unmapped" heaps are physical memory heaps not by default mapped into
+ * a virtual address space. The buffer owner can explicitly request kernel
+ * space mappings but the underlying memory may still not be accessible for
+ * various reasons, such as firewalls.
+ */
+
+#include <linux/err.h>
+#include <linux/genalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+
+#include "ion.h"
+
+#define ION_UNMAPPED_ALLOCATE_FAIL -1
+
+struct ion_unmapped_heap {
+ struct ion_heap heap;
+ struct gen_pool *pool;
+};
+
+static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
+ unsigned long size)
+{
+ struct ion_unmapped_heap *unmapped_heap =
+ container_of(heap, struct ion_unmapped_heap, heap);
+ unsigned long offset;
+
+ offset = gen_pool_alloc(unmapped_heap->pool, size);
+ if (!offset)
+ return ION_UNMAPPED_ALLOCATE_FAIL;
+
+ return offset;
+}
+
+static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
+ unsigned long size)
+{
+ struct ion_unmapped_heap *unmapped_heap =
+ container_of(heap, struct ion_unmapped_heap, heap);
+
+ gen_pool_free(unmapped_heap->pool, addr, size);
+}
+
+static int ion_unmapped_heap_allocate(struct ion_heap *heap,
+ struct ion_buffer *buffer,
+ unsigned long size,
+ unsigned long flags)
+{
+ struct sg_table *table;
+ phys_addr_t paddr;
+ int ret;
+
+ table = kmalloc(sizeof(*table), GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+ ret = sg_alloc_table(table, 1, GFP_KERNEL);
+ if (ret)
+ goto err_free;
+
+ paddr = ion_unmapped_allocate(heap, size);
+ if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
+ ret = -ENOMEM;
+ goto err_free_table;
+ }
+
+ sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
+ buffer->sg_table = table;
+
+ return 0;
+
+err_free_table:
+ sg_free_table(table);
+err_free:
+ kfree(table);
+ return ret;
+}
+
+static void ion_unmapped_heap_free(struct ion_buffer *buffer)
+{
+ struct ion_heap *heap = buffer->heap;
+ struct sg_table *table = buffer->sg_table;
+ struct page *page = sg_page(table->sgl);
+ phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
+
+ ion_unmapped_free(heap, paddr, buffer->size);
+ sg_free_table(buffer->sg_table);
+ kfree(buffer->sg_table);
+}
+
+static struct ion_heap_ops unmapped_heap_ops = {
+ .allocate = ion_unmapped_heap_allocate,
+ .free = ion_unmapped_heap_free,
+ /* no .map_user, user mapping of unmapped heaps not allowed */
+ .map_kernel = ion_heap_map_kernel,
+ .unmap_kernel = ion_heap_unmap_kernel,
+};
+
+struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
+{
+ struct ion_unmapped_heap *unmapped_heap;
+
+ unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
+ if (!unmapped_heap)
+ return ERR_PTR(-ENOMEM);
+
+ unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (!unmapped_heap->pool) {
+ kfree(unmapped_heap);
+ return ERR_PTR(-ENOMEM);
+ }
+ gen_pool_add(unmapped_heap->pool, base, size, -1);
+ unmapped_heap->heap.ops = &unmapped_heap_ops;
+ unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
+
+ return &unmapped_heap->heap;
+}
diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h
index 5d7009884c13..d5f98bc5f340 100644
--- a/drivers/staging/android/uapi/ion.h
+++ b/drivers/staging/android/uapi/ion.h
@@ -19,6 +19,8 @@
* carveout heap, allocations are physically
* contiguous
* @ION_HEAP_TYPE_DMA: memory allocated via DMA API
+ * @ION_HEAP_TYPE_UNMAPPED: memory not intended to be mapped into the
+ * linux address space except for debug cases
* @ION_NUM_HEAPS: helper for iterating over heaps, a bit mask
* is used to identify the heaps, so only 32
* total heap types are supported
@@ -29,6 +31,7 @@ enum ion_heap_type {
ION_HEAP_TYPE_CARVEOUT,
ION_HEAP_TYPE_CHUNK,
ION_HEAP_TYPE_DMA,
+ ION_HEAP_TYPE_UNMAPPED,
ION_HEAP_TYPE_CUSTOM, /*
* must be last so device specific heaps always
* are at the end of this enum
--
2.19.1


2019-01-11 20:59:10

by Andrew Davis

Subject: [PATCH 11/14] staging: android: ion: Allow heap name to be null

The heap name can be used for debugging but otherwise does not seem
to be required, and no other part of the code will fail if it is left
NULL except here. We could make the name mandatory and check for it at
some point; for now let's just prevent it from causing a NULL pointer
dereference.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index bba5f682bc25..14e48f6eb734 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -467,7 +467,7 @@ static int ion_query_heaps(struct ion_heap_query *query)
max_cnt = query->cnt;

plist_for_each_entry(heap, &dev->heaps, node) {
- strncpy(hdata.name, heap->name, MAX_HEAP_NAME);
+ strncpy(hdata.name, heap->name ?: "(null)", MAX_HEAP_NAME);
hdata.name[sizeof(hdata.name) - 1] = '\0';
hdata.type = heap->type;
hdata.heap_id = heap->id;
--
2.19.1


2019-01-11 20:59:15

by Andrew Davis

Subject: [PATCH 08/14] staging: android: ion: Remove base from ion_carveout_heap

The base address is not used anywhere and is already tracked by the
pool allocator. There is no need to store it anymore.

Signed-off-by: Andrew F. Davis <[email protected]>
---
drivers/staging/android/ion/ion_carveout_heap.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index be80671df9c8..ab9b72adca9c 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -20,7 +20,6 @@
struct ion_carveout_heap {
struct ion_heap heap;
struct gen_pool *pool;
- phys_addr_t base;
};

static phys_addr_t ion_carveout_allocate(struct ion_heap *heap,
@@ -125,8 +124,7 @@ struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size)
kfree(carveout_heap);
return ERR_PTR(-ENOMEM);
}
- carveout_heap->base = base;
- gen_pool_add(carveout_heap->pool, carveout_heap->base, size, -1);
+ gen_pool_add(carveout_heap->pool, base, size, -1);
carveout_heap->heap.ops = &carveout_heap_ops;
carveout_heap->heap.type = ION_HEAP_TYPE_CARVEOUT;
carveout_heap->heap.flags = ION_HEAP_FLAG_DEFER_FREE;
--
2.19.1


2019-01-14 17:15:25

by Liam Mark

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Fri, 11 Jan 2019, Andrew F. Davis wrote:

> Buffers may not be mapped from the CPU so skip cache maintenance here.
> Accesses from the CPU to a cached heap should be bracketed with
> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>
> Signed-off-by: Andrew F. Davis <[email protected]>
> ---
> drivers/staging/android/ion/ion.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> index 14e48f6eb734..09cb5a8e2b09 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>
> table = a->table;
>
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> + direction, DMA_ATTR_SKIP_CPU_SYNC))

Unfortunately I don't think you can do this, for a couple of reasons.
You can't rely on the {begin,end}_cpu_access calls to do cache maintenance.
If the calls to {begin,end}_cpu_access were made before the call to
dma_buf_attach, then there won't have been a device attached, so those
calls won't have done any cache maintenance.

Also, ION no longer provides DMA-ready memory, so if you are not doing CPU
access there is no requirement (that I am aware of) for you to call
{begin,end}_cpu_access before passing the buffer to the device. And if the
buffer is cached and your device is not IO-coherent, then the cache
maintenance in ion_map_dma_buf and ion_unmap_dma_buf is required.

> return ERR_PTR(-ENOMEM);
>
> return table;
> @@ -272,7 +272,8 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
> struct sg_table *table,
> enum dma_data_direction direction)
> {
> - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
> + dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
> + direction, DMA_ATTR_SKIP_CPU_SYNC);
> }
>
> static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> --
> 2.19.1
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-15 03:10:24

by Laura Abbott

Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/11/19 10:05 AM, Andrew F. Davis wrote:
> Hello all,
>
> This is a set of (hopefully) non-controversial cleanups for the ION
> framework and current set of heaps. These were found as I start to
> familiarize myself with the framework to help in whatever way I
> can in getting all this up to the standards needed for de-staging.
>
> I would like to get some ideas of what is left to work on to get ION
> out of staging. Has there been some kind of agreement on what ION should
> eventually end up being? To me it looks like it is being whittled away
> to its most core functions. To me that is looking like being a DMA-BUF
> user-space front end, simply advertising available memory backings in a
> system and providing allocations as DMA-BUF handles. If this is the case
> then it looks close to being ready to me at least, but I would love to
> hear any other opinions and concerns.
>

Yes, at this point the only functionality that people are really
depending on is the ability to allocate a dma_buf easily from userspace.

> Back to this patchset, the last patch may be a bit different than the
> others, it adds an unmapped heaps type and creation helper. I wanted to
> get this in to show off another heap type and maybe some issues we may
> have with the current ION framework. The unmapped heap is used when the
> backing memory should not (or cannot) be touched. Currently this kind
> of heap is used for firewalled secure memory that can be allocated like
> normal heap memory but only used by secure devices (OP-TEE, crypto HW,
> etc). It is basically just copied from the "carveout" heap type with the
> only difference being it is not mappable to userspace and we do not clear
> the memory (as we should not map it either). So should this really be a
> new heap type? Or maybe advertised as a carveout heap but with an
> additional allocation flag? Perhaps we do away with "types" altogether
> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>
> Maybe more thinking will be needed after all...
>

So the cleanup looks okay (I need to finish reviewing) but I'm not a
fan of adding another heaptype without solving the problem of adding
some sort of devicetree binding or other method of allocating and
placing Ion heaps. That plus uncached buffers are one of the big
open problems that need to be solved for destaging Ion. See
https://lore.kernel.org/lkml/[email protected]/
for some background on that problem.

Thanks,
Laura

> Thanks,
> Andrew
>
> Andrew F. Davis (14):
> staging: android: ion: Add proper header information
> staging: android: ion: Remove empty ion_ioctl_dir() function
> staging: android: ion: Merge ion-ioctl.c into ion.c
> staging: android: ion: Remove leftover comment
> staging: android: ion: Remove struct ion_platform_heap
> staging: android: ion: Fixup some white-space issues
> staging: android: ion: Sync comment docs with struct ion_buffer
> staging: android: ion: Remove base from ion_carveout_heap
> staging: android: ion: Remove base from ion_chunk_heap
> staging: android: ion: Remove unused headers
> staging: android: ion: Allow heap name to be null
> staging: android: ion: Declare helpers for carveout and chunk heaps
> staging: android: ion: Do not sync CPU cache on map/unmap
> staging: android: ion: Add UNMAPPED heap type and helper
>
> drivers/staging/android/ion/Kconfig | 10 ++
> drivers/staging/android/ion/Makefile | 3 +-
> drivers/staging/android/ion/ion-ioctl.c | 98 --------------
> drivers/staging/android/ion/ion.c | 93 +++++++++++--
> drivers/staging/android/ion/ion.h | 87 ++++++++-----
> .../staging/android/ion/ion_carveout_heap.c | 19 +--
> drivers/staging/android/ion/ion_chunk_heap.c | 25 ++--
> drivers/staging/android/ion/ion_cma_heap.c | 6 +-
> drivers/staging/android/ion/ion_heap.c | 8 +-
> drivers/staging/android/ion/ion_page_pool.c | 2 +-
> drivers/staging/android/ion/ion_system_heap.c | 8 +-
> .../staging/android/ion/ion_unmapped_heap.c | 123 ++++++++++++++++++
> drivers/staging/android/uapi/ion.h | 3 +
> 13 files changed, 307 insertions(+), 178 deletions(-)
> delete mode 100644 drivers/staging/android/ion/ion-ioctl.c
> create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>


2019-01-15 04:41:27

by Laura Abbott

Subject: Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

On 1/11/19 10:05 AM, Andrew F. Davis wrote:
> The "unmapped" heap is very similar to the carveout heap except
> the backing memory is presumed to be unmappable by the host, in
> my specific case due to firewalls. This memory can still be
> allocated from and used by devices that do have access to the
> backing memory.
>
> Based originally on the secure/unmapped heap from Linaro for
> the OP-TEE SDP implementation, this was re-written to match
> the carveout heap helper code.
>
> Suggested-by: Etienne Carriere <[email protected]>
> Signed-off-by: Andrew F. Davis <[email protected]>
> ---
> drivers/staging/android/ion/Kconfig | 10 ++
> drivers/staging/android/ion/Makefile | 1 +
> drivers/staging/android/ion/ion.h | 16 +++
> .../staging/android/ion/ion_unmapped_heap.c | 123 ++++++++++++++++++
> drivers/staging/android/uapi/ion.h | 3 +
> 5 files changed, 153 insertions(+)
> create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>
> diff --git a/drivers/staging/android/ion/Kconfig b/drivers/staging/android/ion/Kconfig
> index 0fdda6f62953..a117b8b91b14 100644
> --- a/drivers/staging/android/ion/Kconfig
> +++ b/drivers/staging/android/ion/Kconfig
> @@ -42,3 +42,13 @@ config ION_CMA_HEAP
> Choose this option to enable CMA heaps with Ion. This heap is backed
> by the Contiguous Memory Allocator (CMA). If your system has these
> regions, you should say Y here.
> +
> +config ION_UNMAPPED_HEAP
> + bool "ION unmapped heap support"
> + depends on ION
> + help
> + Choose this option to enable UNMAPPED heaps with Ion. This heap is
> + backed in specific memory pools, carveout from the Linux memory.
> + Unlike carveout heaps these are assumed to be not mappable by
> + kernel or user-space.
> + Unless you know your system has these regions, you should say N here.
> diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile
> index 17f3a7569e3d..c71a1f3de581 100644
> --- a/drivers/staging/android/ion/Makefile
> +++ b/drivers/staging/android/ion/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o
> obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
> obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
> obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
> +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
> diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
> index 97b2876b165a..ce74332018ba 100644
> --- a/drivers/staging/android/ion/ion.h
> +++ b/drivers/staging/android/ion/ion.h
> @@ -341,4 +341,20 @@ static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t si
> }
> #endif
>
> +#ifdef CONFIG_ION_UNMAPPED_HEAP
> +/**
> + * ion_unmapped_heap_create
> + * @base: base address of carveout memory
> + * @size: size of carveout memory region
> + *
> + * Creates an unmapped ion_heap using the passed in data
> + */
> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size);
> +#else
> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
> +{
> + return ERR_PTR(-ENODEV);
> +}
> +#endif
> +
> #endif /* _ION_H */
> diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c b/drivers/staging/android/ion/ion_unmapped_heap.c
> new file mode 100644
> index 000000000000..7602b659c2ec
> --- /dev/null
> +++ b/drivers/staging/android/ion/ion_unmapped_heap.c
> @@ -0,0 +1,123 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ION Memory Allocator unmapped heap helper
> + *
> + * Copyright (C) 2015-2016 Texas Instruments Incorporated - http://www.ti.com/
> + * Andrew F. Davis <[email protected]>
> + *
> + * ION "unmapped" heaps are physical memory heaps not by default mapped into
> + * a virtual address space. The buffer owner can explicitly request kernel
> + * space mappings but the underlying memory may still not be accessible for
> + * various reasons, such as firewalls.
> + */
> +
> +#include <linux/err.h>
> +#include <linux/genalloc.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +
> +#include "ion.h"
> +
> +#define ION_UNMAPPED_ALLOCATE_FAIL -1
> +
> +struct ion_unmapped_heap {
> + struct ion_heap heap;
> + struct gen_pool *pool;
> +};
> +
> +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
> + unsigned long size)
> +{
> + struct ion_unmapped_heap *unmapped_heap =
> + container_of(heap, struct ion_unmapped_heap, heap);
> + unsigned long offset;
> +
> + offset = gen_pool_alloc(unmapped_heap->pool, size);
> + if (!offset)
> + return ION_UNMAPPED_ALLOCATE_FAIL;
> +
> + return offset;
> +}
> +
> +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
> + unsigned long size)
> +{
> + struct ion_unmapped_heap *unmapped_heap =
> + container_of(heap, struct ion_unmapped_heap, heap);
> +
> + gen_pool_free(unmapped_heap->pool, addr, size);
> +}
> +
> +static int ion_unmapped_heap_allocate(struct ion_heap *heap,
> + struct ion_buffer *buffer,
> + unsigned long size,
> + unsigned long flags)
> +{
> + struct sg_table *table;
> + phys_addr_t paddr;
> + int ret;
> +
> + table = kmalloc(sizeof(*table), GFP_KERNEL);
> + if (!table)
> + return -ENOMEM;
> + ret = sg_alloc_table(table, 1, GFP_KERNEL);
> + if (ret)
> + goto err_free;
> +
> + paddr = ion_unmapped_allocate(heap, size);
> + if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
> + ret = -ENOMEM;
> + goto err_free_table;
> + }
> +
> + sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
> + buffer->sg_table = table;
> +


If this memory is actually unmapped this is not going to work because
the struct page will not be valid.

> + return 0;
> +
> +err_free_table:
> + sg_free_table(table);
> +err_free:
> + kfree(table);
> + return ret;
> +}
> +
> +static void ion_unmapped_heap_free(struct ion_buffer *buffer)
> +{
> + struct ion_heap *heap = buffer->heap;
> + struct sg_table *table = buffer->sg_table;
> + struct page *page = sg_page(table->sgl);
> + phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
> +
> + ion_unmapped_free(heap, paddr, buffer->size);
> + sg_free_table(buffer->sg_table);
> + kfree(buffer->sg_table);
> +}
> +
> +static struct ion_heap_ops unmapped_heap_ops = {
> + .allocate = ion_unmapped_heap_allocate,
> + .free = ion_unmapped_heap_free,
> + /* no .map_user, user mapping of unmapped heaps not allowed */
> + .map_kernel = ion_heap_map_kernel,
> + .unmap_kernel = ion_heap_unmap_kernel,
> +};
> +
> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
> +{
> + struct ion_unmapped_heap *unmapped_heap;
> +
> + unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
> + if (!unmapped_heap)
> + return ERR_PTR(-ENOMEM);
> +
> + unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
> + if (!unmapped_heap->pool) {
> + kfree(unmapped_heap);
> + return ERR_PTR(-ENOMEM);
> + }
> + gen_pool_add(unmapped_heap->pool, base, size, -1);
> + unmapped_heap->heap.ops = &unmapped_heap_ops;
> + unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
> +
> + return &unmapped_heap->heap;
> +}
> diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h
> index 5d7009884c13..d5f98bc5f340 100644
> --- a/drivers/staging/android/uapi/ion.h
> +++ b/drivers/staging/android/uapi/ion.h
> @@ -19,6 +19,8 @@
> * carveout heap, allocations are physically
> * contiguous
> * @ION_HEAP_TYPE_DMA: memory allocated via DMA API
> + * @ION_HEAP_TYPE_UNMAPPED: memory not intended to be mapped into the
> + * linux address space unless for debug cases
> * @ION_NUM_HEAPS: helper for iterating over heaps, a bit mask
> * is used to identify the heaps, so only 32
> * total heap types are supported
> @@ -29,6 +31,7 @@ enum ion_heap_type {
> ION_HEAP_TYPE_CARVEOUT,
> ION_HEAP_TYPE_CHUNK,
> ION_HEAP_TYPE_DMA,
> + ION_HEAP_TYPE_UNMAPPED,
> ION_HEAP_TYPE_CUSTOM, /*
> * must be last so device specific heaps always
> * are at the end of this enum
>

Overall this seems way too similar to the carveout heap
to justify adding another heap type. It is also still missing
the part of where exactly ion_unmapped_heap_create gets called.
Figuring that out is one of the blocking items for moving
Ion out of staging.

Thanks,
Laura

2019-01-15 16:48:24

by Andrew Davis

Subject: Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

On 1/14/19 8:32 PM, Laura Abbott wrote:
> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>> The "unmapped" heap is very similar to the carveout heap except
>> the backing memory is presumed to be unmappable by the host, in
>> my specific case due to firewalls. This memory can still be
>> allocated from and used by devices that do have access to the
>> backing memory.
>>
>> Based originally on the secure/unmapped heap from Linaro for
>> the OP-TEE SDP implementation, this was re-written to match
>> the carveout heap helper code.
>>
>> Suggested-by: Etienne Carriere <[email protected]>
>> Signed-off-by: Andrew F. Davis <[email protected]>
>> ---
>>   drivers/staging/android/ion/Kconfig           |  10 ++
>>   drivers/staging/android/ion/Makefile          |   1 +
>>   drivers/staging/android/ion/ion.h             |  16 +++
>>   .../staging/android/ion/ion_unmapped_heap.c   | 123 ++++++++++++++++++
>>   drivers/staging/android/uapi/ion.h            |   3 +
>>   5 files changed, 153 insertions(+)
>>   create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>>
>> diff --git a/drivers/staging/android/ion/Kconfig
>> b/drivers/staging/android/ion/Kconfig
>> index 0fdda6f62953..a117b8b91b14 100644
>> --- a/drivers/staging/android/ion/Kconfig
>> +++ b/drivers/staging/android/ion/Kconfig
>> @@ -42,3 +42,13 @@ config ION_CMA_HEAP
>>         Choose this option to enable CMA heaps with Ion. This heap is
>> backed
>>         by the Contiguous Memory Allocator (CMA). If your system has
>> these
>>         regions, you should say Y here.
>> +
>> +config ION_UNMAPPED_HEAP
>> +    bool "ION unmapped heap support"
>> +    depends on ION
>> +    help
>> +      Choose this option to enable UNMAPPED heaps with Ion. This heap is
>> +      backed in specific memory pools, carveout from the Linux memory.
>> +      Unlike carveout heaps these are assumed to be not mappable by
>> +      kernel or user-space.
>> +      Unless you know your system has these regions, you should say N
>> here.
>> diff --git a/drivers/staging/android/ion/Makefile
>> b/drivers/staging/android/ion/Makefile
>> index 17f3a7569e3d..c71a1f3de581 100644
>> --- a/drivers/staging/android/ion/Makefile
>> +++ b/drivers/staging/android/ion/Makefile
>> @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o
>> ion_page_pool.o
>>   obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
>>   obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
>>   obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
>> +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
>> diff --git a/drivers/staging/android/ion/ion.h
>> b/drivers/staging/android/ion/ion.h
>> index 97b2876b165a..ce74332018ba 100644
>> --- a/drivers/staging/android/ion/ion.h
>> +++ b/drivers/staging/android/ion/ion.h
>> @@ -341,4 +341,20 @@ static inline struct ion_heap
>> *ion_chunk_heap_create(phys_addr_t base, size_t si
>>   }
>>   #endif
>>   +#ifdef CONFIG_ION_UNMAPPED_HEAP
>> +/**
>> + * ion_unmapped_heap_create
>> + * @base:        base address of carveout memory
>> + * @size:        size of carveout memory region
>> + *
>> + * Creates an unmapped ion_heap using the passed in data
>> + */
>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t
>> size);
>> +#else
>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>> +{
>> +    return ERR_PTR(-ENODEV);
>> +}
>> +#endif
>> +
>>   #endif /* _ION_H */
>> diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c
>> b/drivers/staging/android/ion/ion_unmapped_heap.c
>> new file mode 100644
>> index 000000000000..7602b659c2ec
>> --- /dev/null
>> +++ b/drivers/staging/android/ion/ion_unmapped_heap.c
>> @@ -0,0 +1,123 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * ION Memory Allocator unmapped heap helper
>> + *
>> + * Copyright (C) 2015-2016 Texas Instruments Incorporated -
>> http://www.ti.com/
>> + *    Andrew F. Davis <[email protected]>
>> + *
>> + * ION "unmapped" heaps are physical memory heaps not by default
>> mapped into
>> + * a virtual address space. The buffer owner can explicitly request
>> kernel
>> + * space mappings but the underlying memory may still not be
>> accessible for
>> + * various reasons, such as firewalls.
>> + */
>> +
>> +#include <linux/err.h>
>> +#include <linux/genalloc.h>
>> +#include <linux/scatterlist.h>
>> +#include <linux/slab.h>
>> +
>> +#include "ion.h"
>> +
>> +#define ION_UNMAPPED_ALLOCATE_FAIL -1
>> +
>> +struct ion_unmapped_heap {
>> +    struct ion_heap heap;
>> +    struct gen_pool *pool;
>> +};
>> +
>> +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
>> +                     unsigned long size)
>> +{
>> +    struct ion_unmapped_heap *unmapped_heap =
>> +        container_of(heap, struct ion_unmapped_heap, heap);
>> +    unsigned long offset;
>> +
>> +    offset = gen_pool_alloc(unmapped_heap->pool, size);
>> +    if (!offset)
>> +        return ION_UNMAPPED_ALLOCATE_FAIL;
>> +
>> +    return offset;
>> +}
>> +
>> +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
>> +                  unsigned long size)
>> +{
>> +    struct ion_unmapped_heap *unmapped_heap =
>> +        container_of(heap, struct ion_unmapped_heap, heap);
>> +
>> +    gen_pool_free(unmapped_heap->pool, addr, size);
>> +}
>> +
>> +static int ion_unmapped_heap_allocate(struct ion_heap *heap,
>> +                      struct ion_buffer *buffer,
>> +                      unsigned long size,
>> +                      unsigned long flags)
>> +{
>> +    struct sg_table *table;
>> +    phys_addr_t paddr;
>> +    int ret;
>> +
>> +    table = kmalloc(sizeof(*table), GFP_KERNEL);
>> +    if (!table)
>> +        return -ENOMEM;
>> +    ret = sg_alloc_table(table, 1, GFP_KERNEL);
>> +    if (ret)
>> +        goto err_free;
>> +
>> +    paddr = ion_unmapped_allocate(heap, size);
>> +    if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
>> +        ret = -ENOMEM;
>> +        goto err_free_table;
>> +    }
>> +
>> +    sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
>> +    buffer->sg_table = table;
>> +
>
>
> If this memory is actually unmapped this is not going to work because
> the struct page will not be valid.
>

If it will never get mapped then it doesn't need a valid struct page as
far as I can tell. We only use it as a marker to where the start of
backing memory is, and that is calculated based on the struct page
pointer address, not its contents.

>> +    return 0;
>> +
>> +err_free_table:
>> +    sg_free_table(table);
>> +err_free:
>> +    kfree(table);
>> +    return ret;
>> +}
>> +
>> +static void ion_unmapped_heap_free(struct ion_buffer *buffer)
>> +{
>> +    struct ion_heap *heap = buffer->heap;
>> +    struct sg_table *table = buffer->sg_table;
>> +    struct page *page = sg_page(table->sgl);
>> +    phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
>> +
>> +    ion_unmapped_free(heap, paddr, buffer->size);
>> +    sg_free_table(buffer->sg_table);
>> +    kfree(buffer->sg_table);
>> +}
>> +
>> +static struct ion_heap_ops unmapped_heap_ops = {
>> +    .allocate = ion_unmapped_heap_allocate,
>> +    .free = ion_unmapped_heap_free,
>> +    /* no .map_user, user mapping of unmapped heaps not allowed */
>> +    .map_kernel = ion_heap_map_kernel,
>> +    .unmap_kernel = ion_heap_unmap_kernel,
>> +};
>> +
>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>> +{
>> +    struct ion_unmapped_heap *unmapped_heap;
>> +
>> +    unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
>> +    if (!unmapped_heap)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
>> +    if (!unmapped_heap->pool) {
>> +        kfree(unmapped_heap);
>> +        return ERR_PTR(-ENOMEM);
>> +    }
>> +    gen_pool_add(unmapped_heap->pool, base, size, -1);
>> +    unmapped_heap->heap.ops = &unmapped_heap_ops;
>> +    unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
>> +
>> +    return &unmapped_heap->heap;
>> +}
>> diff --git a/drivers/staging/android/uapi/ion.h
>> b/drivers/staging/android/uapi/ion.h
>> index 5d7009884c13..d5f98bc5f340 100644
>> --- a/drivers/staging/android/uapi/ion.h
>> +++ b/drivers/staging/android/uapi/ion.h
>> @@ -19,6 +19,8 @@
>>    *                 carveout heap, allocations are physically
>>    *                 contiguous
>>    * @ION_HEAP_TYPE_DMA:         memory allocated via DMA API
>> + * @ION_HEAP_TYPE_UNMAPPED:     memory not intended to be mapped into
>> the
>> + *                 linux address space unless for debug cases
>>    * @ION_NUM_HEAPS:         helper for iterating over heaps, a bit mask
>>    *                 is used to identify the heaps, so only 32
>>    *                 total heap types are supported
>> @@ -29,6 +31,7 @@ enum ion_heap_type {
>>       ION_HEAP_TYPE_CARVEOUT,
>>       ION_HEAP_TYPE_CHUNK,
>>       ION_HEAP_TYPE_DMA,
>> +    ION_HEAP_TYPE_UNMAPPED,
>>       ION_HEAP_TYPE_CUSTOM, /*
>>                      * must be last so device specific heaps always
>>                      * are at the end of this enum
>>
>
> Overall this seems way too similar to the carveout heap
> to justify adding another heap type. It also still missing
> the part of where exactly you call ion_unmapped_heap_create.
> Figuring that out is one of the blocking items for moving
> Ion out of staging.
>

I agree with this being almost a 1:1 copy of the carveout heap; I'm just
not sure of a good way to do this without a new heap type. Adding flags
to the existing carveout type seems a bit messy, and those flags would
then have to be valid for other heap types as well, which gets
complicated quickly..

I'll reply to the second part in your other top level response.

Andrew

> Thanks,
> Laura

2019-01-15 17:59:22

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/14/19 11:13 AM, Liam Mark wrote:
> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>
>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>> Accesses from the CPU to a cached heap should be bracketed with
>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>
>> Signed-off-by: Andrew F. Davis <[email protected]>
>> ---
>> drivers/staging/android/ion/ion.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>> index 14e48f6eb734..09cb5a8e2b09 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>
>> table = a->table;
>>
>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>> - direction))
>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>
> Unfortunately I don't think you can do this for a couple reasons.
> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> If the calls to {begin,end}_cpu_access were made before the call to
> dma_buf_attach then there won't have been a device attached so the calls
> to {begin,end}_cpu_access won't have done any cache maintenance.
>

That should be okay though: if you have no attachments (or all
attachments are IO-coherent) then there is no need for cache
maintenance. Unless you mean a sequence where a non-IO-coherent device
is attached later after data has already been written. Does that
sequence need supporting? DMA-BUF doesn't have to allocate the backing
memory until map_dma_buf() time, and that should only happen after all
the devices have attached so it can know where to put the buffer. So we
shouldn't expect any CPU access to buffers before all the devices are
attached and mapped, right?

> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> access then there is no requirement (that I am aware of) for you to call
> {begin,end}_cpu_access before passing the buffer to the device and if this
> buffer is cached and your device is not IO-coherent then the cache maintenance
> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>

If I am not doing any CPU access then why do I need CPU cache
maintenance on the buffer?

Andrew

>> return ERR_PTR(-ENOMEM);
>>
>> return table;
>> @@ -272,7 +272,8 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
>> struct sg_table *table,
>> enum dma_data_direction direction)
>> {
>> - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>> + dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
>> + direction, DMA_ATTR_SKIP_CPU_SYNC);
>> }
>>
>> static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>> --
>> 2.19.1
>>
>>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-16 05:37:52

by Liam Mark

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Tue, 15 Jan 2019, Andrew F. Davis wrote:

> On 1/14/19 11:13 AM, Liam Mark wrote:
> > On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >
> >> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >> Accesses from the CPU to a cached heap should be bracketed with
> >> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>
> >> Signed-off-by: Andrew F. Davis <[email protected]>
> >> ---
> >> drivers/staging/android/ion/ion.c | 7 ++++---
> >> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >> index 14e48f6eb734..09cb5a8e2b09 100644
> >> --- a/drivers/staging/android/ion/ion.c
> >> +++ b/drivers/staging/android/ion/ion.c
> >> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>
> >> table = a->table;
> >>
> >> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >> - direction))
> >> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >
> > Unfortunately I don't think you can do this for a couple reasons.
> > You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> > If the calls to {begin,end}_cpu_access were made before the call to
> > dma_buf_attach then there won't have been a device attached so the calls
> > to {begin,end}_cpu_access won't have done any cache maintenance.
> >
>
> That should be okay though, if you have no attachments (or all
> attachments are IO-coherent) then there is no need for cache
> maintenance. Unless you mean a sequence where a non-io-coherent device
> is attached later after data has already been written. Does that
> sequence need supporting?

Yes, but I also think there are cases where CPU access can happen
beforehand in Android; I will focus on the later case for now.

> DMA-BUF doesn't have to allocate the backing
> memory until map_dma_buf() time, and that should only happen after all
> the devices have attached so it can know where to put the buffer. So we
> shouldn't expect any CPU access to buffers before all the devices are
> attached and mapped, right?
>

Here is an example where CPU access can happen later in Android.

Camera device records video -> software post processing -> video device
(who does compression of raw data) and writes to a file

In this example assume the buffer is cached and the devices are not
IO-coherent (quite common).

ION buffer is allocated.

//Camera device records video
dma_buf_attach
dma_map_attachment (buffer needs to be cleaned)
[camera device writes to buffer]
dma_buf_unmap_attachment (buffer needs to be invalidated)
dma_buf_detach (device cannot stay attached because it is being sent down
the pipeline and Camera doesn't know the end of the use case)

//buffer is sent down the pipeline

// Userspace software post processing occurs
mmap buffer
DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
devices attached to buffer
[CPU reads/writes to the buffer]
DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
devices attached to buffer
munmap buffer

//buffer is sent down the pipeline
// Buffer is sent to video device (which does compression of raw data) and
writes to a file
dma_buf_attach
dma_map_attachment (buffer needs to be cleaned)
[video device writes to buffer]
dma_buf_unmap_attachment
dma_buf_detach (device cannot stay attached because it is being sent down
the pipeline and Video doesn't know the end of the use case)



> > Also ION no longer provides DMA ready memory, so if you are not doing CPU
> > access then there is no requirement (that I am aware of) for you to call
> > {begin,end}_cpu_access before passing the buffer to the device and if this
> > buffer is cached and your device is not IO-coherent then the cache maintenance
> > in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >
>
> If I am not doing any CPU access then why do I need CPU cache
> maintenance on the buffer?
>

Because ION no longer provides DMA ready memory.
Take the above example.

ION allocates memory from the buddy allocator and requests zeroing.
Zeros are written to the cache.

You pass the buffer to the camera device which is not IO-coherent.
The camera device writes directly to the buffer in DDR.
Since you didn't clean the buffer, a dirty cache line (one of the zeros)
is later evicted from the cache; this zero overwrites data the camera
device has written, which corrupts your data.

Liam

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-16 05:47:49

by Andrew Davis

Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/14/19 8:39 PM, Laura Abbott wrote:
> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>> Hello all,
>>
>> This is a set of (hopefully) non-controversial cleanups for the ION
>> framework and current set of heaps. These were found as I start to
>> familiarize myself with the framework to help in whatever way I
>> can in getting all this up to the standards needed for de-staging.
>>
>> I would like to get some ideas of what is left to work on to get ION
>> out of staging. Has there been some kind of agreement on what ION should
>> eventually end up being? To me it looks like it is being whittled away at
>> to its most core functions. To me that is looking like being a DMA-BUF
>> user-space front end, simply advertising available memory backings in a
>> system and providing allocations as DMA-BUF handles. If this is the case
>> then it looks close to being ready to me at least, but I would love to
>> hear any other opinions and concerns.
>>
>
> Yes, at this point the only functionality that people are really
> depending on is the ability to allocate a dma_buf easily from userspace.
>
>> Back to this patchset, the last patch may be a bit different than the
>> others, it adds an unmapped heaps type and creation helper. I wanted to
>> get this in to show off another heap type and maybe some issues we may
>> have with the current ION framework. The unmapped heap is used when the
>> backing memory should not (or cannot) be touched. Currently this kind
>> of heap is used for firewalled secure memory that can be allocated like
>> normal heap memory but only used by secure devices (OP-TEE, crypto HW,
>> etc). It is basically just copied from the "carveout" heap type with the
>> only difference being it is not mappable to userspace and we do not clear
>> the memory (as we should not map it either). So should this really be a
>> new heap type? Or maybe advertised as a carveout heap but with an
>> additional allocation flag? Perhaps we do away with "types" altogether
>> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>>
>> Maybe more thinking will be needed afterall..
>>
>
> So the cleanup looks okay (I need to finish reviewing) but I'm not a
> fan of adding another heaptype without solving the problem of adding
> some sort of devicetree binding or other method of allocating and
> placing Ion heaps. That plus uncached buffers are one of the big
> open problems that need to be solved for destaging Ion. See
> https://lore.kernel.org/lkml/[email protected]/
>
> for some background on that problem.
>

I'm under the impression that adding heaps like carveout/chunk will be
rather system-specific and so do not lend themselves well to a universal
DT-style exporter. For instance, a carveout memory space can be reported
by a device at runtime; the driver managing that device should then use
the carveout heap helpers to export that heap. If this is the case then
I'm not sure it is a problem for the ION core framework to solve, but
rather for its users to figure out how best to create the various heaps.
All Ion needs to do is allow exporting and advertising them, IMHO.

Thanks for the background thread link, I've been looking for some info
on the current status of all this, and "ion" is a bit hard to search the
lists for. The core reason for posting this cleanup series is to throw
my hat into the ring of all this Ion work and start getting familiar
with the pending issues. The last two patches are not all that important
to get in right now.

In that thread you linked above, it seems we may have arrived at a
similar problem for different reasons. I think the root issue is that
the Ion core makes too many assumptions about the heap memory. My
proposal would be to allow the heap exporters more control over the
DMA-BUF ops, maybe even going as far as letting them provide their own
complete struct dma_buf_ops.

Let me give an example where I think this is going to be useful. We have
the classic constraint-solving problem on our SoCs. Our SoCs are full of
various coherent and non-coherent devices; some require contiguous
memory allocations, others have in-line IOMMUs and so can operate on
non-contiguous memory, etc..

DMA-BUF has a solution designed in for this that we can use: allocation
at map time, after all the attachments have come in. The checking of
each attached device to find the right backing memory is something the
DMA-BUF exporter has to do, and so this SoC-specific logic would have to
be added to each exporting framework (DRM, V4L2, etc.), unless we have
one unified system exporter everyone uses: Ion.

Then each system can define one (maybe typeless) heap; the correct
backing type is system-specific anyway, so let the system-specific
backing logic in the unified system exporter heap handle picking it.
To allow that, heaps need direct control of the dma_buf_ops.

Direct heap control of dma_buf_ops also fixes the cached/non-cached
issue and my unmapped memory issue: each heap type handles the quirks of
its backing storage in its own way, instead of trying to find some
one-size-fits-all set of memory operations like we are doing now.

We can still provide helpers for the simple heap types, but with this,
much of the heavy lifting moves out of the Ion core framework, making it
much simpler, something I think it will need for de-staging.

Anyway, I might be completely off base in my direction here, just let me
know :)

Thanks,
Andrew

> Thanks,
> Laura
>
>> Thanks,
>> Andrew
>>
>> Andrew F. Davis (14):
>>    staging: android: ion: Add proper header information
>>    staging: android: ion: Remove empty ion_ioctl_dir() function
>>    staging: android: ion: Merge ion-ioctl.c into ion.c
>>    staging: android: ion: Remove leftover comment
>>    staging: android: ion: Remove struct ion_platform_heap
>>    staging: android: ion: Fixup some white-space issues
>>    staging: android: ion: Sync comment docs with struct ion_buffer
>>    staging: android: ion: Remove base from ion_carveout_heap
>>    staging: android: ion: Remove base from ion_chunk_heap
>>    staging: android: ion: Remove unused headers
>>    staging: android: ion: Allow heap name to be null
>>    staging: android: ion: Declare helpers for carveout and chunk heaps
>>    staging: android: ion: Do not sync CPU cache on map/unmap
>>    staging: android: ion: Add UNMAPPED heap type and helper
>>
>>   drivers/staging/android/ion/Kconfig           |  10 ++
>>   drivers/staging/android/ion/Makefile          |   3 +-
>>   drivers/staging/android/ion/ion-ioctl.c       |  98 --------------
>>   drivers/staging/android/ion/ion.c             |  93 +++++++++++--
>>   drivers/staging/android/ion/ion.h             |  87 ++++++++-----
>>   .../staging/android/ion/ion_carveout_heap.c   |  19 +--
>>   drivers/staging/android/ion/ion_chunk_heap.c  |  25 ++--
>>   drivers/staging/android/ion/ion_cma_heap.c    |   6 +-
>>   drivers/staging/android/ion/ion_heap.c        |   8 +-
>>   drivers/staging/android/ion/ion_page_pool.c   |   2 +-
>>   drivers/staging/android/ion/ion_system_heap.c |   8 +-
>>   .../staging/android/ion/ion_unmapped_heap.c   | 123 ++++++++++++++++++
>>   drivers/staging/android/uapi/ion.h            |   3 +
>>   13 files changed, 307 insertions(+), 178 deletions(-)
>>   delete mode 100644 drivers/staging/android/ion/ion-ioctl.c
>>   create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>>
>

2019-01-16 09:03:09

by Laura Abbott

Subject: Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

On 1/15/19 7:58 AM, Andrew F. Davis wrote:
> On 1/14/19 8:32 PM, Laura Abbott wrote:
>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>> The "unmapped" heap is very similar to the carveout heap except
>>> the backing memory is presumed to be unmappable by the host, in
>>> my specific case due to firewalls. This memory can still be
>>> allocated from and used by devices that do have access to the
>>> backing memory.
>>>
>>> Based originally on the secure/unmapped heap from Linaro for
>>> the OP-TEE SDP implementation, this was re-written to match
>>> the carveout heap helper code.
>>>
>>> Suggested-by: Etienne Carriere <[email protected]>
>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>> ---
>>>   drivers/staging/android/ion/Kconfig           |  10 ++
>>>   drivers/staging/android/ion/Makefile          |   1 +
>>>   drivers/staging/android/ion/ion.h             |  16 +++
>>>   .../staging/android/ion/ion_unmapped_heap.c   | 123 ++++++++++++++++++
>>>   drivers/staging/android/uapi/ion.h            |   3 +
>>>   5 files changed, 153 insertions(+)
>>>   create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>>>
>>> diff --git a/drivers/staging/android/ion/Kconfig
>>> b/drivers/staging/android/ion/Kconfig
>>> index 0fdda6f62953..a117b8b91b14 100644
>>> --- a/drivers/staging/android/ion/Kconfig
>>> +++ b/drivers/staging/android/ion/Kconfig
>>> @@ -42,3 +42,13 @@ config ION_CMA_HEAP
>>>         Choose this option to enable CMA heaps with Ion. This heap is
>>> backed
>>>         by the Contiguous Memory Allocator (CMA). If your system has
>>> these
>>>         regions, you should say Y here.
>>> +
>>> +config ION_UNMAPPED_HEAP
>>> +    bool "ION unmapped heap support"
>>> +    depends on ION
>>> +    help
>>> +      Choose this option to enable UNMAPPED heaps with Ion. This heap is
>>> +      backed in specific memory pools, carveout from the Linux memory.
>>> +      Unlike carveout heaps these are assumed to be not mappable by
>>> +      kernel or user-space.
>>> +      Unless you know your system has these regions, you should say N
>>> here.
>>> diff --git a/drivers/staging/android/ion/Makefile
>>> b/drivers/staging/android/ion/Makefile
>>> index 17f3a7569e3d..c71a1f3de581 100644
>>> --- a/drivers/staging/android/ion/Makefile
>>> +++ b/drivers/staging/android/ion/Makefile
>>> @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o
>>> ion_page_pool.o
>>>   obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
>>>   obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
>>>   obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
>>> +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
>>> diff --git a/drivers/staging/android/ion/ion.h
>>> b/drivers/staging/android/ion/ion.h
>>> index 97b2876b165a..ce74332018ba 100644
>>> --- a/drivers/staging/android/ion/ion.h
>>> +++ b/drivers/staging/android/ion/ion.h
>>> @@ -341,4 +341,20 @@ static inline struct ion_heap
>>> *ion_chunk_heap_create(phys_addr_t base, size_t si
>>>   }
>>>   #endif
>>>   +#ifdef CONFIG_ION_UNMAPPED_HEAP
>>> +/**
>>> + * ion_unmapped_heap_create
>>> + * @base:        base address of carveout memory
>>> + * @size:        size of carveout memory region
>>> + *
>>> + * Creates an unmapped ion_heap using the passed in data
>>> + */
>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t
>>> size);
>>> +#else
>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>>> +{
>>> +    return ERR_PTR(-ENODEV);
>>> +}
>>> +#endif
>>> +
>>>   #endif /* _ION_H */
>>> diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c
>>> b/drivers/staging/android/ion/ion_unmapped_heap.c
>>> new file mode 100644
>>> index 000000000000..7602b659c2ec
>>> --- /dev/null
>>> +++ b/drivers/staging/android/ion/ion_unmapped_heap.c
>>> @@ -0,0 +1,123 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * ION Memory Allocator unmapped heap helper
>>> + *
>>> + * Copyright (C) 2015-2016 Texas Instruments Incorporated -
>>> http://www.ti.com/
>>> + *    Andrew F. Davis <[email protected]>
>>> + *
>>> + * ION "unmapped" heaps are physical memory heaps not by default
>>> mapped into
>>> + * a virtual address space. The buffer owner can explicitly request
>>> kernel
>>> + * space mappings but the underlying memory may still not be
>>> accessible for
>>> + * various reasons, such as firewalls.
>>> + */
>>> +
>>> +#include <linux/err.h>
>>> +#include <linux/genalloc.h>
>>> +#include <linux/scatterlist.h>
>>> +#include <linux/slab.h>
>>> +
>>> +#include "ion.h"
>>> +
>>> +#define ION_UNMAPPED_ALLOCATE_FAIL -1
>>> +
>>> +struct ion_unmapped_heap {
>>> +    struct ion_heap heap;
>>> +    struct gen_pool *pool;
>>> +};
>>> +
>>> +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
>>> +                     unsigned long size)
>>> +{
>>> +    struct ion_unmapped_heap *unmapped_heap =
>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>> +    unsigned long offset;
>>> +
>>> +    offset = gen_pool_alloc(unmapped_heap->pool, size);
>>> +    if (!offset)
>>> +        return ION_UNMAPPED_ALLOCATE_FAIL;
>>> +
>>> +    return offset;
>>> +}
>>> +
>>> +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
>>> +                  unsigned long size)
>>> +{
>>> +    struct ion_unmapped_heap *unmapped_heap =
>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>> +
>>> +    gen_pool_free(unmapped_heap->pool, addr, size);
>>> +}
>>> +
>>> +static int ion_unmapped_heap_allocate(struct ion_heap *heap,
>>> +                      struct ion_buffer *buffer,
>>> +                      unsigned long size,
>>> +                      unsigned long flags)
>>> +{
>>> +    struct sg_table *table;
>>> +    phys_addr_t paddr;
>>> +    int ret;
>>> +
>>> +    table = kmalloc(sizeof(*table), GFP_KERNEL);
>>> +    if (!table)
>>> +        return -ENOMEM;
>>> +    ret = sg_alloc_table(table, 1, GFP_KERNEL);
>>> +    if (ret)
>>> +        goto err_free;
>>> +
>>> +    paddr = ion_unmapped_allocate(heap, size);
>>> +    if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
>>> +        ret = -ENOMEM;
>>> +        goto err_free_table;
>>> +    }
>>> +
>>> +    sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
>>> +    buffer->sg_table = table;
>>> +
>>
>>
>> If this memory is actually unmapped this is not going to work because
>> the struct page will not be valid.
>>
>
> If it will never get mapped then it doesn't need a valid struct page as
> far as I can tell. We only use it as a marker to where the start of
> backing memory is, and that is calculated based on the struct page
> pointer address, not its contents.
>

You can't rely on pfn_to_page returning a valid page pointer if the
pfn doesn't point to reserved memory. Even if you aren't relying on the
contents, that's no guarantee you can use it to get the valid pfn back.
You can get away with that sometimes, but it's not correct to rely on it
all the time.

>>> +    return 0;
>>> +
>>> +err_free_table:
>>> +    sg_free_table(table);
>>> +err_free:
>>> +    kfree(table);
>>> +    return ret;
>>> +}
>>> +
>>> +static void ion_unmapped_heap_free(struct ion_buffer *buffer)
>>> +{
>>> +    struct ion_heap *heap = buffer->heap;
>>> +    struct sg_table *table = buffer->sg_table;
>>> +    struct page *page = sg_page(table->sgl);
>>> +    phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
>>> +
>>> +    ion_unmapped_free(heap, paddr, buffer->size);
>>> +    sg_free_table(buffer->sg_table);
>>> +    kfree(buffer->sg_table);
>>> +}
>>> +
>>> +static struct ion_heap_ops unmapped_heap_ops = {
>>> +    .allocate = ion_unmapped_heap_allocate,
>>> +    .free = ion_unmapped_heap_free,
>>> +    /* no .map_user, user mapping of unmapped heaps not allowed */
>>> +    .map_kernel = ion_heap_map_kernel,
>>> +    .unmap_kernel = ion_heap_unmap_kernel,
>>> +};
>>> +
>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>>> +{
>>> +    struct ion_unmapped_heap *unmapped_heap;
>>> +
>>> +    unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
>>> +    if (!unmapped_heap)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
>>> +    if (!unmapped_heap->pool) {
>>> +        kfree(unmapped_heap);
>>> +        return ERR_PTR(-ENOMEM);
>>> +    }
>>> +    gen_pool_add(unmapped_heap->pool, base, size, -1);
>>> +    unmapped_heap->heap.ops = &unmapped_heap_ops;
>>> +    unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
>>> +
>>> +    return &unmapped_heap->heap;
>>> +}
>>> diff --git a/drivers/staging/android/uapi/ion.h
>>> b/drivers/staging/android/uapi/ion.h
>>> index 5d7009884c13..d5f98bc5f340 100644
>>> --- a/drivers/staging/android/uapi/ion.h
>>> +++ b/drivers/staging/android/uapi/ion.h
>>> @@ -19,6 +19,8 @@
>>>    *                 carveout heap, allocations are physically
>>>    *                 contiguous
>>>    * @ION_HEAP_TYPE_DMA:         memory allocated via DMA API
>>> + * @ION_HEAP_TYPE_UNMAPPED:     memory not intended to be mapped into
>>> the
>>> + *                 linux address space unless for debug cases
>>>    * @ION_NUM_HEAPS:         helper for iterating over heaps, a bit mask
>>>    *                 is used to identify the heaps, so only 32
>>>    *                 total heap types are supported
>>> @@ -29,6 +31,7 @@ enum ion_heap_type {
>>>       ION_HEAP_TYPE_CARVEOUT,
>>>       ION_HEAP_TYPE_CHUNK,
>>>       ION_HEAP_TYPE_DMA,
>>> +    ION_HEAP_TYPE_UNMAPPED,
>>>       ION_HEAP_TYPE_CUSTOM, /*
>>>                      * must be last so device specific heaps always
>>>                      * are at the end of this enum
>>>
>>
>> Overall this seems way too similar to the carveout heap
>> to justify adding another heap type. It also still missing
>> the part of where exactly you call ion_unmapped_heap_create.
>> Figuring that out is one of the blocking items for moving
>> Ion out of staging.
>>
>
> I agree with this being almost a 1:1 copy of the carveout heap, I'm just
> not sure of a good way to do this without a new heap type. Adding flags
> to the existing carveout type seem a bit messy. Plus then those flags
> should be valid for other heap types, gets complicated quickly..
>
> I'll reply to the second part in your other top level response.
>
> Andrew
>
>> Thanks,
>> Laura


2019-01-16 09:16:48

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/15/19 11:45 AM, Liam Mark wrote:
> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>
>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>> ---
>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>> --- a/drivers/staging/android/ion/ion.c
>>>> +++ b/drivers/staging/android/ion/ion.c
>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>
>>>> table = a->table;
>>>>
>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>> - direction))
>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>
>>> Unfortunately I don't think you can do this for a couple reasons.
>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>> If the calls to {begin,end}_cpu_access were made before the call to
>>> dma_buf_attach then there won't have been a device attached so the calls
>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>
>>
>> That should be okay though, if you have no attachments (or all
>> attachments are IO-coherent) then there is no need for cache
>> maintenance. Unless you mean a sequence where a non-io-coherent device
>> is attached later after data has already been written. Does that
>> sequence need supporting?
>
> Yes, but also I think there are cases where CPU access can happen before
> in Android, but I will focus on later for now.
>
>> DMA-BUF doesn't have to allocate the backing
>> memory until map_dma_buf() time, and that should only happen after all
>> the devices have attached so it can know where to put the buffer. So we
>> shouldn't expect any CPU access to buffers before all the devices are
>> attached and mapped, right?
>>
>
> Here is an example where CPU access can happen later in Android.
>
> Camera device records video -> software post processing -> video device
> (who does compression of raw data) and writes to a file
>
> In this example assume the buffer is cached and the devices are not
> IO-coherent (quite common).
>

This is the start of the problem: having cached mappings of memory that
is also being accessed non-coherently is going to cause issues one way
or another. On top of the speculative cache fills that have to be
constantly fought back against with CMOs like below, some coherent
interconnects behave badly when you mix coherent and non-coherent access
(snoop filters get messed up).

The solution is to either always have the addresses marked non-coherent
(like device memory, no-map carveouts), or if you really want to use
regular system memory allocated at runtime, then all cached mappings of
it need to be dropped, even the kernel logical address area (as painful
as that would be).

> ION buffer is allocated.
>
> //Camera device records video
> dma_buf_attach
> dma_map_attachment (buffer needs to be cleaned)

Why does the buffer need to be cleaned here? I just got through reading
the thread linked by Laura in the other reply. I do like +Brian's
suggestion of tracking if the buffer has had CPU access since the last
time and only flushing the cache if it has. As unmapped heaps never get
CPU mapped, that would never be the case for them, which solves my
problem.
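That tracking idea can be modeled with a tiny userspace sketch. Everything here is hypothetical and illustrative: `tracked_buffer`, `buffer_cpu_access`, and `buffer_map_for_device` are invented names, not real ION or dma-buf API.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of "only clean the cache if the CPU touched the buffer
 * since the last clean". All names are invented for illustration. */
struct tracked_buffer {
	bool cpu_dirty;  /* CPU wrote via a cached mapping since last clean */
	int cmo_count;   /* cache maintenance ops issued (for demonstration) */
};

/* Would be called from begin_cpu_access/mmap in a real implementation. */
static void buffer_cpu_access(struct tracked_buffer *buf)
{
	buf->cpu_dirty = true;
}

/* Would be called from map_dma_buf: clean only when something is dirty. */
static void buffer_map_for_device(struct tracked_buffer *buf)
{
	if (buf->cpu_dirty) {
		buf->cmo_count++;  /* stand-in for an actual cache clean */
		buf->cpu_dirty = false;
	}
	/* An unmapped heap never sets cpu_dirty, so it never pays for a CMO. */
}
```

A buffer from an unmapped heap never takes the `buffer_cpu_access` path, so in this model `cmo_count` stays at zero across any number of device mappings.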

> [camera device writes to buffer]
> dma_buf_unmap_attachment (buffer needs to be invalidated)

It doesn't know there will be any further CPU access; the buffer could
get freed after this for all we know, so the invalidate can be deferred
until the CPU requests access again.

> dma_buf_detach (device cannot stay attached because it is being sent down
> the pipeline and Camera doesn't know the end of the use case)
>

This seems like a broken use-case. I understand the desire to keep
everything as modular as possible and separate the steps, but at this
point no one owns this buffer's backing memory, not the CPU or any
device. I would go as far as to say DMA-BUF should be free now to
de-allocate the backing storage if it wants, that way it could get ready
for the next attachment, which may change the required backing memory
completely.

All devices should attach before the first mapping, and only let go
after the task is complete; otherwise this buffer's data needs to be
copied off to a different location or the CPU needs to take ownership
in between.

> //buffer is sent down the pipeline
>
> // Userspace software post processing occurs
> mmap buffer

Perhaps the invalidate should happen here in mmap.

> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> devices attached to buffer

And that should be okay: mmap does the sync, and if no devices are
attached then nothing could have changed the underlying memory in the
meantime, so DMA_BUF_SYNC_START can safely be a no-op, as it is.
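That reasoning reduces to a very small model (again with invented names, not the real DMA_BUF_IOCTL_SYNC implementation): an invalidate is only needed when an attached device may have written memory behind the CPU's cache.

```c
#include <assert.h>

/* Model of the SYNC_START argument above: with zero attachments nothing
 * can have written DDR behind the CPU's back, so no invalidate is needed.
 * Names and structure are hypothetical, for illustration only. */
struct sync_model {
	int attached_devices;
	int invalidates;  /* CMOs issued, for demonstration */
};

static void model_sync_start(struct sync_model *m)
{
	if (m->attached_devices > 0)
		m->invalidates++;  /* stand-in for a cache invalidate */
	/* no devices attached: memory cannot have changed, safe no-op */
}
```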

> [CPU reads/writes to the buffer]
> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> devices attached to buffer
> munmap buffer
>
> //buffer is sent down the pipeline
> // Buffer is sent to video device (who does compression of raw data) and
> writes to a file
> dma_buf_attach
> dma_map_attachment (buffer needs to be cleaned)
> [video device writes to buffer]
> dma_buf_unmap_attachment
> dma_buf_detach (device cannot stay attached because it is being sent down
> the pipeline and Video doesn't know the end of the use case)
>
>
>
>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>> access then there is no requirement (that I am aware of) for you to call
>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>
>>
>> If I am not doing any CPU access then why do I need CPU cache
>> maintenance on the buffer?
>>
>
> Because ION no longer provides DMA ready memory.
> Take the above example.
>
> ION allocates memory from buddy allocator and requests zeroing.
> Zeros are written to the cache.
>
> You pass the buffer to the camera device which is not IO-coherent.
> The camera device writes directly to the buffer in DDR.
> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> evicted from the cache, this zero overwrites data the camera device has
> written which corrupts your data.
>

The zeroing *is* a CPU access, therefore it should handle the needed CMO
for CPU access at the time of zeroing.
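Put differently: if the allocator zeros the buffer through a cached mapping, the allocator owns the resulting dirty lines and should clean them before handing the buffer out, so no stale zero line can later be evicted over device-written data. A hedged sketch of that ordering, with invented names rather than real ION code:

```c
#include <assert.h>
#include <string.h>

/* Toy model: a buffer whose zeroing dirties the cache and then cleans
 * it immediately, as part of the same CPU access. Illustrative only. */
struct zeroed_buffer {
	unsigned char mem[64];  /* stand-in for the backing pages */
	int dirty_lines;        /* cache lines the CPU has dirtied */
};

static void cache_clean(struct zeroed_buffer *buf)
{
	buf->dirty_lines = 0;  /* stand-in for a clean/flush CMO */
}

/* Zeroing is a CPU access: write zeros (dirtying the cache), then clean
 * before the buffer is handed out to any non-coherent device. */
static void alloc_and_zero(struct zeroed_buffer *buf)
{
	memset(buf->mem, 0, sizeof(buf->mem));
	buf->dirty_lines = 1;  /* model: the memset dirtied one 64B line */
	cache_clean(buf);
}
```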

Andrew

> Liam
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-16 09:33:20

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> On 1/15/19 11:45 AM, Liam Mark wrote:
>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>
>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>
>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>> ---
>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>
>>>>> table = a->table;
>>>>>
>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>> - direction))
>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>
>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>
>>>
>>> That should be okay though, if you have no attachments (or all
>>> attachments are IO-coherent) then there is no need for cache
>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>> is attached later after data has already been written. Does that
>>> sequence need supporting?
>>
>> Yes, but also I think there are cases where CPU access can happen before
>> in Android, but I will focus on later for now.
>>
>>> DMA-BUF doesn't have to allocate the backing
>>> memory until map_dma_buf() time, and that should only happen after all
>>> the devices have attached so it can know where to put the buffer. So we
>>> shouldn't expect any CPU access to buffers before all the devices are
>>> attached and mapped, right?
>>>
>>
>> Here is an example where CPU access can happen later in Android.
>>
>> Camera device records video -> software post processing -> video device
>> (who does compression of raw data) and writes to a file
>>
>> In this example assume the buffer is cached and the devices are not
>> IO-coherent (quite common).
>>
>
> This is the start of the problem: having cached mappings of memory that
> is also being accessed non-coherently is going to cause issues one way
> or another. On top of the speculative cache fills that have to be
> constantly fought back against with CMOs like below, some coherent
> interconnects behave badly when you mix coherent and non-coherent access
> (snoop filters get messed up).
>
> The solution is to either always have the addresses marked non-coherent
> (like device memory, no-map carveouts), or if you really want to use
> regular system memory allocated at runtime, then all cached mappings of
> it need to be dropped, even the kernel logical address area (as painful
> as that would be).
>
>> ION buffer is allocated.
>>
>> //Camera device records video
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
>
> Why does the buffer need to be cleaned here? I just got through reading
> the thread linked by Laura in the other reply. I do like +Brian's

Actually +Brian this time :)

> suggestion of tracking if the buffer has had CPU access since the last
> time and only flushing the cache if it has. As unmapped heaps never get
> CPU mapped, that would never be the case for them, which solves my
> problem.
>
>> [camera device writes to buffer]
>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>
> It doesn't know there will be any further CPU access; the buffer could
> get freed after this for all we know, so the invalidate can be deferred
> until the CPU requests access again.
>
>> dma_buf_detach (device cannot stay attached because it is being sent down
>> the pipeline and Camera doesn't know the end of the use case)
>>
>
> This seems like a broken use-case. I understand the desire to keep
> everything as modular as possible and separate the steps, but at this
> point no one owns this buffer's backing memory, not the CPU or any
> device. I would go as far as to say DMA-BUF should be free now to
> de-allocate the backing storage if it wants, that way it could get ready
> for the next attachment, which may change the required backing memory
> completely.
>
> All devices should attach before the first mapping, and only let go
> after the task is complete; otherwise this buffer's data needs to be
> copied off to a different location or the CPU needs to take ownership in between.
>
>> //buffer is sent down the pipeline
>>
>> // Userspace software post processing occurs
>> mmap buffer
>
> Perhaps the invalidate should happen here in mmap.
>
>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>> devices attached to buffer
>
> And that should be okay: mmap does the sync, and if no devices are
> attached then nothing could have changed the underlying memory in the
> meantime, so DMA_BUF_SYNC_START can safely be a no-op, as it is.
>
>> [CPU reads/writes to the buffer]
>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>> devices attached to buffer
>> munmap buffer
>>
>> //buffer is sent down the pipeline
>> // Buffer is sent to video device (who does compression of raw data) and
>> writes to a file
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
>> [video device writes to buffer]
>> dma_buf_unmap_attachment
>> dma_buf_detach (device cannot stay attached because it is being sent down
>> the pipeline and Video doesn't know the end of the use case)
>>
>>
>>
>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>> access then there is no requirement (that I am aware of) for you to call
>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>
>>>
>>> If I am not doing any CPU access then why do I need CPU cache
>>> maintenance on the buffer?
>>>
>>
>> Because ION no longer provides DMA ready memory.
>> Take the above example.
>>
>> ION allocates memory from buddy allocator and requests zeroing.
>> Zeros are written to the cache.
>>
>> You pass the buffer to the camera device which is not IO-coherent.
>> The camera device writes directly to the buffer in DDR.
>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>> evicted from the cache, this zero overwrites data the camera device has
>> written which corrupts your data.
>>
>
> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> for CPU access at the time of zeroing.
>
> Andrew
>
>> Liam
>>
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>

2019-01-16 10:17:15

by Laura Abbott

Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/15/19 9:47 AM, Andrew F. Davis wrote:
> On 1/14/19 8:39 PM, Laura Abbott wrote:
>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>> Hello all,
>>>
>>> This is a set of (hopefully) non-controversial cleanups for the ION
>>> framework and current set of heaps. These were found as I start to
>>> familiarize myself with the framework to help in whatever way I
>>> can in getting all this up to the standards needed for de-staging.
>>>
>>> I would like to get some ideas of what is left to work on to get ION
>>> out of staging. Has there been some kind of agreement on what ION should
>>> eventually end up being? To me it looks like it is being whittled away
>>> to its most core functions. To me that is looking like being a DMA-BUF
>>> user-space front end, simply advertising available memory backings in a
>>> system and providing allocations as DMA-BUF handles. If this is the case
>>> then it looks close to being ready to me at least, but I would love to
>>> hear any other opinions and concerns.
>>>
>>
>> Yes, at this point the only functionality that people are really
>> depending on is the ability to allocate a dma_buf easily from userspace.
>>
>>> Back to this patchset, the last patch may be a bit different than the
>>> others, it adds an unmapped heaps type and creation helper. I wanted to
>>> get this in to show off another heap type and maybe some issues we may
>>> have with the current ION framework. The unmapped heap is used when the
>>> backing memory should not (or cannot) be touched. Currently this kind
>>> of heap is used for firewalled secure memory that can be allocated like
>>> normal heap memory but only used by secure devices (OP-TEE, crypto HW,
>>> etc). It is basically just copied from the "carveout" heap type with the
>>> only difference being it is not mappable to userspace and we do not clear
>>> the memory (as we should not map it either). So should this really be a
>>> new heap type? Or maybe advertised as a carveout heap but with an
>>> additional allocation flag? Perhaps we do away with "types" altogether
>>> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>>>
>>> Maybe more thinking will be needed after all..
>>>
>>
>> So the cleanup looks okay (I need to finish reviewing) but I'm not a
>> fan of adding another heaptype without solving the problem of adding
>> some sort of devicetree binding or other method of allocating and
>> placing Ion heaps. That plus uncached buffers are one of the big
>> open problems that need to be solved for destaging Ion. See
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> for some background on that problem.
>>
>
> I'm under the impression that adding heaps like carveouts/chunk will be
> rather system specific and so will not lend themselves well to a universal
> DT style exporter. For instance a carveout memory space can be reported
> by a device at runtime, then the driver managing that device should go
> and use the carveout heap helpers to export that heap. If this is the
> case then I'm not sure it is a problem for the ION core framework to
> solve, but rather the users of it to figure out how best to create the
> various heaps. All Ion needs to do is allow exporting and advertising
> them IMHO.
>

I think it is a problem for the Ion core framework to take care of.
Ion is useless if you don't actually have the heaps. Nobody has
actually gotten a full Ion solution end-to-end with a carveout heap
working in mainline because any proposals have been rejected. I think
we need at least one example in mainline of how creating a carveout
heap would work.

> Thanks for the background thread link, I've been looking for some info
> on current status of all this and "ion" is a bit hard to search the
> lists for. The core reason for posting this cleanup series is to throw
> my hat into the ring of all this Ion work and start getting familiar
> with the pending issues. The last two patches are not all that important
> to get in right now.
>
> In that thread you linked above, it seems we may have arrived at a
> similar problem for different reasons. I think the root issue is the Ion
> core makes too many assumptions about the heap memory. My proposal would
> be to allow the heap exporters more control over the DMA-BUF ops, maybe
> even going as far as letting them provide their own complete struct
> dma_buf_ops.
>
> Let me give an example where I think this is going to be useful. We have
> the classic constraint solving problem on our SoCs. Our SoCs are full of
> various coherent and non-coherent devices; some require contiguous
> memory allocations, others have in-line IOMMUs so they can operate on
> non-contiguous buffers, etc.
>
> DMA-BUF has a solution designed in for this that we can use, namely
> allocation at map time, after all the attachments have come in. The
> checking of each attached device to find the right backing memory is
> something the DMA-BUF exporter has to do, and so this SoC specific logic
> would have to be added to each exporting framework (DRM, V4L2, etc),
> unless we have one unified system exporter everyone uses, Ion.
>

That's how dmabuf is supposed to work in theory, but in practice we
also have the case where userspace allocates memory, mmaps it, and then
a device attaches to it. The issue is we end up having to do work
and make decisions before all devices are actually attached.

> Then each system can define one (maybe typeless) heap; the correct
> backing type is system specific anyway, so let the system-specific
> backing logic in the unified system exporter heap handle picking it.
> To allow that, heaps need direct control of dma_buf_ops.
>
> Direct heap control of dma_buf_ops also fixes the cache/non-cache issue
> and my unmapped memory issue: each heap type handles the quirks of its
> backing storage in its own way, instead of trying to find some
> one-size-fits-all memory operations like we are doing now.
>
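The per-heap ops idea quoted above could be sketched very roughly as follows. This is illustrative only: `demo_buf_ops` and the two ops tables are invented for the sketch, not the kernel's `struct dma_buf_ops`.

```c
#include <assert.h>
#include <stddef.h>

/* Each heap supplies its own ops table, so an unmapped heap can omit
 * mmap entirely and skip cache maintenance it knows it never needs.
 * All names here are hypothetical, for illustration only. */
struct demo_buf_ops {
	int (*map_dma)(void);  /* returns number of CMOs issued (for demo) */
	int (*mmap)(void);     /* NULL when CPU mapping is not offered */
};

static int cached_map_dma(void)   { return 1; /* cached heap: one clean */ }
static int cached_mmap(void)      { return 0; }
static int unmapped_map_dma(void) { return 0; /* no CPU alias: no CMO */ }

static const struct demo_buf_ops cached_heap_ops = {
	.map_dma = cached_map_dma,
	.mmap    = cached_mmap,
};

static const struct demo_buf_ops unmapped_heap_ops = {
	.map_dma = unmapped_map_dma,
	.mmap    = NULL,  /* mmap attempts would fail cleanly */
};
```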

I don't think this is an issue of one-size fits all. We have flags
to differentiate between cached and uncached paths, the issue is
that doing the synchronization for uncached buffers is difficult.

I'm just not sure how an extra set of dma_buf ops actually solves
the problem of needing to synchronize alias mappings.

Thanks,
Laura

> We can still provide helpers for the simple heap types, but with this
> much of the heavy lifting moved out of the Ion core framework it becomes
> much simpler, something I think it will need for de-staging.
>
> Anyway, I might be completely off base in my direction here, just let me
> know :)
>


2019-01-16 10:37:01

by Laura Abbott

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> On 1/15/19 11:45 AM, Liam Mark wrote:
>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>
>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>
>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>> ---
>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>
>>>>> table = a->table;
>>>>>
>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>> - direction))
>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>
>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>
>>>
>>> That should be okay though, if you have no attachments (or all
>>> attachments are IO-coherent) then there is no need for cache
>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>> is attached later after data has already been written. Does that
>>> sequence need supporting?
>>
>> Yes, but also I think there are cases where CPU access can happen before
>> in Android, but I will focus on later for now.
>>
>>> DMA-BUF doesn't have to allocate the backing
>>> memory until map_dma_buf() time, and that should only happen after all
>>> the devices have attached so it can know where to put the buffer. So we
>>> shouldn't expect any CPU access to buffers before all the devices are
>>> attached and mapped, right?
>>>
>>
>> Here is an example where CPU access can happen later in Android.
>>
>> Camera device records video -> software post processing -> video device
>> (who does compression of raw data) and writes to a file
>>
>> In this example assume the buffer is cached and the devices are not
>> IO-coherent (quite common).
>>
>
> This is the start of the problem: having cached mappings of memory that
> is also being accessed non-coherently is going to cause issues one way
> or another. On top of the speculative cache fills that have to be
> constantly fought back against with CMOs like below, some coherent
> interconnects behave badly when you mix coherent and non-coherent access
> (snoop filters get messed up).
>
> The solution is to either always have the addresses marked non-coherent
> (like device memory, no-map carveouts), or if you really want to use
> regular system memory allocated at runtime, then all cached mappings of
> it need to be dropped, even the kernel logical address area (as painful
> as that would be).
>

I agree it's broken, hence my desire to remove it :)

The other problem is that uncached buffers are being used for
performance reasons, so anything that would involve getting
rid of the logical address would probably negate any performance
benefit.

>> ION buffer is allocated.
>>
>> //Camera device records video
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
>
> Why does the buffer need to be cleaned here? I just got through reading
> the thread linked by Laura in the other reply. I do like +Brian's
> suggestion of tracking if the buffer has had CPU access since the last
> time and only flushing the cache if it has. As unmapped heaps never get
> CPU mapped, that would never be the case for them, which solves my
> problem.
>
>> [camera device writes to buffer]
>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>
> It doesn't know there will be any further CPU access; the buffer could
> get freed after this for all we know, so the invalidate can be deferred
> until the CPU requests access again.
>
>> dma_buf_detach (device cannot stay attached because it is being sent down
>> the pipeline and Camera doesn't know the end of the use case)
>>
>
> This seems like a broken use-case. I understand the desire to keep
> everything as modular as possible and separate the steps, but at this
> point no one owns this buffer's backing memory, not the CPU or any
> device. I would go as far as to say DMA-BUF should be free now to
> de-allocate the backing storage if it wants, that way it could get ready
> for the next attachment, which may change the required backing memory
> completely.
>
> All devices should attach before the first mapping, and only let go
> after the task is complete; otherwise this buffer's data needs to be
> copied off to a different location or the CPU needs to take ownership in between.
>

Maybe it's broken, but it's the status quo, and we spent a good
amount of time at Plumbers concluding there isn't a great way
to fix it :/

>> //buffer is sent down the pipeline
>>
>> // Userspace software post processing occurs
>> mmap buffer
>
> Perhaps the invalidate should happen here in mmap.
>
>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>> devices attached to buffer
>
> And that should be okay: mmap does the sync, and if no devices are
> attached then nothing could have changed the underlying memory in the
> meantime, so DMA_BUF_SYNC_START can safely be a no-op, as it is.
>
>> [CPU reads/writes to the buffer]
>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>> devices attached to buffer
>> munmap buffer
>>
>> //buffer is sent down the pipeline
>> // Buffer is sent to video device (who does compression of raw data) and
>> writes to a file
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
>> [video device writes to buffer]
>> dma_buf_unmap_attachment
>> dma_buf_detach (device cannot stay attached because it is being sent down
>> the pipeline and Video doesn't know the end of the use case)
>>
>>
>>
>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>> access then there is no requirement (that I am aware of) for you to call
>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>
>>>
>>> If I am not doing any CPU access then why do I need CPU cache
>>> maintenance on the buffer?
>>>
>>
>> Because ION no longer provides DMA ready memory.
>> Take the above example.
>>
>> ION allocates memory from buddy allocator and requests zeroing.
>> Zeros are written to the cache.
>>
>> You pass the buffer to the camera device which is not IO-coherent.
>> The camera device writes directly to the buffer in DDR.
>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>> evicted from the cache, this zero overwrites data the camera device has
>> written which corrupts your data.
>>
>
> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> for CPU access at the time of zeroing.
>
> Andrew
>
>> Liam
>>
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>


2019-01-16 10:44:15

by Laura Abbott

Subject: Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

On 1/15/19 10:43 AM, Laura Abbott wrote:
> On 1/15/19 7:58 AM, Andrew F. Davis wrote:
>> On 1/14/19 8:32 PM, Laura Abbott wrote:
>>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>>> The "unmapped" heap is very similar to the carveout heap except
>>>> the backing memory is presumed to be unmappable by the host, in
>>>> my specific case due to firewalls. This memory can still be
>>>> allocated from and used by devices that do have access to the
>>>> backing memory.
>>>>
>>>> Based originally on the secure/unmapped heap from Linaro for
>>>> the OP-TEE SDP implementation, this was re-written to match
>>>> the carveout heap helper code.
>>>>
>>>> Suggested-by: Etienne Carriere <[email protected]>
>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>> ---
>>>>    drivers/staging/android/ion/Kconfig           |  10 ++
>>>>    drivers/staging/android/ion/Makefile          |   1 +
>>>>    drivers/staging/android/ion/ion.h             |  16 +++
>>>>    .../staging/android/ion/ion_unmapped_heap.c   | 123 ++++++++++++++++++
>>>>    drivers/staging/android/uapi/ion.h            |   3 +
>>>>    5 files changed, 153 insertions(+)
>>>>    create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>>>>
>>>> diff --git a/drivers/staging/android/ion/Kconfig
>>>> b/drivers/staging/android/ion/Kconfig
>>>> index 0fdda6f62953..a117b8b91b14 100644
>>>> --- a/drivers/staging/android/ion/Kconfig
>>>> +++ b/drivers/staging/android/ion/Kconfig
>>>> @@ -42,3 +42,13 @@ config ION_CMA_HEAP
>>>>          Choose this option to enable CMA heaps with Ion. This heap is
>>>> backed
>>>>          by the Contiguous Memory Allocator (CMA). If your system has
>>>> these
>>>>          regions, you should say Y here.
>>>> +
>>>> +config ION_UNMAPPED_HEAP
>>>> +    bool "ION unmapped heap support"
>>>> +    depends on ION
>>>> +    help
>>>> +      Choose this option to enable UNMAPPED heaps with Ion. This heap is
>>>> +      backed in specific memory pools, carveout from the Linux memory.
>>>> +      Unlike carveout heaps these are assumed to be not mappable by
>>>> +      kernel or user-space.
>>>> +      Unless you know your system has these regions, you should say N
>>>> here.
>>>> diff --git a/drivers/staging/android/ion/Makefile
>>>> b/drivers/staging/android/ion/Makefile
>>>> index 17f3a7569e3d..c71a1f3de581 100644
>>>> --- a/drivers/staging/android/ion/Makefile
>>>> +++ b/drivers/staging/android/ion/Makefile
>>>> @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o
>>>> ion_page_pool.o
>>>>    obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
>>>>    obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
>>>>    obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
>>>> +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
>>>> diff --git a/drivers/staging/android/ion/ion.h
>>>> b/drivers/staging/android/ion/ion.h
>>>> index 97b2876b165a..ce74332018ba 100644
>>>> --- a/drivers/staging/android/ion/ion.h
>>>> +++ b/drivers/staging/android/ion/ion.h
>>>> @@ -341,4 +341,20 @@ static inline struct ion_heap
>>>> *ion_chunk_heap_create(phys_addr_t base, size_t si
>>>>    }
>>>>    #endif
>>>>    +#ifdef CONFIG_ION_UNMAPPED_HEAP
>>>> +/**
>>>> + * ion_unmapped_heap_create
>>>> + * @base:        base address of carveout memory
>>>> + * @size:        size of carveout memory region
>>>> + *
>>>> + * Creates an unmapped ion_heap using the passed in data
>>>> + */
>>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t
>>>> size);
>>>> +#else
>>>> +static inline struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>>>> +{
>>>> +    return ERR_PTR(-ENODEV);
>>>> +}
>>>> +#endif
>>>> +
>>>>    #endif /* _ION_H */
>>>> diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c
>>>> b/drivers/staging/android/ion/ion_unmapped_heap.c
>>>> new file mode 100644
>>>> index 000000000000..7602b659c2ec
>>>> --- /dev/null
>>>> +++ b/drivers/staging/android/ion/ion_unmapped_heap.c
>>>> @@ -0,0 +1,123 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * ION Memory Allocator unmapped heap helper
>>>> + *
>>>> + * Copyright (C) 2015-2016 Texas Instruments Incorporated -
>>>> http://www.ti.com/
>>>> + *    Andrew F. Davis <[email protected]>
>>>> + *
>>>> + * ION "unmapped" heaps are physical memory heaps not by default
>>>> mapped into
>>>> + * a virtual address space. The buffer owner can explicitly request
>>>> kernel
>>>> + * space mappings but the underlying memory may still not be
>>>> accessible for
>>>> + * various reasons, such as firewalls.
>>>> + */
>>>> +
>>>> +#include <linux/err.h>
>>>> +#include <linux/genalloc.h>
>>>> +#include <linux/scatterlist.h>
>>>> +#include <linux/slab.h>
>>>> +
>>>> +#include "ion.h"
>>>> +
>>>> +#define ION_UNMAPPED_ALLOCATE_FAIL -1
>>>> +
>>>> +struct ion_unmapped_heap {
>>>> +    struct ion_heap heap;
>>>> +    struct gen_pool *pool;
>>>> +};
>>>> +
>>>> +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
>>>> +                     unsigned long size)
>>>> +{
>>>> +    struct ion_unmapped_heap *unmapped_heap =
>>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>>> +    unsigned long offset;
>>>> +
>>>> +    offset = gen_pool_alloc(unmapped_heap->pool, size);
>>>> +    if (!offset)
>>>> +        return ION_UNMAPPED_ALLOCATE_FAIL;
>>>> +
>>>> +    return offset;
>>>> +}
>>>> +
>>>> +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
>>>> +                  unsigned long size)
>>>> +{
>>>> +    struct ion_unmapped_heap *unmapped_heap =
>>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>>> +
>>>> +    gen_pool_free(unmapped_heap->pool, addr, size);
>>>> +}
>>>> +
>>>> +static int ion_unmapped_heap_allocate(struct ion_heap *heap,
>>>> +                      struct ion_buffer *buffer,
>>>> +                      unsigned long size,
>>>> +                      unsigned long flags)
>>>> +{
>>>> +    struct sg_table *table;
>>>> +    phys_addr_t paddr;
>>>> +    int ret;
>>>> +
>>>> +    table = kmalloc(sizeof(*table), GFP_KERNEL);
>>>> +    if (!table)
>>>> +        return -ENOMEM;
>>>> +    ret = sg_alloc_table(table, 1, GFP_KERNEL);
>>>> +    if (ret)
>>>> +        goto err_free;
>>>> +
>>>> +    paddr = ion_unmapped_allocate(heap, size);
>>>> +    if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
>>>> +        ret = -ENOMEM;
>>>> +        goto err_free_table;
>>>> +    }
>>>> +
>>>> +    sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
>>>> +    buffer->sg_table = table;
>>>> +
>>>
>>>
>>> If this memory is actually unmapped this is not going to work because
>>> the struct page will not be valid.
>>>
>>
>> If it will never get mapped then it doesn't need a valid struct page as
>> far as I can tell. We only use it as a marker to where the start of
>> backing memory is, and that is calculated based on the struct page
>> pointer address, not its contents.
>>
>
> You can't rely on pfn_to_page returning a valid page pointer if the
> pfn doesn't point to reserved memory. Even if you aren't relying
> on the contents, that's no guarantee you can use it to get
> the valid pfn back. You can get away with that sometimes but it's
> not correct to rely on it all the time.
>

I found https://lore.kernel.org/lkml/[email protected]/T/#u
which I think is closer to what we might want to be looking at.
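Setting aside whether the struct page is valid at all, the paddr -> page -> paddr round-trip in this heap only holds because the gen_pool is created with PAGE_SHIFT granularity. A minimal userspace model of the PFN arithmetic (TOY_PAGE_SHIFT is an assumed 4 KiB page size, not the kernel macro) shows the round-trip is lossy for unaligned addresses:

```c
#include <stdint.h>
#include <assert.h>

/* Toy model of the kernel's PFN helpers; assumes 4 KiB pages. */
#define TOY_PAGE_SHIFT 12

static inline uint64_t toy_pfn_down(uint64_t paddr)
{
	return paddr >> TOY_PAGE_SHIFT;	/* phys addr -> page frame number */
}

static inline uint64_t toy_pfn_phys(uint64_t pfn)
{
	return pfn << TOY_PAGE_SHIFT;	/* page frame number -> phys addr */
}
```

Because the pool's minimum allocation order is PAGE_SHIFT, every offset handed out is page aligned, which is what lets ion_unmapped_heap_free() recover the original address from the page pointer.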

>>>> +    return 0;
>>>> +
>>>> +err_free_table:
>>>> +    sg_free_table(table);
>>>> +err_free:
>>>> +    kfree(table);
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void ion_unmapped_heap_free(struct ion_buffer *buffer)
>>>> +{
>>>> +    struct ion_heap *heap = buffer->heap;
>>>> +    struct sg_table *table = buffer->sg_table;
>>>> +    struct page *page = sg_page(table->sgl);
>>>> +    phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
>>>> +
>>>> +    ion_unmapped_free(heap, paddr, buffer->size);
>>>> +    sg_free_table(buffer->sg_table);
>>>> +    kfree(buffer->sg_table);
>>>> +}
>>>> +
>>>> +static struct ion_heap_ops unmapped_heap_ops = {
>>>> +    .allocate = ion_unmapped_heap_allocate,
>>>> +    .free = ion_unmapped_heap_free,
>>>> +    /* no .map_user, user mapping of unmapped heaps not allowed */
>>>> +    .map_kernel = ion_heap_map_kernel,
>>>> +    .unmap_kernel = ion_heap_unmap_kernel,
>>>> +};
>>>> +
>>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>>>> +{
>>>> +    struct ion_unmapped_heap *unmapped_heap;
>>>> +
>>>> +    unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
>>>> +    if (!unmapped_heap)
>>>> +        return ERR_PTR(-ENOMEM);
>>>> +
>>>> +    unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
>>>> +    if (!unmapped_heap->pool) {
>>>> +        kfree(unmapped_heap);
>>>> +        return ERR_PTR(-ENOMEM);
>>>> +    }
>>>> +    gen_pool_add(unmapped_heap->pool, base, size, -1);
>>>> +    unmapped_heap->heap.ops = &unmapped_heap_ops;
>>>> +    unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
>>>> +
>>>> +    return &unmapped_heap->heap;
>>>> +}
>>>> diff --git a/drivers/staging/android/uapi/ion.h
>>>> b/drivers/staging/android/uapi/ion.h
>>>> index 5d7009884c13..d5f98bc5f340 100644
>>>> --- a/drivers/staging/android/uapi/ion.h
>>>> +++ b/drivers/staging/android/uapi/ion.h
>>>> @@ -19,6 +19,8 @@
>>>>     *                 carveout heap, allocations are physically
>>>>     *                 contiguous
>>>>     * @ION_HEAP_TYPE_DMA:         memory allocated via DMA API
>>>> + * @ION_HEAP_TYPE_UNMAPPED:     memory not intended to be mapped into
>>>> the
>>>> + *                 linux address space unless for debug cases
>>>>     * @ION_NUM_HEAPS:         helper for iterating over heaps, a bit mask
>>>>     *                 is used to identify the heaps, so only 32
>>>>     *                 total heap types are supported
>>>> @@ -29,6 +31,7 @@ enum ion_heap_type {
>>>>        ION_HEAP_TYPE_CARVEOUT,
>>>>        ION_HEAP_TYPE_CHUNK,
>>>>        ION_HEAP_TYPE_DMA,
>>>> +    ION_HEAP_TYPE_UNMAPPED,
>>>>        ION_HEAP_TYPE_CUSTOM, /*
>>>>                       * must be last so device specific heaps always
>>>>                       * are at the end of this enum
>>>>
>>>
>>> Overall this seems way too similar to the carveout heap
>>> to justify adding another heap type. It also still missing
>>> the part of where exactly you call ion_unmapped_heap_create.
>>> Figuring that out is one of the blocking items for moving
>>> Ion out of staging.
>>>
>>
>> I agree with this being almost a 1:1 copy of the carveout heap, I'm just
>> not sure of a good way to do this without a new heap type. Adding flags
>> to the existing carveout type seem a bit messy. Plus then those flags
>> should be valid for other heap types, gets complicated quickly..
>>
>> I'll reply to the second part in your other top level response.
>>
>> Andrew
>>
>>> Thanks,
>>> Laura
>


2019-01-16 22:50:48

by Brian Starkey

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

Hi :-)

On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> > On 1/15/19 11:45 AM, Liam Mark wrote:
> >> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>
> >>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>
> >>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>> ---
> >>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>
> >>>>> table = a->table;
> >>>>>
> >>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>> - direction))
> >>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>
> >>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>
> >>>
> >>> That should be okay though, if you have no attachments (or all
> >>> attachments are IO-coherent) then there is no need for cache
> >>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>> is attached later after data has already been written. Does that
> >>> sequence need supporting?
> >>
> >> Yes, but also I think there are cases where CPU access can happen before
> >> in Android, but I will focus on later for now.
> >>
> >>> DMA-BUF doesn't have to allocate the backing
> >>> memory until map_dma_buf() time, and that should only happen after all
> >>> the devices have attached so it can know where to put the buffer. So we
> >>> shouldn't expect any CPU access to buffers before all the devices are
> >>> attached and mapped, right?
> >>>
> >>
> >> Here is an example where CPU access can happen later in Android.
> >>
> >> Camera device records video -> software post processing -> video device
> >> (who does compression of raw data) and writes to a file
> >>
> >> In this example assume the buffer is cached and the devices are not
> >> IO-coherent (quite common).
> >>
> >
> > This is the start of the problem, having cached mappings of memory that
> > is also being accessed non-coherently is going to cause issues one way
> > or another. On top of the speculative cache fills that have to be
> > constantly fought back against with CMOs like below; some coherent
> > interconnects behave badly when you mix coherent and non-coherent access
> > (snoop filters get messed up).
> >
> > The solution is to either always have the addresses marked non-coherent
> > (like device memory, no-map carveouts), or if you really want to use
> > regular system memory allocated at runtime, then all cached mappings of
> > it need to be dropped, even the kernel logical address (area as painful
> > as that would be).

Ouch :-( I wasn't aware about these potential interconnect issues. How
"real" is that? It seems that we aren't really hitting that today on
real devices.

> >
> >> ION buffer is allocated.
> >>
> >> //Camera device records video
> >> dma_buf_attach
> >> dma_map_attachment (buffer needs to be cleaned)
> >
> > Why does the buffer need to be cleaned here? I just got through reading
> > the thread linked by Laura in the other reply. I do like +Brian's
>
> Actually +Brian this time :)
>
> > suggestion of tracking if the buffer has had CPU access since the last
> > time and only flushing the cache if it has. As unmapped heaps never get
> > CPU mapped this would never be the case for unmapped heaps, it solves my
> > problem.
> >
> >> [camera device writes to buffer]
> >> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >
> > It doesn't know there will be any further CPU access, it could get freed
> > after this for all we know, the invalidate can be saved until the CPU
> > requests access again.

We don't have any API to allow the invalidate to happen on CPU access
if all devices already detached. We need a struct device pointer to
give to the DMA API, otherwise on arm64 there'll be no invalidate.

I had a chat with a few people internally after the previous
discussion with Liam. One suggestion was to use
DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
one other device attached (guarantees that we can do an invalidate in
the future if begin_cpu_access is called). If the last device
detaches, do a sync then.

Conversely, in map_dma_buf, we would track if there was any CPU access
and use/skip the sync appropriately.

I did start poking the code to check out how that would look, but then
Christmas happened and I'm still catching back up.

> >
> >> dma_buf_detach (device cannot stay attached because it is being sent down
> >> the pipeline and Camera doesn't know the end of the use case)
> >>
> >
> > This seems like a broken use-case, I understand the desire to keep
> > everything as modular as possible and separate the steps, but at this
> > point no one owns this buffer's backing memory, not the CPU or any
> > device. I would go as far as to say DMA-BUF should be free now to
> > de-allocate the backing storage if it wants, that way it could get ready
> > for the next attachment, which may change the required backing memory
> > completely.
> >
> > All devices should attach before the first mapping, and only let go
> > after the task is complete, otherwise this buffer's data needs to be copied off
> > to a different location or the CPU needs to take ownership in-between.
> >

Yeah.. that's certainly the theory. Are there any DMA-BUF
implementations which actually do that? I hear it quoted a lot,
because that's what the docs say - but if the reality doesn't match
it, maybe we should change the docs.

> >> //buffer is sent down the pipeline
> >>
> >> // Userspace software post processing occurs
> >> mmap buffer
> >
> > Perhaps the invalidate should happen here in mmap.
> >
> >> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
> >> devices attached to buffer
> >
> > And that should be okay, mmap does the sync, and if no devices are
> > attached nothing could have changed the underlying memory in the
> > mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.

Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
Liam was saying that it's too painful for them to do that every time a
device unmaps - when in many cases (device->device, no CPU) it's not
needed.

> >
> >> [CPU reads/writes to the buffer]
> >> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >> devices attached to buffer
> >> munmap buffer
> >>
> >> //buffer is sent down the pipeline
> >> // Buffer is sent to video device (who does compression of raw data) and
> >> writes to a file
> >> dma_buf_attach
> >> dma_map_attachment (buffer needs to be cleaned)
> >> [video device writes to buffer]
> >> dma_buf_unmap_attachment
> >> dma_buf_detach (device cannot stay attached because it is being sent down
> >> the pipeline and Video doesn't know the end of the use case)
> >>
> >>
> >>
> >>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>> access then there is no requirement (that I am aware of) for you to call
> >>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>
> >>>
> >>> If I am not doing any CPU access then why do I need CPU cache
> >>> maintenance on the buffer?
> >>>
> >>
> >> Because ION no longer provides DMA ready memory.
> >> Take the above example.
> >>
> >> ION allocates memory from buddy allocator and requests zeroing.
> >> Zeros are written to the cache.
> >>
> >> You pass the buffer to the camera device which is not IO-coherent.
> >> The camera devices writes directly to the buffer in DDR.
> >> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >> evicted from the cache, this zero overwrites data the camera device has
> >> written which corrupts your data.
> >>
> >
> > The zeroing *is* a CPU access, therefor it should handle the needed CMO
> > for CPU access at the time of zeroing.
> >

Actually that should be at the point of the first non-coherent device
mapping the buffer right? No point in doing CMO if the future accesses
are coherent.
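Liam's corruption scenario can be reproduced with a toy write-back cache model (one cache line, one byte of "DDR"; purely illustrative, no relation to real cache hardware):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

struct toy_line {
	bool dirty;
	uint8_t data;
};

static uint8_t toy_mem;			/* one byte of "DDR" */

static void cpu_write(struct toy_line *l, uint8_t v)
{
	l->data = v;			/* lands in the cache, not DDR */
	l->dirty = true;
}

static void device_write(uint8_t v)
{
	toy_mem = v;			/* non-coherent device: DDR only */
}

static void evict(struct toy_line *l)
{
	if (l->dirty)
		toy_mem = l->data;	/* write-back clobbers DDR */
	l->dirty = false;
}

/* The clean CMO the map path would do before handing to the device. */
static void clean(struct toy_line *l)
{
	evict(l);
}
```

Without the clean, a stale zero evicted after the camera has written its data overwrites it; with a clean before the device write, the data survives.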

Cheers,
-Brian

> > Andrew
> >
> >> Liam
> >>
> >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >> a Linux Foundation Collaborative Project
> >>

2019-01-16 22:51:05

by Brian Starkey

Subject: Re: [PATCH 11/14] staging: android: ion: Allow heap name to be null

Hi Andrew,

On Fri, Jan 11, 2019 at 12:05:20PM -0600, Andrew F. Davis wrote:
> The heap name can be used for debugging but otherwise does not seem
> to be required and no other part of the code will fail if left NULL
> except here. We can make it required and check for it at some point,
> for now let's just prevent this from causing a NULL pointer exception.

I'm not so keen on this one. In the "new" API with heap querying, the
name string is the only way to identify the heap. I think Laura
mentioned at XDC2017 that it was expected that userspace should use
the strings to find the heap they want.

I'd actually be in favor of making the string a more strict UAPI than
allowing it to be empty (at least, if heap name strings is the API we
decide on for identifying heaps - which is another discussion).
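For reference, selecting a heap by name after ION_IOC_HEAP_QUERY can be sketched like this (toy_heap_data mirrors a simplified struct ion_heap_data from the uapi; the helper name is made up):

```c
#include <string.h>
#include <assert.h>

#define MAX_HEAP_NAME 32

/* Simplified mirror of struct ion_heap_data. */
struct toy_heap_data {
	char name[MAX_HEAP_NAME];
	unsigned int type;
	unsigned int heap_id;
};

/* Find a heap id by name, as userspace would after a heap query.
 * Returns -1 if no heap matches. */
static int toy_find_heap_id(const struct toy_heap_data *heaps, int cnt,
			    const char *name)
{
	for (int i = 0; i < cnt; i++)
		if (strncmp(heaps[i].name, name, MAX_HEAP_NAME) == 0)
			return (int)heaps[i].heap_id;
	return -1;
}
```

If a heap exported an empty or NULL name, this lookup path is exactly what breaks, which is the argument for making the name a strict part of the UAPI.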

Cheers,
-Brian

>
> Signed-off-by: Andrew F. Davis <[email protected]>
> ---
> drivers/staging/android/ion/ion.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> index bba5f682bc25..14e48f6eb734 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -467,7 +467,7 @@ static int ion_query_heaps(struct ion_heap_query *query)
> max_cnt = query->cnt;
>
> plist_for_each_entry(heap, &dev->heaps, node) {
> - strncpy(hdata.name, heap->name, MAX_HEAP_NAME);
> + strncpy(hdata.name, heap->name ?: "(null)", MAX_HEAP_NAME);
> hdata.name[sizeof(hdata.name) - 1] = '\0';
> hdata.type = heap->type;
> hdata.heap_id = heap->id;
> --
> 2.19.1
>
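The guarded copy in the patch, spelled out with a plain ternary instead of the GNU `?:` shorthand, behaves like this userspace sketch:

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define MAX_HEAP_NAME 32

/* Userspace model of the guarded name copy from ion_query_heaps(). */
static void copy_heap_name(char *dst, const char *heap_name)
{
	strncpy(dst, heap_name ? heap_name : "(null)", MAX_HEAP_NAME);
	dst[MAX_HEAP_NAME - 1] = '\0';	/* strncpy does not guarantee NUL */
}
```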

2019-01-16 22:54:58

by Andrew Davis

Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/15/19 12:58 PM, Laura Abbott wrote:
> On 1/15/19 9:47 AM, Andrew F. Davis wrote:
>> On 1/14/19 8:39 PM, Laura Abbott wrote:
>>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>>> Hello all,
>>>>
>>>> This is a set of (hopefully) non-controversial cleanups for the ION
>>>> framework and current set of heaps. These were found as I start to
>>>> familiarize myself with the framework to help in whatever way I
>>>> can in getting all this up to the standards needed for de-staging.
>>>>
>>>> I would like to get some ideas of what is left to work on to get ION
>>>> out of staging. Has there been some kind of agreement on what ION
>>>> should
>>>> eventually end up being? To me it looks like it is being whittled
>>>> away at
>>>> to it's most core functions. To me that is looking like being a DMA-BUF
>>>> user-space front end, simply advertising available memory backings in a
>>>> system and providing allocations as DMA-BUF handles. If this is the
>>>> case
>>>> then it looks close to being ready to me at least, but I would love to
>>>> hear any other opinions and concerns.
>>>>
>>>
>>> Yes, at this point the only functionality that people are really
>>> depending on is the ability to allocate a dma_buf easily from userspace.
>>>
>>>> Back to this patchset, the last patch may be a bit different than the
>>>> others, it adds an unmapped heaps type and creation helper. I wanted to
>>>> get this in to show off another heap type and maybe some issues we may
>>>> have with the current ION framework. The unmapped heap is used when the
>>>> backing memory should not (or cannot) be touched. Currently this kind
>>>> of heap is used for firewalled secure memory that can be allocated like
>>>> normal heap memory but only used by secure devices (OP-TEE, crypto HW,
>>>> etc). It is basically just copied from the "carveout" heap type with
>>>> the
>>>> only difference being it is not mappable to userspace and we do not
>>>> clear
>>>> the memory (as we should not map it either). So should this really be a
>>>> new heap type? Or maybe advertised as a carveout heap but with an
>>>> additional allocation flag? Perhaps we do away with "types" altogether
>>>> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>>>>
>>>> Maybe more thinking will be needed afterall..
>>>>
>>>
>>> So the cleanup looks okay (I need to finish reviewing) but I'm not a
>>> fan of adding another heaptype without solving the problem of adding
>>> some sort of devicetree binding or other method of allocating and
>>> placing Ion heaps. That plus uncached buffers are one of the big
>>> open problems that need to be solved for destaging Ion. See
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>>
>>> for some background on that problem.
>>>
>>
>> I'm under the impression that adding heaps like carveouts/chunk will be
>> rather system specific and so do not lend themselves well to a universal
>> DT style exporter. For instance a carveout memory space can be reported
>> by a device at runtime, then the driver managing that device should go
>> and use the carveout heap helpers to export that heap. If this is the
>> case then I'm not sure it is a problem for the ION core framework to
>> solve, but rather the users of it to figure out how best to create the
>> various heaps. All Ion needs to do is allow exporting and advertising
>> them IMHO.
>>
>
> I think it is a problem for the Ion core framework to take care of.
> Ion is useless if you don't actually have the heaps. Nobody has
> actually gotten a full Ion solution end-to-end with a carveout heap
> working in mainline because any proposals have been rejected. I think
> we need at least one example in mainline of how creating a carveout
> heap would work.

In our evil vendor trees we have several examples. The issue being that
Ion is still in staging and attempts at generic DT heap definitions
haven't seemed to go so well. So for now we just keep it specific to our
platforms until upstream makes a direction decision.

>
>> Thanks for the background thread link, I've been looking for some info
>> on current status of all this and "ion" is a bit hard to search the
>> lists for. The core reason for posting this cleanup series is to throw
>> my hat into the ring of all this Ion work and start getting familiar
>> with the pending issues. The last two patches are not all that important
>> to get in right now.
>>
>> In that thread you linked above, it seems we may have arrived at a
>> similar problem for different reasons. I think the root issue is the Ion
>> core makes too many assumptions about the heap memory. My proposal would
>> be to allow the heap exporters more control over the DMA-BUF ops, maybe
>> even going as far as letting them provide their own complete struct
>> dma_buf_ops.
>>
>> Let me give an example where I think this is going to be useful. We have
>> the classic constraint solving problem on our SoCs. Our SoCs are full of
>> various coherent and non-coherent devices, some require contiguous
>> memory allocations, others have in-line IOMMUs so can operate on
>> non-contiguous, etc..
>>
>> DMA-BUF has a solution designed in for this we can use, namely
>> allocation at map time after all the attachments have come in. The
>> checking of each attached device to find the right backing memory is
>> something the DMA-BUF exporter has to do, and so this SoC specific logic
>> would have to be added to each exporting framework (DRM, V4L2, etc),
>> unless we have one unified system exporter everyone uses, Ion.
>>
>
> That's how dmabuf is supposed to work in theory but in practice we
> also have the case of userspace allocates memory, mmaps, and then
> a device attaches to it. The issue is we end up having to do work
> and make decisions before all devices are actually attached.
>

That just seems wrong, DMA-BUF should be used for, well, DMA-able
buffers.. Userspace should not be using these buffers without devices
attached, otherwise why not use a regular buffer. If you need to fill
the buffer then you should attach/map it first so the DMA-BUF exporter
can pick the appropriate backing memory first.

Maybe a couple more rules on the ordering of DMA-BUF operations are
needed to prevent having to deal with all these non-useful permutations.

Sumit? ^^

>> Then each system can define one (maybe typeless) heap, the correct
>> backing type is system specific anyway, so let the system specific
>> backing logic in the unified system exporter heap handle picking that.
>> To allow that heaps need direct control of dma_buf_ops.
>>
>> Direct heap control of dma_buf_ops also fixes the cache/non-cache issue,
>> and my unmapped memory issue, each heap type handles the quirks of its
>> backing storage in its own way, instead of trying to find some one size
>> fits all memory operations like we are doing now.
>>
>
> I don't think this is an issue of one-size fits all. We have flags
> to differentiate between cached and uncached paths, the issue is
> that doing the synchronization for uncached buffers is difficult.
>

It is difficult, hence my suggestion to let an uncached heap exporter do
all the heavy work, instead of trying to deal with all these cases in
the Ion core framework.

> I'm just not sure how an extra set of dma_buf ops actually solves
> the problem of needing to synchronize alias mappings.
>

It doesn't solve it, it just moves the work out of the framework. There
are going to be a lot more interesting problems than this with some
types of heaps we will have in the future; dealing with all the logic in
the framework core is not going to scale.
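The "heaps provide their own ops" idea reduces the core to a dispatcher. A toy model of that shape (names hypothetical, -19 standing in for -ENODEV; not the dma-buf or ION API):

```c
#include <assert.h>

/* Each heap supplies its own ops table instead of the core
 * special-casing heap types. */
struct toy_heap_ops {
	int (*map_kernel)(void);	/* returns 0 on success */
};

struct toy_heap {
	const struct toy_heap_ops *ops;
};

static int system_map_kernel(void)   { return 0; }
static int unmapped_map_kernel(void) { return -19; /* -ENODEV */ }

static const struct toy_heap_ops system_ops = {
	.map_kernel = system_map_kernel,
};
static const struct toy_heap_ops unmapped_ops = {
	.map_kernel = unmapped_map_kernel,
};

/* The core no longer knows about heap quirks: it just dispatches. */
static int toy_core_map_kernel(struct toy_heap *h)
{
	return h->ops->map_kernel();
}
```

The unmapped heap's quirk (refusing kernel mappings) lives entirely in its own ops table, not in the core.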

Thanks,
Andrew

> Thanks,
> Laura
>
>> We can provide helpers for the simple heap types still, but with this
>> much of the heavy lifting moves out of the Ion core framework making it
>> much more simple, something I think it will need for de-staging.
>>
>> Anyway, I might be completely off base in my direction here, just let me
>> know :)
>>
>

2019-01-16 22:57:47

by Andrew Davis

Subject: Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper

On 1/15/19 1:11 PM, Laura Abbott wrote:
> On 1/15/19 10:43 AM, Laura Abbott wrote:
>> On 1/15/19 7:58 AM, Andrew F. Davis wrote:
>>> On 1/14/19 8:32 PM, Laura Abbott wrote:
>>>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>>>> The "unmapped" heap is very similar to the carveout heap except
>>>>> the backing memory is presumed to be unmappable by the host, in
>>>>> my specific case due to firewalls. This memory can still be
>>>>> allocated from and used by devices that do have access to the
>>>>> backing memory.
>>>>>
>>>>> Based originally on the secure/unmapped heap from Linaro for
>>>>> the OP-TEE SDP implementation, this was re-written to match
>>>>> the carveout heap helper code.
>>>>>
>>>>> Suggested-by: Etienne Carriere <[email protected]>
>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>> ---
>>>>>    drivers/staging/android/ion/Kconfig           |  10 ++
>>>>>    drivers/staging/android/ion/Makefile          |   1 +
>>>>>    drivers/staging/android/ion/ion.h             |  16 +++
>>>>>    .../staging/android/ion/ion_unmapped_heap.c   | 123
>>>>> ++++++++++++++++++
>>>>>    drivers/staging/android/uapi/ion.h            |   3 +
>>>>>    5 files changed, 153 insertions(+)
>>>>>    create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
>>>>>
>>>>> diff --git a/drivers/staging/android/ion/Kconfig
>>>>> b/drivers/staging/android/ion/Kconfig
>>>>> index 0fdda6f62953..a117b8b91b14 100644
>>>>> --- a/drivers/staging/android/ion/Kconfig
>>>>> +++ b/drivers/staging/android/ion/Kconfig
>>>>> @@ -42,3 +42,13 @@ config ION_CMA_HEAP
>>>>>          Choose this option to enable CMA heaps with Ion. This heap is
>>>>> backed
>>>>>          by the Contiguous Memory Allocator (CMA). If your system has
>>>>> these
>>>>>          regions, you should say Y here.
>>>>> +
>>>>> +config ION_UNMAPPED_HEAP
>>>>> +    bool "ION unmapped heap support"
>>>>> +    depends on ION
>>>>> +    help
>>>>> +      Choose this option to enable UNMAPPED heaps with Ion. This
>>>>> heap is
>>>>> +      backed in specific memory pools, carveout from the Linux
>>>>> memory.
>>>>> +      Unlike carveout heaps these are assumed to be not mappable by
>>>>> +      kernel or user-space.
>>>>> +      Unless you know your system has these regions, you should say N
>>>>> here.
>>>>> diff --git a/drivers/staging/android/ion/Makefile
>>>>> b/drivers/staging/android/ion/Makefile
>>>>> index 17f3a7569e3d..c71a1f3de581 100644
>>>>> --- a/drivers/staging/android/ion/Makefile
>>>>> +++ b/drivers/staging/android/ion/Makefile
>>>>> @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o
>>>>> ion_page_pool.o
>>>>>    obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o
>>>>>    obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o
>>>>>    obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o
>>>>> +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o
>>>>> diff --git a/drivers/staging/android/ion/ion.h
>>>>> b/drivers/staging/android/ion/ion.h
>>>>> index 97b2876b165a..ce74332018ba 100644
>>>>> --- a/drivers/staging/android/ion/ion.h
>>>>> +++ b/drivers/staging/android/ion/ion.h
>>>>> @@ -341,4 +341,20 @@ static inline struct ion_heap
>>>>> *ion_chunk_heap_create(phys_addr_t base, size_t si
>>>>>    }
>>>>>    #endif
>>>>>    +#ifdef CONFIG_ION_UNMAPPED_HEAP
>>>>> +/**
>>>>> + * ion_unmapped_heap_create
>>>>> + * @base:        base address of carveout memory
>>>>> + * @size:        size of carveout memory region
>>>>> + *
>>>>> + * Creates an unmapped ion_heap using the passed in data
>>>>> + */
>>>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t
>>>>> size);
>>>>> +#else
>>>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t
>>>>> size)
>>>>> +{
>>>>> +    return ERR_PTR(-ENODEV);
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>>    #endif /* _ION_H */
>>>>> diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c
>>>>> b/drivers/staging/android/ion/ion_unmapped_heap.c
>>>>> new file mode 100644
>>>>> index 000000000000..7602b659c2ec
>>>>> --- /dev/null
>>>>> +++ b/drivers/staging/android/ion/ion_unmapped_heap.c
>>>>> @@ -0,0 +1,123 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>> +/*
>>>>> + * ION Memory Allocator unmapped heap helper
>>>>> + *
>>>>> + * Copyright (C) 2015-2016 Texas Instruments Incorporated -
>>>>> http://www.ti.com/
>>>>> + *    Andrew F. Davis <[email protected]>
>>>>> + *
>>>>> + * ION "unmapped" heaps are physical memory heaps not by default
>>>>> mapped into
>>>>> + * a virtual address space. The buffer owner can explicitly request
>>>>> kernel
>>>>> + * space mappings but the underlying memory may still not be
>>>>> accessible for
>>>>> + * various reasons, such as firewalls.
>>>>> + */
>>>>> +
>>>>> +#include <linux/err.h>
>>>>> +#include <linux/genalloc.h>
>>>>> +#include <linux/scatterlist.h>
>>>>> +#include <linux/slab.h>
>>>>> +
>>>>> +#include "ion.h"
>>>>> +
>>>>> +#define ION_UNMAPPED_ALLOCATE_FAIL -1
>>>>> +
>>>>> +struct ion_unmapped_heap {
>>>>> +    struct ion_heap heap;
>>>>> +    struct gen_pool *pool;
>>>>> +};
>>>>> +
>>>>> +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap,
>>>>> +                     unsigned long size)
>>>>> +{
>>>>> +    struct ion_unmapped_heap *unmapped_heap =
>>>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>>>> +    unsigned long offset;
>>>>> +
>>>>> +    offset = gen_pool_alloc(unmapped_heap->pool, size);
>>>>> +    if (!offset)
>>>>> +        return ION_UNMAPPED_ALLOCATE_FAIL;
>>>>> +
>>>>> +    return offset;
>>>>> +}
>>>>> +
>>>>> +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr,
>>>>> +                  unsigned long size)
>>>>> +{
>>>>> +    struct ion_unmapped_heap *unmapped_heap =
>>>>> +        container_of(heap, struct ion_unmapped_heap, heap);
>>>>> +
>>>>> +    gen_pool_free(unmapped_heap->pool, addr, size);
>>>>> +}
>>>>> +
>>>>> +static int ion_unmapped_heap_allocate(struct ion_heap *heap,
>>>>> +                      struct ion_buffer *buffer,
>>>>> +                      unsigned long size,
>>>>> +                      unsigned long flags)
>>>>> +{
>>>>> +    struct sg_table *table;
>>>>> +    phys_addr_t paddr;
>>>>> +    int ret;
>>>>> +
>>>>> +    table = kmalloc(sizeof(*table), GFP_KERNEL);
>>>>> +    if (!table)
>>>>> +        return -ENOMEM;
>>>>> +    ret = sg_alloc_table(table, 1, GFP_KERNEL);
>>>>> +    if (ret)
>>>>> +        goto err_free;
>>>>> +
>>>>> +    paddr = ion_unmapped_allocate(heap, size);
>>>>> +    if (paddr == ION_UNMAPPED_ALLOCATE_FAIL) {
>>>>> +        ret = -ENOMEM;
>>>>> +        goto err_free_table;
>>>>> +    }
>>>>> +
>>>>> +    sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);
>>>>> +    buffer->sg_table = table;
>>>>> +
>>>>
>>>>
>>>> If this memory is actually unmapped this is not going to work because
>>>> the struct page will not be valid.
>>>>
>>>
>>> If it will never get mapped then it doesn't need a valid struct page as
>>> far as I can tell. We only use it as a marker for where the start of the
>>> backing memory is, and that is calculated from the struct page pointer
>>> address, not its contents.
>>>
>>
>> You can't rely on pfn_to_page returning a valid page pointer if the
>> pfn doesn't point to reserved memory. Even if you aren't relying
>> on the contents, that's no guarantee you can use that to get
>> the valid pfn back. You can get away with that sometimes but it's
>> not correct to rely on it all the time.
>>
>
> I found https://lore.kernel.org/lkml/[email protected]/T/#u
> which I think is closer to what we might want to be looking at.
>

That does seem like something that could be used; I'll have to spend
some time understanding it to see how to bring it into use here.
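As an aside, the allocate/free paths in the patch above round-trip the physical address through a pfn (pfn_to_page(PFN_DOWN(paddr)) on alloc, PFN_PHYS(page_to_pfn(page)) on free). Independent of the struct-page validity question, that round trip only recovers the original address because the gen_pool was created with PAGE_SHIFT granularity. A tiny user-space model of that invariant (the macro names mirror the kernel's; everything else here is illustrative):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x) ((uint64_t)(x) << PAGE_SHIFT)

/* Model of the heap's store/recover cycle: the physical address is
 * reduced to a pfn at allocation time and rebuilt at free time. */
uint64_t store_as_pfn(uint64_t paddr)
{
	return PFN_DOWN(paddr);
}

uint64_t recover_paddr(uint64_t pfn)
{
	/* Any sub-page offset was dropped by PFN_DOWN(); the pool's
	 * PAGE_SHIFT minimum order is what makes this loss-free. */
	return PFN_PHYS(pfn);
}
```

Page-aligned addresses survive the round trip; anything with a sub-page offset would come back truncated, which is why the pool's minimum allocation order matters for the free path.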

>>>>> +    return 0;
>>>>> +
>>>>> +err_free_table:
>>>>> +    sg_free_table(table);
>>>>> +err_free:
>>>>> +    kfree(table);
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static void ion_unmapped_heap_free(struct ion_buffer *buffer)
>>>>> +{
>>>>> +    struct ion_heap *heap = buffer->heap;
>>>>> +    struct sg_table *table = buffer->sg_table;
>>>>> +    struct page *page = sg_page(table->sgl);
>>>>> +    phys_addr_t paddr = PFN_PHYS(page_to_pfn(page));
>>>>> +
>>>>> +    ion_unmapped_free(heap, paddr, buffer->size);
>>>>> +    sg_free_table(buffer->sg_table);
>>>>> +    kfree(buffer->sg_table);
>>>>> +}
>>>>> +
>>>>> +static struct ion_heap_ops unmapped_heap_ops = {
>>>>> +    .allocate = ion_unmapped_heap_allocate,
>>>>> +    .free = ion_unmapped_heap_free,
>>>>> +    /* no .map_user, user mapping of unmapped heaps not allowed */
>>>>> +    .map_kernel = ion_heap_map_kernel,
>>>>> +    .unmap_kernel = ion_heap_unmap_kernel,
>>>>> +};
>>>>> +
>>>>> +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size)
>>>>> +{
>>>>> +    struct ion_unmapped_heap *unmapped_heap;
>>>>> +
>>>>> +    unmapped_heap = kzalloc(sizeof(*unmapped_heap), GFP_KERNEL);
>>>>> +    if (!unmapped_heap)
>>>>> +        return ERR_PTR(-ENOMEM);
>>>>> +
>>>>> +    unmapped_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
>>>>> +    if (!unmapped_heap->pool) {
>>>>> +        kfree(unmapped_heap);
>>>>> +        return ERR_PTR(-ENOMEM);
>>>>> +    }
>>>>> +    gen_pool_add(unmapped_heap->pool, base, size, -1);
>>>>> +    unmapped_heap->heap.ops = &unmapped_heap_ops;
>>>>> +    unmapped_heap->heap.type = ION_HEAP_TYPE_UNMAPPED;
>>>>> +
>>>>> +    return &unmapped_heap->heap;
>>>>> +}
>>>>> diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h
>>>>> index 5d7009884c13..d5f98bc5f340 100644
>>>>> --- a/drivers/staging/android/uapi/ion.h
>>>>> +++ b/drivers/staging/android/uapi/ion.h
>>>>> @@ -19,6 +19,8 @@
>>>>>     *                 carveout heap, allocations are physically
>>>>>     *                 contiguous
>>>>>     * @ION_HEAP_TYPE_DMA:         memory allocated via DMA API
>>>>> + * @ION_HEAP_TYPE_UNMAPPED:     memory not intended to be mapped into the
>>>>> + *                 linux address space unless for debug cases
>>>>>     * @ION_NUM_HEAPS:         helper for iterating over heaps, a bit mask
>>>>>     *                 is used to identify the heaps, so only 32
>>>>>     *                 total heap types are supported
>>>>> @@ -29,6 +31,7 @@ enum ion_heap_type {
>>>>>        ION_HEAP_TYPE_CARVEOUT,
>>>>>        ION_HEAP_TYPE_CHUNK,
>>>>>        ION_HEAP_TYPE_DMA,
>>>>> +    ION_HEAP_TYPE_UNMAPPED,
>>>>>        ION_HEAP_TYPE_CUSTOM, /*
>>>>>                       * must be last so device specific heaps always
>>>>>                       * are at the end of this enum
>>>>>
>>>>
>>>> Overall this seems way too similar to the carveout heap
>>>> to justify adding another heap type. It's also still missing
>>>> the part where exactly you call ion_unmapped_heap_create.
>>>> Figuring that out is one of the blocking items for moving
>>>> Ion out of staging.
>>>>
>>>
>>> I agree with this being almost a 1:1 copy of the carveout heap; I'm just
>>> not sure of a good way to do this without a new heap type. Adding flags
>>> to the existing carveout type seems a bit messy. Plus those flags would
>>> then have to be valid for other heap types too; it gets complicated quickly...
>>>
>>> I'll reply to the second part in your other top level response.
>>>
>>> Andrew
>>>
>>>> Thanks,
>>>> Laura
>>
>

2019-01-16 22:57:53

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/15/19 1:05 PM, Laura Abbott wrote:
> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>
>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>> here.
>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>> anyway.
>>>>>>
>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>> ---
>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>>
>>>>>>         table = a->table;
>>>>>>
>>>>>> -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>> -            direction))
>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>
>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>> maintenance.
>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>> calls
>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>
>>>>
>>>> That should be okay though, if you have no attachments (or all
>>>> attachments are IO-coherent) then there is no need for cache
>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>> is attached later after data has already been written. Does that
>>>> sequence need supporting?
>>>
>>> Yes, but also I think there are cases where CPU access can happen before
>>> in Android, but I will focus on later for now.
>>>
>>>> DMA-BUF doesn't have to allocate the backing
>>>> memory until map_dma_buf() time, and that should only happen after all
>>>> the devices have attached so it can know where to put the buffer. So we
>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>> attached and mapped, right?
>>>>
>>>
>>> Here is an example where CPU access can happen later in Android.
>>>
>>> Camera device records video -> software post processing -> video device
>>> (who does compression of raw data) and writes to a file
>>>
>>> In this example assume the buffer is cached and the devices are not
>>> IO-coherent (quite common).
>>>
>>
>> This is the start of the problem, having cached mappings of memory that
>> is also being accessed non-coherently is going to cause issues one way
>> or another. On top of the speculative cache fills that have to be
>> constantly fought back against with CMOs like below; some coherent
>> interconnects behave badly when you mix coherent and non-coherent access
>> (snoop filters get messed up).
>>
>> The solution is to either always have the addresses marked non-coherent
>> (like device memory, no-map carveouts), or if you really want to use
>> regular system memory allocated at runtime, then all cached mappings of
>> it need to be dropped, even the kernel logical address area (as
>> painful as that would be).
>>
>
> I agree it's broken, hence my desire to remove it :)
>
> The other problem is that uncached buffers are being used for
> performance reason so anything that would involve getting
> rid of the logical address would probably negate any performance
> benefit.
>

I wouldn't go as far as to remove them just yet... Liam seems pretty
adamant that they have valid uses. I'm just not sure performance is one
of them; maybe in the case of software locks between devices, or
something where there needs to be a lot of back-and-forth interleaved
access on small amounts of data?

>>> ION buffer is allocated.
>>>
>>> //Camera device records video
>>> dma_buf_attach
>>> dma_map_attachment (buffer needs to be cleaned)
>>
>> Why does the buffer need to be cleaned here? I just got through reading
>> the thread linked by Laura in the other reply. I do like +Brian's
>> suggestion of tracking if the buffer has had CPU access since the last
>> time and only flushing the cache if it has. As unmapped heaps never get
>> CPU mapped this would never be the case for unmapped heaps, it solves my
>> problem.
>>
>>> [camera device writes to buffer]
>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>
>> It doesn't know there will be any further CPU access, it could get freed
>> after this for all we know, the invalidate can be saved until the CPU
>> requests access again.
>>
>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>> down
>>> the pipeline and Camera doesn't know the end of the use case)
>>>
>>
>> This seems like a broken use-case. I understand the desire to keep
>> everything as modular as possible and separate the steps, but at this
>> point no one owns this buffer's backing memory, not the CPU or any
>> device. I would go as far as to say DMA-BUF should be free now to
>> de-allocate the backing storage if it wants, that way it could get ready
>> for the next attachment, which may change the required backing memory
>> completely.
>>
>> All devices should attach before the first mapping, and only let go
>> after the task is complete, otherwise this buffer's data needs to be
>> copied off to a different location or the CPU needs to take ownership
>> in-between.
>>
>
> Maybe it's broken but it's the status quo and we spent a good
> amount of time at plumbers concluding there isn't a great way
> to fix it :/
>

Hmm, guess that doesn't prove there is not a great way to fix it either... :/

Perhaps just stronger rules on the sequencing of operations? I'm not
saying I have a good solution either; I just don't see any way forward
without some use-case getting broken, so better to fix this now rather
than later.
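To make that concrete, one possible stricter rule set ("all attaches happen before the first map") can be pictured as a small state machine over the buffer's lifetime. This is purely an illustration of a rule DMA-BUF could enforce, not anything it enforces today; all names here are made up:

```c
#include <stdbool.h>

/* Hypothetical buffer lifecycle: attachments may only be added while
 * the buffer is still in its setup phase. */
enum buf_state { BUF_SETUP, BUF_ACTIVE };

struct buf_tracker {
	enum buf_state state;
	int attachments;
};

bool track_attach(struct buf_tracker *t)
{
	if (t->state != BUF_SETUP)
		return false;	/* late attach rejected under this rule */
	t->attachments++;
	return true;
}

/* The first map freezes the attachment list, so backing memory can be
 * placed once, satisfying every attached device's constraints. */
bool track_map(struct buf_tracker *t)
{
	if (t->attachments == 0)
		return false;	/* nothing attached yet, nowhere to place it */
	t->state = BUF_ACTIVE;
	return true;
}
```

Under this rule set the camera pipeline above would fail at the second dma_buf_attach instead of silently losing cache maintenance.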

>>> //buffer is sent down the pipeline
>>>
>>> // Userspace software post processing occurs
>>> mmap buffer
>>
>> Perhaps the invalidate should happen here in mmap.
>>
>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>> devices attached to buffer
>>
>> And that should be okay, mmap does the sync, and if no devices are
>> attached nothing could have changed the underlying memory in the
>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>
>>> [CPU reads/writes to the buffer]
>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>> devices attached to buffer
>>> munmap buffer
>>>
>>> //buffer is sent down the pipeline
>>> // Buffer is sent to video device (who does compression of raw data) and
>>> writes to a file
>>> dma_buf_attach
>>> dma_map_attachment (buffer needs to be cleaned)
>>> [video device writes to buffer]
>>> dma_buf_unmap_attachment
>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>> down
>>> the pipeline and Video doesn't know the end of the use case)
>>>
>>>
>>>
>>>>> Also ION no longer provides DMA ready memory, so if you are not
>>>>> doing CPU
>>>>> access then there is no requirement (that I am aware of) for you to
>>>>> call
>>>>> {begin,end}_cpu_access before passing the buffer to the device and
>>>>> if this
>>>>> buffer is cached and your device is not IO-coherent then the cache
>>>>> maintenance
>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>
>>>>
>>>> If I am not doing any CPU access then why do I need CPU cache
>>>> maintenance on the buffer?
>>>>
>>>
>>> Because ION no longer provides DMA ready memory.
>>> Take the above example.
>>>
>>> ION allocates memory from buddy allocator and requests zeroing.
>>> Zeros are written to the cache.
>>>
>>> You pass the buffer to the camera device which is not IO-coherent.
>>> The camera devices writes directly to the buffer in DDR.
>>> Since you didn't clean the buffer a dirty cache line (one of the
>>> zeros) is
>>> evicted from the cache, this zero overwrites data the camera device has
>>> written which corrupts your data.
>>>
>>
>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>> for CPU access at the time of zeroing.
>>
>> Andrew
>>
>>> Liam
>>>
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>
>

2019-01-16 23:10:24

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/16/19 9:19 AM, Brian Starkey wrote:
> Hi :-)
>
> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>
>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>
>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>>>
>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>> ---
>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>>>
>>>>>>> table = a->table;
>>>>>>>
>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>> - direction))
>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>
>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>
>>>>>
>>>>> That should be okay though, if you have no attachments (or all
>>>>> attachments are IO-coherent) then there is no need for cache
>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>> is attached later after data has already been written. Does that
>>>>> sequence need supporting?
>>>>
>>>> Yes, but also I think there are cases where CPU access can happen before
>>>> in Android, but I will focus on later for now.
>>>>
>>>>> DMA-BUF doesn't have to allocate the backing
>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>> attached and mapped, right?
>>>>>
>>>>
>>>> Here is an example where CPU access can happen later in Android.
>>>>
>>>> Camera device records video -> software post processing -> video device
>>>> (who does compression of raw data) and writes to a file
>>>>
>>>> In this example assume the buffer is cached and the devices are not
>>>> IO-coherent (quite common).
>>>>
>>>
>>> This is the start of the problem, having cached mappings of memory that
>>> is also being accessed non-coherently is going to cause issues one way
>>> or another. On top of the speculative cache fills that have to be
>>> constantly fought back against with CMOs like below; some coherent
>>> interconnects behave badly when you mix coherent and non-coherent access
>>> (snoop filters get messed up).
>>>
>>> The solution is to either always have the addresses marked non-coherent
>>> (like device memory, no-map carveouts), or if you really want to use
>>> regular system memory allocated at runtime, then all cached mappings of
>>> it need to be dropped, even the kernel logical address area (as
>>> painful as that would be).
>
> Ouch :-( I wasn't aware about these potential interconnect issues. How
> "real" is that? It seems that we aren't really hitting that today on
> real devices.
>

Sadly there is at least one real device like this now (TI AM654). We
spent some time working with the ARM interconnect spec designers to see
if this was allowed behavior; the final conclusion was that mixing
coherent and non-coherent accesses is never a good idea. So we have been
working to try to minimize any cases of mixed attributes [0]: if a
region is coherent then everyone in the system needs to treat it as
such, and vice versa; even clever CMOs that work on other systems won't
save you here. :(

[0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553


>>>
>>>> ION buffer is allocated.
>>>>
>>>> //Camera device records video
>>>> dma_buf_attach
>>>> dma_map_attachment (buffer needs to be cleaned)
>>>
>>> Why does the buffer need to be cleaned here? I just got through reading
>>> the thread linked by Laura in the other reply. I do like +Brian's
>>
>> Actually +Brian this time :)
>>
>>> suggestion of tracking if the buffer has had CPU access since the last
>>> time and only flushing the cache if it has. As unmapped heaps never get
>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>> problem.
>>>
>>>> [camera device writes to buffer]
>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>
>>> It doesn't know there will be any further CPU access, it could get freed
>>> after this for all we know, the invalidate can be saved until the CPU
>>> requests access again.
>
> We don't have any API to allow the invalidate to happen on CPU access
> if all devices already detached. We need a struct device pointer to
> give to the DMA API, otherwise on arm64 there'll be no invalidate.
>
> I had a chat with a few people internally after the previous
> discussion with Liam. One suggestion was to use
> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> one other device attached (guarantees that we can do an invalidate in
> the future if begin_cpu_access is called). If the last device
> detaches, do a sync then.
>
> Conversely, in map_dma_buf, we would track if there was any CPU access
> and use/skip the sync appropriately.
>

Now that I think this all through, I agree this patch is probably wrong.
The real fix needs to be better handling in dma_map_sg() to deal with
the case of the memory not being mapped (what I'm dealing with for
unmapped heaps), and the case where the memory in question is not cached
(Liam's issue, I think). In both of these cases dma_map_sg() does the
wrong thing.
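For reference, Brian's earlier suggestion (only sync on map if the CPU has actually touched the buffer since the last map) reduces to a dirty flag per buffer. A rough sketch with invented names, not the actual ION or dma-buf API:

```c
#include <stdbool.h>

/* Illustrative per-buffer bookkeeping for conditional cache maintenance. */
struct tracked_buffer {
	bool cpu_dirty;	/* set on CPU access, cleared once synced for DMA */
};

void buffer_begin_cpu_access(struct tracked_buffer *b)
{
	b->cpu_dirty = true;
}

/* Returns true when a cache clean is actually required for this map.
 * A buffer from an unmapped heap is never CPU mapped, so cpu_dirty
 * stays false and the sync is always skipped. */
bool buffer_map_needs_sync(struct tracked_buffer *b)
{
	bool need = b->cpu_dirty;

	b->cpu_dirty = false;
	return need;
}
```

The first map after any CPU access pays for the clean; back-to-back device maps with no intervening CPU access skip it.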

> I did start poking the code to check out how that would look, but then
> Christmas happened and I'm still catching back up.
>
>>>
>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>
>>>
>>> This seems like a broken use-case. I understand the desire to keep
>>> everything as modular as possible and separate the steps, but at this
>>> point no one owns this buffer's backing memory, not the CPU or any
>>> device. I would go as far as to say DMA-BUF should be free now to
>>> de-allocate the backing storage if it wants, that way it could get ready
>>> for the next attachment, which may change the required backing memory
>>> completely.
>>>
>>> All devices should attach before the first mapping, and only let go
>>> after the task is complete, otherwise this buffer's data needs to be
>>> copied off to a different location or the CPU needs to take ownership
>>> in-between.
>>>
>
> Yeah.. that's certainly the theory. Are there any DMA-BUF
> implementations which actually do that? I hear it quoted a lot,
> because that's what the docs say - but if the reality doesn't match
> it, maybe we should change the docs.
>

Do you mean on the userspace side? I'm not sure; it seems like Android
might be doing this wrong from what I can gather. On the kernel side, if
you mean the "de-allocate the backing storage" part, we will have some
cases like this soon, so I want to make sure userspace is not abusing
DMA-BUF in ways not specified in the documentation. Changing the docs to
force the backing memory to always be allocated would break the central
goal of keeping attach and map separate in DMA-BUF.

>>>> //buffer is sent down the pipeline
>>>>
>>>> // Userspace software post processing occurs
>>>> mmap buffer
>>>
>>> Perhaps the invalidate should happen here in mmap.
>>>
>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>>> devices attached to buffer
>>>
>>> And that should be okay, mmap does the sync, and if no devices are
>>> attached nothing could have changed the underlying memory in the
>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>
> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> Liam was saying that it's too painful for them to do that every time a
> device unmaps - when in many cases (device->device, no CPU) it's not
> needed.

Invalidates are painless, at least compared to a real cache flush: you
just set the invalid bit rather than actually writing out lines. I
thought the issue was on the map side.

>
>>>
>>>> [CPU reads/writes to the buffer]
>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>> devices attached to buffer
>>>> munmap buffer
>>>>
>>>> //buffer is sent down the pipeline
>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>> writes to a file
>>>> dma_buf_attach
>>>> dma_map_attachment (buffer needs to be cleaned)
>>>> [video device writes to buffer]
>>>> dma_buf_unmap_attachment
>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>> the pipeline and Video doesn't know the end of the use case)
>>>>
>>>>
>>>>
>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>>>> access then there is no requirement (that I am aware of) for you to call
>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>
>>>>>
>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>> maintenance on the buffer?
>>>>>
>>>>
>>>> Because ION no longer provides DMA ready memory.
>>>> Take the above example.
>>>>
>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>> Zeros are written to the cache.
>>>>
>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>> The camera devices writes directly to the buffer in DDR.
>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>>>> evicted from the cache, this zero overwrites data the camera device has
>>>> written which corrupts your data.
>>>>
>>>
>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>> for CPU access at the time of zeroing.
>>>
>
> Actually that should be at the point of the first non-coherent device
> mapping the buffer right? No point in doing CMO if the future accesses
> are coherent.

I see your point; as long as the zeroing is guaranteed to be the first
access to this buffer, it should be safe.

Andrew

>
> Cheers,
> -Brian
>
>>> Andrew
>>>
>>>> Liam
>>>>
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>> a Linux Foundation Collaborative Project
>>>>

2019-01-16 23:12:03

by Andrew Davis

Subject: Re: [PATCH 11/14] staging: android: ion: Allow heap name to be null

On 1/16/19 9:28 AM, Brian Starkey wrote:
> Hi Andrew,
>
> On Fri, Jan 11, 2019 at 12:05:20PM -0600, Andrew F. Davis wrote:
>> The heap name can be used for debugging but otherwise does not seem
>> to be required and no other part of the code will fail if left NULL
>> except here. We can make it required and check for it at some point,
>> for now lets just prevent this from causing a NULL pointer exception.
>
> I'm not so keen on this one. In the "new" API with heap querying, the
> name string is the only way to identify the heap. I think Laura
> mentioned at XDC2017 that it was expected that userspace should use
> the strings to find the heap they want.
>

Right now the names are only for debug. I accidentally left the name
NULL once and got a kernel crash. This is the only spot where it is
needed, so I fixed it up here. The other option is to make the name
mandatory and properly error out, but I don't want to do that until the
discussion below settles whether names really matter or not.
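The guard in the patch itself is small: substitute a placeholder when the name is NULL and always NUL-terminate the fixed-size destination. Reimplemented in user space for illustration (MAX_HEAP_NAME matches the uapi constant; the function name is made up):

```c
#include <string.h>

#define MAX_HEAP_NAME 32

/* Copy a heap name into a fixed-size buffer, tolerating a NULL source
 * and guaranteeing NUL termination, mirroring the patch's intent. */
void copy_heap_name(char *dst, const char *src)
{
	strncpy(dst, src ? src : "(null)", MAX_HEAP_NAME);
	dst[MAX_HEAP_NAME - 1] = '\0';
}
```

The explicit terminator matters because strncpy() does not NUL-terminate when the source fills the whole buffer.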

> I'd actually be in favor of making the string a more strict UAPI than
> allowing it to be empty (at least, if heap name strings is the API we
> decide on for identifying heaps - which is another discussion).
>

I would think identifying heaps by name would be less portable, but as
you said that is a whole different discussion...

Thanks,
Andrew

> Cheers,
> -Brian
>
>>
>> Signed-off-by: Andrew F. Davis <[email protected]>
>> ---
>> drivers/staging/android/ion/ion.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>> index bba5f682bc25..14e48f6eb734 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -467,7 +467,7 @@ static int ion_query_heaps(struct ion_heap_query *query)
>> max_cnt = query->cnt;
>>
>> plist_for_each_entry(heap, &dev->heaps, node) {
>> - strncpy(hdata.name, heap->name, MAX_HEAP_NAME);
>> + strncpy(hdata.name, heap->name ?: "(null)", MAX_HEAP_NAME);
>> hdata.name[sizeof(hdata.name) - 1] = '\0';
>> hdata.type = heap->type;
>> hdata.heap_id = heap->id;
>> --
>> 2.19.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel

2019-01-17 00:47:15

by Liam Mark

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Wed, 16 Jan 2019, Andrew F. Davis wrote:

> On 1/15/19 1:05 PM, Laura Abbott wrote:
> > On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>
> >>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
> >>>>>> here.
> >>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
> >>>>>> anyway.
> >>>>>>
> >>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>> ---
> >>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/staging/android/ion/ion.c
> >>>>>> b/drivers/staging/android/ion/ion.c
> >>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>
> >>>>>>         table = a->table;
> >>>>>>
> >>>>>> -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>> -            direction))
> >>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>
> >>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>>>> maintenance.
> >>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>> dma_buf_attach then there won't have been a device attached so the
> >>>>> calls
> >>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>
> >>>>
> >>>> That should be okay though, if you have no attachments (or all
> >>>> attachments are IO-coherent) then there is no need for cache
> >>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>> is attached later after data has already been written. Does that
> >>>> sequence need supporting?
> >>>
> >>> Yes, but also I think there are cases where CPU access can happen before
> >>> in Android, but I will focus on later for now.
> >>>
> >>>> DMA-BUF doesn't have to allocate the backing
> >>>> memory until map_dma_buf() time, and that should only happen after all
> >>>> the devices have attached so it can know where to put the buffer. So we
> >>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>> attached and mapped, right?
> >>>>
> >>>
> >>> Here is an example where CPU access can happen later in Android.
> >>>
> >>> Camera device records video -> software post processing -> video device
> >>> (who does compression of raw data) and writes to a file
> >>>
> >>> In this example assume the buffer is cached and the devices are not
> >>> IO-coherent (quite common).
> >>>
> >>
> >> This is the start of the problem, having cached mappings of memory that
> >> is also being accessed non-coherently is going to cause issues one way
> >> or another. On top of the speculative cache fills that have to be
> >> constantly fought back against with CMOs like below; some coherent
> >> interconnects behave badly when you mix coherent and non-coherent access
> >> (snoop filters get messed up).
> >>
> >> The solution is to either always have the addresses marked non-coherent
> >> (like device memory, no-map carveouts), or if you really want to use
> >> regular system memory allocated at runtime, then all cached mappings of
> >> it need to be dropped, even the kernel logical address (as painful
> >> as that would be).
> >>
> >
> > I agree it's broken, hence my desire to remove it :)
> >
> > The other problem is that uncached buffers are being used for
> > performance reasons, so anything that would involve getting
> > rid of the logical address would probably negate any performance
> > benefit.
> >
>
> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> adamant that they have valid uses. I'm just not sure performance is one
> of them, maybe in the case of software locks between devices or
> something where there needs to be a lot of back and forth interleaved
> access on small amounts of data?
>

I wasn't aware that ARM considered this unsupported; I thought it was
supported, but that they advised against it because of the potential
performance impact.

This is after all supported in the DMA APIs, and up until now devices have
been successfully commercialized with these configurations; I think
they will continue to ship with these configurations for quite a
while.

It would be really unfortunate if support was removed as I think that
would drive clients away from using upstream ION.
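
As a concrete reference for the bracketing being debated in this thread,
here is a small userspace C sketch that models the streaming-DMA ownership
rules (clean before handing a cached buffer to a non-coherent device,
invalidate before the CPU reads the result back). The one-line "cache"
model and all names here are invented for illustration; this is not the
kernel DMA API.

```c
#include <assert.h>
#include <string.h>

/* Toy model: "ddr" is main memory as a non-coherent device sees it,
 * "cache" is the CPU's view. All names are invented for illustration. */
struct toy_buf {
	unsigned char ddr[64];   /* what the device reads/writes  */
	unsigned char cache[64]; /* what the CPU reads/writes     */
	int dirty;               /* CPU wrote lines not yet in DDR */
};

static void cpu_write(struct toy_buf *b, int off, unsigned char v)
{
	b->cache[off] = v;
	b->dirty = 1;
}

/* "Clean" CMO: write dirty lines back to DDR, as done when mapping
 * a cached buffer for a non-coherent device. */
static void cmo_clean(struct toy_buf *b)
{
	if (b->dirty) {
		memcpy(b->ddr, b->cache, sizeof(b->ddr));
		b->dirty = 0;
	}
}

/* "Invalidate" CMO: drop stale cached copies before CPU access. */
static void cmo_invalidate(struct toy_buf *b)
{
	memcpy(b->cache, b->ddr, sizeof(b->cache));
}

/* A non-coherent device writes straight to DDR, bypassing the cache. */
static void device_write(struct toy_buf *b, int off, unsigned char v)
{
	b->ddr[off] = v;
}
```

Skipping the `cmo_clean()` step in this model reproduces the corruption
Liam describes later: a dirty zeroed line in `cache` would overwrite the
device's data in `ddr` when evicted.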

> >>> ION buffer is allocated.
> >>>
> >>> //Camera device records video
> >>> dma_buf_attach
> >>> dma_map_attachment (buffer needs to be cleaned)
> >>
> >> Why does the buffer need to be cleaned here? I just got through reading
> >> the thread linked by Laura in the other reply. I do like +Brian's
> >> suggestion of tracking if the buffer has had CPU access since the last
> >> time and only flushing the cache if it has. As unmapped heaps never get
> >> CPU mapped this would never be the case for unmapped heaps, it solves my
> >> problem.
> >>
> >>> [camera device writes to buffer]
> >>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>
> >> It doesn't know there will be any further CPU access, it could get freed
> >> after this for all we know, the invalidate can be saved until the CPU
> >> requests access again.
> >>
> >>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>> down
> >>> the pipeline and Camera doesn't know the end of the use case)
> >>>
> >>
> >> This seems like a broken use-case, I understand the desire to keep
> >> everything as modular as possible and separate the steps, but at this
> >> point no one owns this buffer's backing memory, not the CPU or any
> >> device. I would go as far as to say DMA-BUF should be free now to
> >> de-allocate the backing storage if it wants, that way it could get ready
> >> for the next attachment, which may change the required backing memory
> >> completely.
> >>
> >> All devices should attach before the first mapping, and only let go
> >> after the task is complete, otherwise this buffer's data needs to be copied off
> >> to a different location or the CPU needs to take ownership in-between.
> >>
> >
> > Maybe it's broken but it's the status quo and we spent a good
> > amount of time at plumbers concluding there isn't a great way
> > to fix it :/
> >
>
> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
>
> Perhaps just stronger rules on sequencing of operations? I'm not saying
> I have a good solution either, I just don't see any way forward without
> some use-case getting broken, so better to fix now over later.
>

I can see the benefits of Android doing things the way they do; I would
request that any changes we make continue to support Android, or that we
find a way to convince them to change, as they are the main ION client, and
I assume other ION clients in the future will want to do this as well.

I am concerned that if you go with a solution which enforces what you
mention above, and bring ION out of staging that way, it will make it that
much harder to solve this for Android and therefore harder to get
Android clients to move to the upstream ION (and get everybody off their
vendor modified Android versions).

> >>> //buffer is sent down the pipeline
> >>>
> >>> // Userspace software post processing occurs
> >>> mmap buffer
> >>
> >> Perhaps the invalidate should happen here in mmap.
> >>
> >>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>> devices attached to buffer
> >>
> >> And that should be okay, mmap does the sync, and if no devices are
> >> attached nothing could have changed the underlying memory in the
> >> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>
> >>> [CPU reads/writes to the buffer]
> >>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>> devices attached to buffer
> >>> munmap buffer
> >>>
> >>> //buffer is sent down the pipeline
> >>> // Buffer is sent to video device (who does compression of raw data) and
> >>> writes to a file
> >>> dma_buf_attach
> >>> dma_map_attachment (buffer needs to be cleaned)
> >>> [video device writes to buffer]
> >>> dma_buf_unmap_attachment
> >>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>> down
> >>> the pipeline and Video doesn't know the end of the use case)
> >>>
> >>>
> >>>
> >>>>> Also ION no longer provides DMA ready memory, so if you are not
> >>>>> doing CPU
> >>>>> access then there is no requirement (that I am aware of) for you to
> >>>>> call
> >>>>> {begin,end}_cpu_access before passing the buffer to the device and
> >>>>> if this
> >>>>> buffer is cached and your device is not IO-coherent then the cache
> >>>>> maintenance
> >>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>
> >>>>
> >>>> If I am not doing any CPU access then why do I need CPU cache
> >>>> maintenance on the buffer?
> >>>>
> >>>
> >>> Because ION no longer provides DMA ready memory.
> >>> Take the above example.
> >>>
> >>> ION allocates memory from buddy allocator and requests zeroing.
> >>> Zeros are written to the cache.
> >>>
> >>> You pass the buffer to the camera device which is not IO-coherent.
> >>> The camera device writes directly to the buffer in DDR.
> >>> Since you didn't clean the buffer a dirty cache line (one of the
> >>> zeros) is
> >>> evicted from the cache, this zero overwrites data the camera device has
> >>> written which corrupts your data.
> >>>
> >>
> >> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >> for CPU access at the time of zeroing.
> >>
> >> Andrew
> >>
> >>> Liam
> >>>
> >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>> a Linux Foundation Collaborative Project
> >>>
> >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-17 00:51:05

by Liam Mark

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Wed, 16 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 9:19 AM, Brian Starkey wrote:
> > Hi :-)
> >
> > On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>
> >>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>
> >>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>>>
> >>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>> ---
> >>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>>
> >>>>>>> table = a->table;
> >>>>>>>
> >>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>> - direction))
> >>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>
> >>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>
> >>>>>
> >>>>> That should be okay though, if you have no attachments (or all
> >>>>> attachments are IO-coherent) then there is no need for cache
> >>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>> is attached later after data has already been written. Does that
> >>>>> sequence need supporting?
> >>>>
> >>>> Yes, but I also think there are cases where CPU access can happen earlier
> >>>> in Android; I will focus on the later case for now.
> >>>>
> >>>>> DMA-BUF doesn't have to allocate the backing
> >>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>> attached and mapped, right?
> >>>>>
> >>>>
> >>>> Here is an example where CPU access can happen later in Android.
> >>>>
> >>>> Camera device records video -> software post processing -> video device
> >>>> (who does compression of raw data) and writes to a file
> >>>>
> >>>> In this example assume the buffer is cached and the devices are not
> >>>> IO-coherent (quite common).
> >>>>
> >>>
> >>> This is the start of the problem, having cached mappings of memory that
> >>> is also being accessed non-coherently is going to cause issues one way
> >>> or another. On top of the speculative cache fills that have to be
> >>> constantly fought back against with CMOs like below; some coherent
> >>> interconnects behave badly when you mix coherent and non-coherent access
> >>> (snoop filters get messed up).
> >>>
> >>> The solution is to either always have the addresses marked non-coherent
> >>> (like device memory, no-map carveouts), or if you really want to use
> >>> regular system memory allocated at runtime, then all cached mappings of
> >>> it need to be dropped, even the kernel logical address (as painful
> >>> as that would be).
> >
> > Ouch :-( I wasn't aware about these potential interconnect issues. How
> > "real" is that? It seems that we aren't really hitting that today on
> > real devices.
> >
>
> Sadly there is at least one real device like this now (TI AM654). We
> spent some time working with the ARM interconnect spec designers to see
> if this was allowed behavior; the final conclusion was that mixing coherent
> and non-coherent accesses is never a good idea. So we have been working to
> try to minimize any cases of mixed attributes [0]: if a region is
> coherent then everyone in the system needs to treat it as such, and
> vice-versa; even clever CMOs that work on other systems won't save you
> here. :(
>
> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
>
>
> >>>
> >>>> ION buffer is allocated.
> >>>>
> >>>> //Camera device records video
> >>>> dma_buf_attach
> >>>> dma_map_attachment (buffer needs to be cleaned)
> >>>
> >>> Why does the buffer need to be cleaned here? I just got through reading
> >>> the thread linked by Laura in the other reply. I do like +Brian's
> >>
> >> Actually +Brian this time :)
> >>
> >>> suggestion of tracking if the buffer has had CPU access since the last
> >>> time and only flushing the cache if it has. As unmapped heaps never get
> >>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>> problem.
> >>>
> >>>> [camera device writes to buffer]
> >>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>
> >>> It doesn't know there will be any further CPU access, it could get freed
> >>> after this for all we know, the invalidate can be saved until the CPU
> >>> requests access again.
> >
> > We don't have any API to allow the invalidate to happen on CPU access
> > if all devices already detached. We need a struct device pointer to
> > give to the DMA API, otherwise on arm64 there'll be no invalidate.
> >
> > I had a chat with a few people internally after the previous
> > discussion with Liam. One suggestion was to use
> > DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> > one other device attached (guarantees that we can do an invalidate in
> > the future if begin_cpu_access is called). If the last device
> > detaches, do a sync then.
> >
> > Conversely, in map_dma_buf, we would track if there was any CPU access
> > and use/skip the sync appropriately.
> >
>
> Now that I think this all through I agree this patch is probably wrong.
> The real fix needs to be better handling in the dma_map_sg() to deal
> with the case of the memory not being mapped (what I'm dealing with for
> unmapped heaps), and for cases when the memory in question is not cached
> (Liam's issue I think). For both these cases the dma_map_sg() does the
> wrong thing.
>
> > I did start poking the code to check out how that would look, but then
> > Christmas happened and I'm still catching back up.
> >
> >>>
> >>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>
> >>>
> >>> This seems like a broken use-case, I understand the desire to keep
> >>> everything as modular as possible and separate the steps, but at this
> >>> point no one owns this buffer's backing memory, not the CPU or any
> >>> device. I would go as far as to say DMA-BUF should be free now to
> >>> de-allocate the backing storage if it wants, that way it could get ready
> >>> for the next attachment, which may change the required backing memory
> >>> completely.
> >>>
> >>> All devices should attach before the first mapping, and only let go
> >>> after the task is complete, otherwise this buffer's data needs to be copied off
> >>> to a different location or the CPU needs to take ownership in-between.
> >>>
> >
> > Yeah.. that's certainly the theory. Are there any DMA-BUF
> > implementations which actually do that? I hear it quoted a lot,
> > because that's what the docs say - but if the reality doesn't match
> > it, maybe we should change the docs.
> >
>
> Do you mean on the userspace side? I'm not sure, seems like Android
> might be doing this wrong from what I can gather. From kernel side if
> you mean the "de-allocate the backing storage", we will have some cases
> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> in ways not specified in the documentation. Changing the docs to force
> the backing memory to always be allocated would break the central goal
> of keeping attach and map separate in DMA-BUF.
>
> >>>> //buffer is sent down the pipeline
> >>>>
> >>>> // Userspace software post processing occurs
> >>>> mmap buffer
> >>>
> >>> Perhaps the invalidate should happen here in mmap.
> >>>
> >>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>>> devices attached to buffer
> >>>
> >>> And that should be okay, mmap does the sync, and if no devices are
> >>> attached nothing could have changed the underlying memory in the
> >>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >
> > Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> > Liam was saying that it's too painful for them to do that every time a
> > device unmaps - when in many cases (device->device, no CPU) it's not
> > needed.
>
> Invalidates are painless, at least compared to a real cache flush, just
> set the invalid bit vs actually writing out lines. I thought the issue
> was on the map side.
>

Invalidates aren't painless for us because we have a coherent system cache
so clean lines get written out.
And these invalidates can occur on fairly large buffers.

That is why we haven't gone with using cached ION memory and "tracking CPU
access": it only solves half the problem, i.e. there isn't a way to
safely skip the invalidate (because we can't read the future).
Our solution was to go with uncached ION memory (when possible), but as
you can see in other discussions upstream support for uncached memory has
its own issues.
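
For reference, Brian's tracking idea discussed above can be sketched as a
small piece of per-buffer bookkeeping: remember whether the CPU has touched
the buffer since the last device map, and only clean when it has. As noted,
this does not help on the invalidate side, since the future cannot be
predicted. All names below are hypothetical, not real kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-buffer state for the "track CPU access" idea. */
struct tracked_buf {
	bool cpu_touched;  /* CPU access since the last device map? */
};

/* Called from the begin_cpu_access / mmap paths in this sketch. */
static void buf_begin_cpu_access(struct tracked_buf *b)
{
	b->cpu_touched = true;
}

/* Returns true when map_dma_buf must clean the CPU cache; a
 * device-to-device handoff with no CPU access in between skips it. */
static bool map_needs_clean(struct tracked_buf *b)
{
	bool need = b->cpu_touched;

	b->cpu_touched = false;  /* a device now owns the buffer */
	return need;
}
```

This shows why the optimization is one-sided: the clean on map can be
safely skipped when no CPU access occurred, but there is no symmetric
bookkeeping that lets the unmap path skip the invalidate.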

> >
> >>>
> >>>> [CPU reads/writes to the buffer]
> >>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>> devices attached to buffer
> >>>> munmap buffer
> >>>>
> >>>> //buffer is sent down the pipeline
> >>>> // Buffer is sent to video device (who does compression of raw data) and
> >>>> writes to a file
> >>>> dma_buf_attach
> >>>> dma_map_attachment (buffer needs to be cleaned)
> >>>> [video device writes to buffer]
> >>>> dma_buf_unmap_attachment
> >>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>> the pipeline and Video doesn't know the end of the use case)
> >>>>
> >>>>
> >>>>
> >>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>>>> access then there is no requirement (that I am aware of) for you to call
> >>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>
> >>>>>
> >>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>> maintenance on the buffer?
> >>>>>
> >>>>
> >>>> Because ION no longer provides DMA ready memory.
> >>>> Take the above example.
> >>>>
> >>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>> Zeros are written to the cache.
> >>>>
> >>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>> The camera device writes directly to the buffer in DDR.
> >>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >>>> evicted from the cache, this zero overwrites data the camera device has
> >>>> written which corrupts your data.
> >>>>
> >>>
> >>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>> for CPU access at the time of zeroing.
> >>>
> >
> > Actually that should be at the point of the first non-coherent device
> > mapping the buffer right? No point in doing CMO if the future accesses
> > are coherent.
>
> I see your point, as long as the zeroing is guaranteed to be the first
> access to this buffer then it should be safe.
>
> Andrew
>
> >
> > Cheers,
> > -Brian
> >
> >>> Andrew
> >>>
> >>>> Liam
> >>>>
> >>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>> a Linux Foundation Collaborative Project
> >>>>
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-17 09:47:27

by Christoph Hellwig

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Wed, Jan 16, 2019 at 10:17:23AM -0600, Andrew F. Davis wrote:
> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> adamant that they have valid uses. I'm just not sure performance is one
> of them, maybe in the case of software locks between devices or
> something where there needs to be a lot of back and forth interleaved
> access on small amounts of data?

We shouldn't add unused features to start with. If mainline users for
them appear we can always add them back once we've reviewed the use
cases and agreed that they are sane and have no better solution.

2019-01-17 16:17:08

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/16/19 4:48 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>
>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>
>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>> here.
>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>> anyway.
>>>>>>>>
>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>> ---
>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>         table = a->table;
>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>> -            direction))
>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>
>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>> maintenance.
>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>> calls
>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>
>>>>>>
>>>>>> That should be okay though, if you have no attachments (or all
>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>> is attached later after data has already been written. Does that
>>>>>> sequence need supporting?
>>>>>
>>>>> Yes, but I also think there are cases where CPU access can happen earlier
>>>>> in Android; I will focus on the later case for now.
>>>>>
>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>> attached and mapped, right?
>>>>>>
>>>>>
>>>>> Here is an example where CPU access can happen later in Android.
>>>>>
>>>>> Camera device records video -> software post processing -> video device
>>>>> (who does compression of raw data) and writes to a file
>>>>>
>>>>> In this example assume the buffer is cached and the devices are not
>>>>> IO-coherent (quite common).
>>>>>
>>>>
>>>> This is the start of the problem, having cached mappings of memory that
>>>> is also being accessed non-coherently is going to cause issues one way
>>>> or another. On top of the speculative cache fills that have to be
>>>> constantly fought back against with CMOs like below; some coherent
>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>> (snoop filters get messed up).
>>>>
>>>> The solution is to either always have the addresses marked non-coherent
>>>> (like device memory, no-map carveouts), or if you really want to use
>>>> regular system memory allocated at runtime, then all cached mappings of
>>>> it need to be dropped, even the kernel logical address (as painful
>>>> as that would be).
>>>>
>>>
>>> I agree it's broken, hence my desire to remove it :)
>>>
>>> The other problem is that uncached buffers are being used for
>>> performance reasons, so anything that would involve getting
>>> rid of the logical address would probably negate any performance
>>> benefit.
>>>
>>
>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>> adamant that they have valid uses. I'm just not sure performance is one
>> of them, maybe in the case of software locks between devices or
>> something where there needs to be a lot of back and forth interleaved
>> access on small amounts of data?
>>
>
> I wasn't aware that ARM considered this unsupported; I thought it was
> supported, but that they advised against it because of the potential
> performance impact.
>

Not sure what you mean by "this" being not supported; do you mean mixed
attribute mappings? If so, it will certainly cause problems, and the
problems will change from platform to platform. "Avoid at all costs" is my
understanding of ARM's position.

> This is after all supported in the DMA APIs and up until now devices have
> been successfully commercializing with these configurations, and I think
> they will continue to commercialize with these configurations for quite a
> while.
>

Use of uncached memory mappings is almost always wrong in my experience
and are used to work around some bug or because the user doesn't want to
implement proper CMOs. Counter examples welcome.

> It would be really unfortunate if support was removed as I think that
> would drive clients away from using upstream ION.
>

I'm not petitioning to remove support, but at the very least let's reverse
the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
default; to get uncached memory you should need to add a flag to your
allocation command, pointing out you know what you are doing.
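
The proposed default inversion could look something like the sketch below.
ION_FLAG_CACHED is today's uapi flag; ION_FLAG_UNCACHED and the helper are
hypothetical, shown only to illustrate the "cached unless the caller opts
out" behavior:

```c
#include <assert.h>

/* Today's uapi flag (drivers/staging/android/uapi/ion.h). */
#define ION_FLAG_CACHED    1
/* Hypothetical opt-out flag for the proposal above; does not exist today. */
#define ION_FLAG_UNCACHED  2

/* Sketch: with the proposal, a buffer is cached unless the caller
 * explicitly asks for uncached memory at allocation time. */
static int ion_buffer_cached(unsigned int alloc_flags)
{
	return !(alloc_flags & ION_FLAG_UNCACHED);
}
```

Compared to today's behavior, where a caller must pass ION_FLAG_CACHED to
get cached memory, this makes the safer cached path the default and makes
uncached allocation a deliberate choice.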

>>>>> ION buffer is allocated.
>>>>>
>>>>> //Camera device records video
>>>>> dma_buf_attach
>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>
>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>> problem.
>>>>
>>>>> [camera device writes to buffer]
>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>
>>>> It doesn't know there will be any further CPU access, it could get freed
>>>> after this for all we know, the invalidate can be saved until the CPU
>>>> requests access again.
>>>>
>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>> down
>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>
>>>>
>>>> This seems like a broken use-case, I understand the desire to keep
>>>> everything as modular as possible and separate the steps, but at this
>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>> for the next attachment, which may change the required backing memory
>>>> completely.
>>>>
>>>> All devices should attach before the first mapping, and only let go
>>>> after the task is complete, otherwise this buffer's data needs to be copied off
>>>> to a different location or the CPU needs to take ownership in-between.
>>>>
>>>
>>> Maybe it's broken but it's the status quo and we spent a good
>>> amount of time at plumbers concluding there isn't a great way
>>> to fix it :/
>>>
>>
>> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
>>
>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>> I have a good solution either, I just don't see any way forward without
>> some use-case getting broken, so better to fix now over later.
>>
>
> I can see the benefits of Android doing things the way they do, I would
> request that changes we make continue to support Android, or we find a way
> to convince them to change, as they are the main ION client and I assume
> other ION clients in the future will want to do this as well.
>

Android may be the biggest user today (makes sense, Ion came out of the
Android project), but that can change, and getting changes into Android
will be easier than getting them into the upstream kernel once Ion is out
of staging.

Unlike some other big ARM vendors, we (TI) do not primarily build mobile
chips targeting Android; our core offerings target more traditional
Linux userspaces, and I'm guessing others will start to do the same as
ARM tries to push more into desktop, server, and other spaces again.

> I am concerned that if you go with a solution which enforces what you
> mention above, and bring ION out of staging that way, it will make it that
> much harder to solve this for Android and therefore harder to get
> Android clients to move to the upstream ION (and get everybody off their
> vendor modified Android versions).
>

That would be an Android problem; reducing functionality in upstream to
match what some evil vendor trees do to support Android is not the way
forward on this. At least for us we are going to try to make all our
software offerings follow proper buffer ownership (including our Android
offering).

>>>>> //buffer is sent down the pipeline
>>>>>
>>>>> // Userspace software post processing occurs
>>>>> mmap buffer
>>>>
>>>> Perhaps the invalidate should happen here in mmap.
>>>>
>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>>>> devices attached to buffer
>>>>
>>>> And that should be okay, mmap does the sync, and if no devices are
>>>> attached nothing could have changed the underlying memory in the
>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>>>
>>>>> [CPU reads/writes to the buffer]
>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>> devices attached to buffer
>>>>> munmap buffer
>>>>>
>>>>> //buffer is sent down the pipeline
>>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>>> writes to a file
>>>>> dma_buf_attach
>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>> [video device writes to buffer]
>>>>> dma_buf_unmap_attachment
>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>> down
>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>
>>>>>
>>>>>
>>>>>>> Also ION no longer provides DMA ready memory, so if you are not
>>>>>>> doing CPU
>>>>>>> access then there is no requirement (that I am aware of) for you to
>>>>>>> call
>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
>>>>>>> if this
>>>>>>> buffer is cached and your device is not IO-coherent then the cache
>>>>>>> maintenance
>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>
>>>>>>
>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>> maintenance on the buffer?
>>>>>>
>>>>>
>>>>> Because ION no longer provides DMA ready memory.
>>>>> Take the above example.
>>>>>
>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>> Zeros are written to the cache.
>>>>>
>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>> The camera device writes directly to the buffer in DDR.
>>>>> Since you didn't clean the buffer a dirty cache line (one of the
>>>>> zeros) is
>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>> written which corrupts your data.
>>>>>
>>>>
>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>> for CPU access at the time of zeroing.
>>>>
>>>> Andrew
>>>>
>>>>> Liam
>>>>>
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>> a Linux Foundation Collaborative Project
>>>>>
>>>
>>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-17 16:27:05

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/16/19 4:54 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/16/19 9:19 AM, Brian Starkey wrote:
>>> Hi :-)
>>>
>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>
>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>
>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>> ---
>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>>>>>
>>>>>>>>> table = a->table;
>>>>>>>>>
>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>> - direction))
>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>
>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>
>>>>>>>
>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>> is attached later after data has already been written. Does that
>>>>>>> sequence need supporting?
>>>>>>
>>>>>> Yes, but also I think there are cases where CPU access can happen before
>>>>>> in Android, but I will focus on later for now.
>>>>>>
>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>> attached and mapped, right?
>>>>>>>
>>>>>>
>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>
>>>>>> Camera device records video -> software post processing -> video device
>>>>>> (who does compression of raw data) and writes to a file
>>>>>>
>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>> IO-coherent (quite common).
>>>>>>
>>>>>
>>>>> This is the start of the problem, having cached mappings of memory that
>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>> or another. On top of the speculative cache fills that have to be
>>>>> constantly fought back against with CMOs like below; some coherent
>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>> (snoop filters get messed up).
>>>>>
>>>>> The solution is to either always have the addresses marked non-coherent
>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>> it need to be dropped, even the kernel logical address (area as painful
>>>>> as that would be).
>>>
>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
>>> "real" is that? It seems that we aren't really hitting that today on
>>> real devices.
>>>
>>
>> Sadly there is at least one real device like this now (TI AM654). We
>> spent some time working with the ARM interconnect spec designers to see
>> if this was allowed behavior, final conclusion was mixing coherent and
>> non-coherent accesses is never a good idea.. So we have been working to
>> try to minimize any cases of mixed attributes [0], if a region is
>> coherent then everyone in the system needs to treat it as such and
>> vice-versa, even clever CMOs that work on other systems won't save you
>> here. :(
>>
>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
>>
>>
>>>>>
>>>>>> ION buffer is allocated.
>>>>>>
>>>>>> //Camera device records video
>>>>>> dma_buf_attach
>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>
>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>
>>>> Actually +Brian this time :)
>>>>
>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>> problem.
>>>>>
>>>>>> [camera device writes to buffer]
>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>
>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>> requests access again.
>>>
>>> We don't have any API to allow the invalidate to happen on CPU access
>>> if all devices already detached. We need a struct device pointer to
>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
>>>
>>> I had a chat with a few people internally after the previous
>>> discussion with Liam. One suggestion was to use
>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
>>> one other device attached (guarantees that we can do an invalidate in
>>> the future if begin_cpu_access is called). If the last device
>>> detaches, do a sync then.
>>>
>>> Conversely, in map_dma_buf, we would track if there was any CPU access
>>> and use/skip the sync appropriately.
>>>
>>
>> Now that I think this all through I agree this patch is probably wrong.
>> The real fix needs to be better handling in the dma_map_sg() to deal
>> with the case of the memory not being mapped (what I'm dealing with for
>> unmapped heaps), and for cases when the memory in question is not cached
>> (Liam's issue I think). For both these cases the dma_map_sg() does the
>> wrong thing.
>>
>>> I did start poking the code to check out how that would look, but then
>>> Christmas happened and I'm still catching back up.
>>>
>>>>>
>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>
>>>>>
>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>> everything as modular as possible and separate the steps, but at this
>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>> for the next attachment, which may change the required backing memory
>>>>> completely.
>>>>>
>>>>> All devices should attach before the first mapping, and only let go
>>>>> after the task is complete, otherwise this buffer's data needs to be copied
>>>>> off to a different location or the CPU needs to take ownership in-between.
>>>>>
>>>
>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
>>> implementations which actually do that? I hear it quoted a lot,
>>> because that's what the docs say - but if the reality doesn't match
>>> it, maybe we should change the docs.
>>>
>>
>> Do you mean on the userspace side? I'm not sure, seems like Android
>> might be doing this wrong from what I can gather. From kernel side if
>> you mean the "de-allocate the backing storage", we will have some cases
>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
>> in ways not specified in the documentation. Changing the docs to force
>> the backing memory to always be allocated breaks the central goal in
>> having attach/map in DMA-BUF separate.
>>
>>>>>> //buffer is sent down the pipeline
>>>>>>
>>>>>> // Userspace software post processing occurs
>>>>>> mmap buffer
>>>>>
>>>>> Perhaps the invalidate should happen here in mmap.
>>>>>
>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>>>>> devices attached to buffer
>>>>>
>>>>> And that should be okay, mmap does the sync, and if no devices are
>>>>> attached nothing could have changed the underlying memory in the
>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>>
>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
>>> Liam was saying that it's too painful for them to do that every time a
>>> device unmaps - when in many cases (device->device, no CPU) it's not
>>> needed.
>>
>> Invalidates are painless, at least compared to a real cache flush, just
>> set the invalid bit vs actually writing out lines. I thought the issue
>> was on the map side.
>>
>
> Invalidates aren't painless for us because we have a coherent system cache
> so clean lines get written out.

That seems very broken. Why would clean lines ever need to be written
out? That defeats the whole point of having the invalidate separate from
the clean. How do you deal with stale cache lines? I guess in your case
this is what forces you to use uncached memory for DMA-able memory.

> And these invalidates can occur on fairly large buffers.
>
> That is why we haven't gone with using cached ION memory and "tracking CPU
> access" because it only solves half the problem, ie there isn't a way to
> safely skip the invalidate (because we can't read the future).
> Our solution was to go with uncached ION memory (when possible), but as
> you can see in other discussions upstream support for uncached memory has
> its own issues.
>

Sounds like you need to fix upstream support then. Finding a way to
drop all cacheable mappings of memory you want to map uncached seems to
be the only solution.

>>>
>>>>>
>>>>>> [CPU reads/writes to the buffer]
>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>>> devices attached to buffer
>>>>>> munmap buffer
>>>>>>
>>>>>> //buffer is sent down the pipeline
>>>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>>>> writes to a file
>>>>>> dma_buf_attach
>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>> [video device writes to buffer]
>>>>>> dma_buf_unmap_attachment
>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>>>>>> access then there is no requirement (that I am aware of) for you to call
>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>>
>>>>>>>
>>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>>> maintenance on the buffer?
>>>>>>>
>>>>>>
>>>>>> Because ION no longer provides DMA ready memory.
>>>>>> Take the above example.
>>>>>>
>>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>>> Zeros are written to the cache.
>>>>>>
>>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>>> The camera devices writes directly to the buffer in DDR.
>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>>> written which corrupts your data.
>>>>>>
>>>>>
>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>>> for CPU access at the time of zeroing.
>>>>>
>>>
>>> Actually that should be at the point of the first non-coherent device
>>> mapping the buffer right? No point in doing CMO if the future accesses
>>> are coherent.
>>
>> I see your point, as long as the zeroing is guaranteed to be the first
>> access to this buffer then it should be safe.
>>
>> Andrew
>>
>>>
>>> Cheers,
>>> -Brian
>>>
>>>>> Andrew
>>>>>
>>>>>> Liam
>>>>>>
>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>> a Linux Foundation Collaborative Project
>>>>>>
>>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-18 01:08:15

by Liam Mark

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Thu, 17 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 4:48 PM, Liam Mark wrote:
> > On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/15/19 1:05 PM, Laura Abbott wrote:
> >>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>
> >>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>
> >>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
> >>>>>>>> here.
> >>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
> >>>>>>>> anyway.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>> ---
> >>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
> >>>>>>>> b/drivers/staging/android/ion/ion.c
> >>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
> >>>>>>>> dma_buf_attachment *attachment,
> >>>>>>>>         table = a->table;
> >>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>> -            direction))
> >>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>
> >>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>>>>>> maintenance.
> >>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>> dma_buf_attach then there won't have been a device attached so the
> >>>>>>> calls
> >>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>
> >>>>>>
> >>>>>> That should be okay though, if you have no attachments (or all
> >>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>> is attached later after data has already been written. Does that
> >>>>>> sequence need supporting?
> >>>>>
> >>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>> in Android, but I will focus on later for now.
> >>>>>
> >>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>> attached and mapped, right?
> >>>>>>
> >>>>>
> >>>>> Here is an example where CPU access can happen later in Android.
> >>>>>
> >>>>> Camera device records video -> software post processing -> video device
> >>>>> (who does compression of raw data) and writes to a file
> >>>>>
> >>>>> In this example assume the buffer is cached and the devices are not
> >>>>> IO-coherent (quite common).
> >>>>>
> >>>>
> >>>> This is the start of the problem, having cached mappings of memory that
> >>>> is also being accessed non-coherently is going to cause issues one way
> >>>> or another. On top of the speculative cache fills that have to be
> >>>> constantly fought back against with CMOs like below; some coherent
> >>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>> (snoop filters get messed up).
> >>>>
> >>>> The solution is to either always have the addresses marked non-coherent
> >>>> (like device memory, no-map carveouts), or if you really want to use
> >>>> regular system memory allocated at runtime, then all cached mappings of
> >>>> it need to be dropped, even the kernel logical address (area as painful
> >>>> as that would be).
> >>>>
> >>>
> >>> I agree it's broken, hence my desire to remove it :)
> >>>
> >>> The other problem is that uncached buffers are being used for
> >>> performance reason so anything that would involve getting
> >>> rid of the logical address would probably negate any performance
> >>> benefit.
> >>>
> >>
> >> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> >> adamant that they have valid uses. I'm just not sure performance is one
> >> of them, maybe in the case of software locks between devices or
> >> something where there needs to be a lot of back and forth interleaved
> >> access on small amounts of data?
> >>
> >
> > I wasn't aware that ARM considered this not supported, I thought it was
> > supported but they advised against it because of the potential performance
> > impact.
> >
>
> Not sure what you mean by "this" being not supported, do you mean mixed
> attribute mappings? If so, it will certainly cause problems, and the
> problems will change from platform to platform, avoid at all costs is my
> understanding of ARM's position.
>
> > This is after all supported in the DMA APIs and up until now devices have
> > been successfully commercializing with these configurations, and I think
> > they will continue to commercialize with these configurations for quite a
> > while.
> >
>
> Use of uncached memory mappings is almost always wrong in my experience
> and are used to work around some bug or because the user doesn't want to
> implement proper CMOs. Counter examples welcome.
>

Okay, let me first try to clarify what I am referring to, as perhaps I am
misunderstanding the conversation.

In this discussion I was originally referring to a use case with cached
memory being accessed by a non-IO-coherent device.

"In this example assume the buffer is cached and the devices are not
IO-coherent (quite common)."

which you did not think was supported:

"This is the start of the problem, having cached mappings of memory
that is also being accessed non-coherently is going to cause issues
one way or another."

And I interpreted Laura's comment below as saying she wanted to remove
support in ION for cached memory being accessed by non-IO-coherent
devices:
"I agree it's broken, hence my desire to remove it :)"

So, assuming my understanding above is correct (and you are not talking
about something separate, such as removing uncached ION allocation
support), I am not clear why current uses of cached memory with
non-IO-coherent devices are considered to be working around some bug or
failing to implement proper CMOs.

They use cached CPU mappings because that is the most efficient way to
access the memory from the CPU side, and the devices have uncached
IOMMU mappings because they don't support IO-coherency. Currently the
CPU does cache maintenance at the time of dma map and dma unmap, so to
me they are implementing correct CMOs.

> > It would be really unfortunate if support was removed as I think that
> > would drive clients away from using upstream ION.
> >
>
> I'm not petitioning to remove support, but at the very least let's reverse
> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
> default, to get uncached you should need to add a flag to your
> allocation command pointing out you know what you are doing.
>

You may not be petitioning to remove support for using cached memory with
non io-coherent devices but I interpreted Laura's comment as wanting to do
so, and I had concerns about that.

> >>>>> ION buffer is allocated.
> >>>>>
> >>>>> //Camera device records video
> >>>>> dma_buf_attach
> >>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>
> >>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>> problem.
> >>>>
> >>>>> [camera device writes to buffer]
> >>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>
> >>>> It doesn't know there will be any further CPU access, it could get freed
> >>>> after this for all we know, the invalidate can be saved until the CPU
> >>>> requests access again.
> >>>>
> >>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>> down
> >>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>
> >>>>
> >>>> This seems like a broken use-case, I understand the desire to keep
> >>>> everything as modular as possible and separate the steps, but at this
> >>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>> for the next attachment, which may change the required backing memory
> >>>> completely.
> >>>>
> >>>> All devices should attach before the first mapping, and only let go
> >>>> after the task is complete, otherwise this buffer's data needs to be copied
> >>>> off to a different location or the CPU needs to take ownership in-between.
> >>>>
> >>>
> >>> Maybe it's broken but it's the status quo and we spent a good
> >>> amount of time at plumbers concluding there isn't a great way
> >>> to fix it :/
> >>>
> >>
> >> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
> >>
> >> Perhaps just stronger rules on sequencing of operations? I'm not saying
> >> I have a good solution either, I just don't see any way forward without
> >> some use-case getting broken, so better to fix now over later.
> >>
> >
> > I can see the benefits of Android doing things the way they do, I would
> > request that changes we make continue to support Android, or we find a way
> > to convince them to change, as they are the main ION client and I assume
> > other ION clients in the future will want to do this as well.
> >
>
> Android may be the biggest user today (makes sense, Ion came out of the
> Android project), but that can change, and getting changes into Android
> will be easier than in the upstream kernel once Ion is out of staging.
>
> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
> chips targeting Android, our core offerings target more traditional
> Linux userspaces, and I'm guessing others will start to do the same as
> ARM tries to push more into desktop, server, and other spaces again.
>
> > I am concerned that if you go with a solution which enforces what you
> > mention above, and bring ION out of staging that way, it will make it that
> > much harder to solve this for Android and therefore harder to get
> > Android clients to move to the upstream ION (and get everybody off their
> > vendor modified Android versions).
> >
>
> That would be an Android problem, reducing functionality in upstream to
> match what some evil vendor trees do to support Android is not the way
> forward on this. At least for us we are going to try to make all our
> software offerings follow proper buffer ownership (including our Android
> offering).
>
> >>>>> //buffer is sent down the pipeline
> >>>>>
> >>>>> // Userspace software post processing occurs
> >>>>> mmap buffer
> >>>>
> >>>> Perhaps the invalidate should happen here in mmap.
> >>>>
> >>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>> devices attached to buffer
> >>>>
> >>>> And that should be okay, mmap does the sync, and if no devices are
> >>>> attached nothing could have changed the underlying memory in the
> >>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>
> >>>>> [CPU reads/writes to the buffer]
> >>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>> devices attached to buffer
> >>>>> munmap buffer
> >>>>>
> >>>>> //buffer is sent down the pipeline
> >>>>> // Buffer is sent to video device (who does compression of raw data) and
> >>>>> writes to a file
> >>>>> dma_buf_attach
> >>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>> [video device writes to buffer]
> >>>>> dma_buf_unmap_attachment
> >>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>> down
> >>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>
> >>>>>
> >>>>>
> >>>>>>> Also ION no longer provides DMA ready memory, so if you are not
> >>>>>>> doing CPU
> >>>>>>> access then there is no requirement (that I am aware of) for you to
> >>>>>>> call
> >>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
> >>>>>>> if this
> >>>>>>> buffer is cached and your device is not IO-coherent then the cache
> >>>>>>> maintenance
> >>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>
> >>>>>>
> >>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>> maintenance on the buffer?
> >>>>>>
> >>>>>
> >>>>> Because ION no longer provides DMA ready memory.
> >>>>> Take the above example.
> >>>>>
> >>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>> Zeros are written to the cache.
> >>>>>
> >>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>> The camera devices writes directly to the buffer in DDR.
> >>>>> Since you didn't clean the buffer a dirty cache line (one of the
> >>>>> zeros) is
> >>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>> written which corrupts your data.
> >>>>>
> >>>>
> >>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>> for CPU access at the time of zeroing.
> >>>>
> >>>> Andrew
> >>>>
> >>>>> Liam
> >>>>>
> >>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>> a Linux Foundation Collaborative Project
> >>>>>
> >>>
> >>
> >
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > a Linux Foundation Collaborative Project
> >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-18 01:13:43

by Liam Mark

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Thu, 17 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 4:54 PM, Liam Mark wrote:
> > On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/16/19 9:19 AM, Brian Starkey wrote:
> >>> Hi :-)
> >>>
> >>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>
> >>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>
> >>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>> ---
> >>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>>>>
> >>>>>>>>> table = a->table;
> >>>>>>>>>
> >>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>> - direction))
> >>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>
> >>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>
> >>>>>>>
> >>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>> is attached later after data has already been written. Does that
> >>>>>>> sequence need supporting?
> >>>>>>
> >>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>> in Android, but I will focus on later for now.
> >>>>>>
> >>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>> attached and mapped, right?
> >>>>>>>
> >>>>>>
> >>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>
> >>>>>> Camera device records video -> software post processing -> video device
> >>>>>> (who does compression of raw data) and writes to a file
> >>>>>>
> >>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>> IO-coherent (quite common).
> >>>>>>
> >>>>>
> >>>>> This is the start of the problem, having cached mappings of memory that
> >>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>> or another. On top of the speculative cache fills that have to be
> >>>>> constantly fought back against with CMOs like below; some coherent
> >>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>> (snoop filters get messed up).
> >>>>>
> >>>>> The solution is to either always have the addresses marked non-coherent
> >>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>> it need to be dropped, even the kernel logical address (area as painful
> >>>>> as that would be).
> >>>
> >>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> >>> "real" is that? It seems that we aren't really hitting that today on
> >>> real devices.
> >>>
> >>
> >> Sadly there is at least one real device like this now (TI AM654). We
> >> spent some time working with the ARM interconnect spec designers to see
> >> if this was allowed behavior, final conclusion was mixing coherent and
> >> non-coherent accesses is never a good idea.. So we have been working to
> >> try to minimize any cases of mixed attributes [0], if a region is
> >> coherent then everyone in the system needs to treat it as such and
> >> vice-versa, even clever CMOs that work on other systems won't save you
> >> here. :(
> >>
> >> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> >>
> >>
> >>>>>
> >>>>>> ION buffer is allocated.
> >>>>>>
> >>>>>> //Camera device records video
> >>>>>> dma_buf_attach
> >>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>
> >>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>
> >>>> Actually +Brian this time :)
> >>>>
> >>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>> problem.
> >>>>>
> >>>>>> [camera device writes to buffer]
> >>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>
> >>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>> requests access again.
> >>>
> >>> We don't have any API to allow the invalidate to happen on CPU access
> >>> if all devices already detached. We need a struct device pointer to
> >>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
> >>>
> >>> I had a chat with a few people internally after the previous
> >>> discussion with Liam. One suggestion was to use
> >>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> >>> one other device attached (guarantees that we can do an invalidate in
> >>> the future if begin_cpu_access is called). If the last device
> >>> detaches, do a sync then.
> >>>
> >>> Conversely, in map_dma_buf, we would track if there was any CPU access
> >>> and use/skip the sync appropriately.
> >>>
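The scheme described above (skip the sync with DMA_ATTR_SKIP_CPU_SYNC unless the CPU has touched the buffer, and only while another attached device guarantees a later invalidate is still possible) can be modeled in ordinary C. This is an illustrative userspace sketch with invented names, not kernel code:

```c
#include <stdbool.h>

/* Hypothetical model of the proposed bookkeeping: counts how many
 * cache maintenance operations (CMOs) would actually be issued. */
struct model_buf {
	int attached;      /* devices currently attached */
	bool cpu_touched;  /* CPU access since the last device map? */
	int cmos_issued;   /* CMOs actually performed */
};

static void model_begin_cpu_access(struct model_buf *b)
{
	b->cmos_issued++;      /* invalidate so the CPU sees device data */
	b->cpu_touched = true;
}

static void model_map_dma_buf(struct model_buf *b)
{
	if (b->cpu_touched) {  /* clean only if the CPU dirtied lines */
		b->cmos_issued++;
		b->cpu_touched = false;
	}                      /* else skip, like DMA_ATTR_SKIP_CPU_SYNC */
}

static void model_unmap_dma_buf(struct model_buf *b)
{
	/* Defer the invalidate to the next CPU access; that is safe
	 * while at least one other device remains attached to give us
	 * a struct device to sync against later. */
	if (b->attached <= 1)
		b->cmos_issued++;
}
```

With two devices handing a buffer back and forth and no CPU access in between, only the final unmap would issue a CMO under this model.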
> >>
> >> Now that I think this all through I agree this patch is probably wrong.
> >> The real fix needs to be better handling in the dma_map_sg() to deal
> >> with the case of the memory not being mapped (what I'm dealing with for
> >> unmapped heaps), and for cases when the memory in question is not cached
> >> (Liam's issue I think). For both these cases the dma_map_sg() does the
> >> wrong thing.
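The point above, that dma_map_sg() currently does the wrong thing for unmapped and for uncached buffers, amounts to the DMA API needing per-buffer information it does not have today. A purely illustrative sketch (all names invented) of the kind of check involved:

```c
#include <stdbool.h>

/* Hypothetical per-buffer properties the DMA API would need to know
 * about to avoid issuing bogus cache maintenance. */
struct model_buffer {
	bool kernel_mapped;  /* is there a kernel address to operate on? */
	bool cached;         /* does any cacheable mapping exist? */
};

/* Would mapping this buffer need CPU cache maintenance? Unmapped
 * heaps (no kernel address) and uncached buffers both say no. */
static bool model_map_needs_cmo(const struct model_buffer *buf)
{
	return buf->kernel_mapped && buf->cached;
}
```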
> >>
> >>> I did start poking the code to check out how that would look, but then
> >>> Christmas happened and I'm still catching back up.
> >>>
> >>>>>
> >>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>
> >>>>>
> >>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>> everything as modular as possible and separate the steps, but at this
> >>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>> for the next attachment, which may change the required backing memory
> >>>>> completely.
> >>>>>
> >>>>> All devices should attach before the first mapping, and only let go
> >>>>> after the task is complete, otherwise this buffer's data needs to be copied off
> >>>>> to a different location or the CPU needs to take ownership in-between.
> >>>>>
> >>>
> >>> Yeah.. that's certainly the theory. Are there any DMA-BUF
> >>> implementations which actually do that? I hear it quoted a lot,
> >>> because that's what the docs say - but if the reality doesn't match
> >>> it, maybe we should change the docs.
> >>>
> >>
> >> Do you mean on the userspace side? I'm not sure, seems like Android
> >> might be doing this wrong from what I can gather. From kernel side if
> >> you mean the "de-allocate the backing storage", we will have some cases
> >> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> >> in ways not specified in the documentation. Changing the docs to force
> >> the backing memory to always be allocated breaks the central goal of
> >> having attach/map in DMA-BUF separate.
> >>
> >>>>>> //buffer is sent down the pipeline
> >>>>>>
> >>>>>> // Userspace software post processing occurs
> >>>>>> mmap buffer
> >>>>>
> >>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>
> >>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>> devices attached to buffer
> >>>>>
> >>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>> attached nothing could have changed the underlying memory in the
> >>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>
> >>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> >>> Liam was saying that it's too painful for them to do that every time a
> >>> device unmaps - when in many cases (device->device, no CPU) it's not
> >>> needed.
> >>
> >> Invalidates are painless, at least compared to a real cache flush, just
> >> set the invalid bit vs actually writing out lines. I thought the issue
> >> was on the map side.
> >>
> >
> > Invalidates aren't painless for us because we have a coherent system cache
> > so clean lines get written out.
>
> That seems very broken, why would clean lines ever need to be written
> out, that defeats the whole point of having the invalidate separate from
> clean. How do you deal with stale cache lines? I guess in your case this
> is what forces you to have to use uncached memory for DMA-able memory.
>

My understanding is that our ARM invalidate is a clean + invalidate. I had
concerns about the clean lines being written to the system cache as
part of the 'clean', but the following 'invalidate' would take care of
actually invalidating the lines (so nothing broken).
But I am probably wrong on this; it is probably smart enough to skip
writing out the clean lines.

But regardless, targets supporting a coherent system cache are a legitimate
configuration, and an invalidate on such a configuration does have to go to
the bus to invalidate the system cache (which isn't free). So I don't think
you can assume that invalidates are cheap enough that it is okay
to do them (even when they are not needed) on every DMA unmap.

> > And these invalidates can occur on fairly large buffers.
> >
> > That is why we haven't gone with using cached ION memory and "tracking CPU
> > access" because it only solves half the problem, ie there isn't a way to
> > safely skip the invalidate (because we can't read the future).
> > Our solution was to go with uncached ION memory (when possible), but as
> > you can see in other discussions upstream support for uncached memory has
> > its own issues.
> >
>
> Sounds like you need to fix upstream support then, finding a way to drop
> all cacheable mappings of memory you want to make uncached mappings for
> seems to be the only solution.
>

I think we can probably agree that there wouldn't be a good way to remove
cached mappings without causing an unacceptable performance degradation,
since it would fragment all the nice 1GB kernel mappings we have.

So I am trying to find an alternative solution.

> >>>
> >>>>>
> >>>>>> [CPU reads/writes to the buffer]
> >>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>> devices attached to buffer
> >>>>>> munmap buffer
> >>>>>>
> >>>>>> //buffer is sent down the pipeline
> >>>>>> // Buffer is sent to video device (who does compression of raw data) and
> >>>>>> writes to a file
> >>>>>> dma_buf_attach
> >>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>> [video device writes to buffer]
> >>>>>> dma_buf_unmap_attachment
> >>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>>>>>> access then there is no requirement (that I am aware of) for you to call
> >>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>
> >>>>>>>
> >>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>> maintenance on the buffer?
> >>>>>>>
> >>>>>>
> >>>>>> Because ION no longer provides DMA ready memory.
> >>>>>> Take the above example.
> >>>>>>
> >>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>> Zeros are written to the cache.
> >>>>>>
> >>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>> The camera devices writes directly to the buffer in DDR.
> >>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>> written which corrupts your data.
> >>>>>>
> >>>>>
> >>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>>> for CPU access at the time of zeroing.
> >>>>>
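A toy writeback-cache model makes the dirty-zero corruption scenario above concrete. This is illustrative only (all names invented), modeling a single cache line of a buffer shared with a non-IO-coherent device:

```c
#include <stdbool.h>

/* One cache line's worth of state: the copy in DDR and the copy in
 * a writeback CPU cache. */
struct line { int ddr; int cache; bool valid, dirty; };

static void cpu_zero(struct line *l)
{
	l->cache = 0;            /* the zero lands in the cache... */
	l->valid = true;
	l->dirty = true;         /* ...as a dirty line */
}

static void cache_clean(struct line *l)
{
	if (l->valid && l->dirty) {
		l->ddr = l->cache;   /* write the line back */
		l->dirty = false;
	}
}

static void device_write(struct line *l, int v)
{
	l->ddr = v;              /* non-coherent: bypasses the cache */
}

static void cache_evict(struct line *l)
{
	if (l->valid && l->dirty)
		l->ddr = l->cache;   /* writeback on eviction */
	l->valid = false;
}
```

Without a clean before handing the buffer to the device, a later eviction of the dirty zero overwrites the device's data in DDR; with the clean, the data survives.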
> >>>
> >>> Actually that should be at the point of the first non-coherent device
> >>> mapping the buffer right? No point in doing CMO if the future accesses
> >>> are coherent.
> >>
> >> I see your point, as long as the zeroing is guaranteed to be the first
> >> access to this buffer then it should be safe.
> >>
> >> Andrew
> >>
> >>>
> >>> Cheers,
> >>> -Brian
> >>>
> >>>>> Andrew
> >>>>>
> >>>>>> Liam
> >>>>>>
> >>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>>> a Linux Foundation Collaborative Project
> >>>>>>
> >>
> >
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > a Linux Foundation Collaborative Project
> >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-18 09:58:50

by Greg Kroah-Hartman

Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On Fri, Jan 11, 2019 at 12:05:09PM -0600, Andrew F. Davis wrote:
> Hello all,
>
> This is a set of (hopefully) non-controversial cleanups for the ION
> framework and current set of heaps. These were found as I start to
> familiarize myself with the framework to help in whatever way I
> can in getting all this up to the standards needed for de-staging.

I've applied the first 10 of these to the tree, thanks.

greg k-h

2019-01-18 10:02:05

by Greg Kroah-Hartman

Subject: Re: [PATCH 12/14] staging: android: ion: Declare helpers for carveout and chunk heaps

On Fri, Jan 11, 2019 at 12:05:21PM -0600, Andrew F. Davis wrote:
> When enabled, the helper functions for creating carveout and chunk heaps
> should have declarations in the ION header.

Why? No one calls these from what I can tell.

Which makes me believe we should just delete the
drivers/staging/android/ion/ion_carveout_heap.c and
drivers/staging/android/ion/ion_chunk_heap.c files as there are no
in-tree users?

Any objection to me doing that?

thanks,

greg k-h

2019-01-18 16:11:38

by Andrew Davis

Subject: Re: [PATCH 12/14] staging: android: ion: Declare helpers for carveout and chunk heaps

On 1/18/19 3:59 AM, Greg Kroah-Hartman wrote:
> On Fri, Jan 11, 2019 at 12:05:21PM -0600, Andrew F. Davis wrote:
>> When enabled, the helper functions for creating carveout and chunk heaps
>> should have declarations in the ION header.
>
> Why? No one calls these from what I can tell.
>
> Which makes me believe we should just delete the
> drivers/staging/android/ion/ion_carveout_heap.c and
> drivers/staging/android/ion/ion_chunk_heap.c files as there are no
> in-tree users?
>
> Any objection to me doing that?
>

I use those when creating carveout heaps. My exporter is out of tree
still as it uses DT and the proper bindings have not been agreed upon
yet. These helpers also make good heap creation references, even if not
called by anyone in-tree right now.

Thanks,
Andrew

> thanks,
>
> greg k-h
>

2019-01-18 16:53:48

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/17/19 7:04 PM, Liam Mark wrote:
> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>
>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>
>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>>>> here.
>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>>>> anyway.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>>>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>>>         table = a->table;
>>>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>>> -            direction))
>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>
>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>>>> maintenance.
>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>>>> calls
>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>> sequence need supporting?
>>>>>>>
>>>>>>> Yes, but also I think there are cases where CPU access can happen before
>>>>>>>> in Android, but I will focus on the "later" case for now.
>>>>>>>
>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>>> attached and mapped, right?
>>>>>>>>
>>>>>>>
>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>
>>>>>>> Camera device records video -> software post processing -> video device
>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>
>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>> IO-coherent (quite common).
>>>>>>>
>>>>>>
>>>>>> This is the start of the problem, having cached mappings of memory that
>>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>>> (snoop filters get messed up).
>>>>>>
>>>>>> The solution is to either always have the addresses marked non-coherent
>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>>> it need to be dropped, even the kernel logical address (as painful
>>>>>> as that would be).
>>>>>>
>>>>>
>>>>> I agree it's broken, hence my desire to remove it :)
>>>>>
>>>>> The other problem is that uncached buffers are being used for
>>>>> performance reason so anything that would involve getting
>>>>> rid of the logical address would probably negate any performance
>>>>> benefit.
>>>>>
>>>>
>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>>>> adamant that they have valid uses. I'm just not sure performance is one
>>>> of them, maybe in the case of software locks between devices or
>>>> something where there needs to be a lot of back and forth interleaved
>>>> access on small amounts of data?
>>>>
>>>
>>> I wasn't aware that ARM considered this not supported, I thought it was
>>> supported but they advised against it because of the potential performance
>>> impact.
>>>
>>
>> Not sure what you mean by "this" being not supported, do you mean mixed
>> attribute mappings? If so, it will certainly cause problems, and the
>> problems will change from platform to platform, avoid at all costs is my
>> understanding of ARM's position.
>>
>>> This is after all supported in the DMA APIs and up until now devices have
>>> been successfully commercializing with this configurations, and I think
>>> they will continue to commercialize with these configurations for quite a
>>> while.
>>>
>>
>> Use of uncached memory mappings is almost always wrong in my experience
>> and are used to work around some bug or because the user doesn't want to
>> implement proper CMOs. Counter examples welcome.
>>
>
> Okay, let me first try to clarify what I am referring to, as perhaps I am
> misunderstanding the conversation.
>
> In this discussion I was originally referring to a use case with cached
> memory being accessed by a non-IO-coherent device.
>
> "In this example assume the buffer is cached and the devices are not
> IO-coherent (quite common)."
>
> to which you did not think was supported:
>
> "This is the start of the problem, having cached mappings of memory
> that is also being accessed non-coherently is going to cause issues
> one way or another.
> "
>
> And I interpreted Laura's comment below as saying she wanted to remove
> support in ION for cached memory being accessed by non-IO-coherent
> devices:
> "I agree it's broken, hence my desire to remove it :)"
>
> So assuming my understanding above is correct (and you are not talking
> about something separate such as removing uncached ION allocation
> support).
>

Ah, I think here is where we diverged, I'm assuming Laura's comment to
be referencing my issue with uncached mappings being handed out without
first removing all cached mappings of the same memory. Therefore it is
uncached heaps that are broken.

> Then I guess I am not clear why current uses which use cached memory with
> non IO-coherent devices are considered to be working around some bug or
> are not implementing proper CMOs.
>
> They use CPU cached mappings because that is the most effective way to
> access the memory from the CPU side and the devices have an uncached
> IOMMU mapping because they don't support IO-coherency, and currently on
> the CPU side they do cache maintenance at the time of dma map and dma unmap,
> so to me they are implementing correct CMOs.
>

Fully agree here, using cached mappings and performing CMOs when needed
is the way to go when dealing with memory. IMHO the *only* time when
uncached mappings are appropriate is for memory mapped I/O (although it
looks like video memory was often treated as uncached (wc)).

>>> It would be really unfortunate if support was removed as I think that
>>> would drive clients away from using upstream ION.
>>>
>>
>> I'm not petitioning to remove support, but at the very least let's reverse
>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
>> default, to get uncached you should need to add a flag to your
>> allocation command pointing out you know what you are doing.
>>
>
> You may not be petitioning to remove support for using cached memory with
> non io-coherent devices but I interpreted Laura's comment as wanting to do
> so, and I had concerns about that.
>

What I would like is for the default memory handed out by Ion to be
normal cacheable memory, just like what is always handed out to user-space.
DMA-BUF already provides the means to deal with the CMOs required to
work with non-io-coherent devices so all should be good here.
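For reference, the user-space half of those CMOs is the DMA_BUF_IOCTL_SYNC ioctl from <linux/dma-buf.h>, used to bracket CPU access to an mmap'd buffer. A minimal wrapper might look like this (the helper name is ours):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Bracket CPU access to an mmap'd DMA-BUF: call with
 * DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW before touching the memory and
 * DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW afterwards. Returns the raw
 * ioctl result: 0 on success, -1 on error. */
static int dmabuf_cpu_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };

	return ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

On a buffer with no non-coherent devices attached the kernel side may legitimately treat these as no-ops, which is exactly the situation discussed above.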

If you want Ion to give out uncached memory then I think you should need
to explicitly state so with an allocation flag. And right now the
uncached memory you will get back may have other cached mappings (kernel
lowmem mappings) meaning you will have hard to predict results (on ARM
at least). I just don't see much use for them (uncached mappings of
regular memory) right now.

>>>>>>> ION buffer is allocated.
>>>>>>>
>>>>>>> //Camera device records video
>>>>>>> dma_buf_attach
>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>
>>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>>> problem.
>>>>>>
>>>>>>> [camera device writes to buffer]
>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>
>>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>> requests access again.
>>>>>>
>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>>>> down
>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>
>>>>>>
>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>>> for the next attachment, which may change the required backing memory
>>>>>> completely.
>>>>>>
>>>>>> All devices should attach before the first mapping, and only let go
>>>>>> after the task is complete, otherwise this buffer's data needs to be copied off
>>>>>> to a different location or the CPU needs to take ownership in-between.
>>>>>>
>>>>>
>>>>> Maybe it's broken but it's the status quo and we spent a good
>>>>> amount of time at plumbers concluding there isn't a great way
>>>>> to fix it :/
>>>>>
>>>>
>>>> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
>>>>
>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>>>> I have a good solution either, I just don't see any way forward without
>>>> some use-case getting broken, so better to fix now over later.
>>>>
>>>
>>> I can see the benefits of Android doing things the way they do, I would
>>> request that changes we make continue to support Android, or we find a way
>>> to convince them to change, as they are the main ION client and I assume
>>> other ION clients in the future will want to do this as well.
>>>
>>
>> Android may be the biggest user today (makes sense, Ion came out of the
>> Android project), but that can change, and getting changes into Android
>> will be easier than the upstream kernel once Ion is out of staging.
>>
>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
>> chips targeting Android, our core offerings target more traditional
>> Linux userspaces, and I'm guessing others will start to do the same as
>> ARM tries to push more into desktop, server, and other spaces again.
>>
>>> I am concerned that if you go with a solution which enforces what you
>>> mention above, and bring ION out of staging that way, it will make it that
>>> much harder to solve this for Android and therefore harder to get
>>> Android clients to move to the upstream ION (and get everybody off their
>>> vendor modified Android versions).
>>>
>>
>> That would be an Android problem, reducing functionality in upstream to
>> match what some evil vendor trees do to support Android is not the way
>> forward on this. At least for us we are going to try to make all our
>> software offerings follow proper buffer ownership (including our Android
>> offering).
>>
>>>>>>> //buffer is sent down the pipeline
>>>>>>>
>>>>>>> // Userspace software post processing occurs
>>>>>>> mmap buffer
>>>>>>
>>>>>> Perhaps the invalidate should happen here in mmap.
>>>>>>
>>>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
>>>>>>> devices attached to buffer
>>>>>>
>>>>>> And that should be okay, mmap does the sync, and if no devices are
>>>>>> attached nothing could have changed the underlying memory in the
>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>>>>>
>>>>>>> [CPU reads/writes to the buffer]
>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>>>> devices attached to buffer
>>>>>>> munmap buffer
>>>>>>>
>>>>>>> //buffer is sent down the pipeline
>>>>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>>>>> writes to a file
>>>>>>> dma_buf_attach
>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>> [video device writes to buffer]
>>>>>>> dma_buf_unmap_attachment
>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>>>> down
>>>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not
>>>>>>>>> doing CPU
>>>>>>>>> access then there is no requirement (that I am aware of) for you to
>>>>>>>>> call
>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
>>>>>>>>> if this
>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache
>>>>>>>>> maintenance
>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>>>> maintenance on the buffer?
>>>>>>>>
>>>>>>>
>>>>>>> Because ION no longer provides DMA ready memory.
>>>>>>> Take the above example.
>>>>>>>
>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>>>> Zeros are written to the cache.
>>>>>>>
>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>>>> The camera devices writes directly to the buffer in DDR.
>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the
>>>>>>> zeros) is
>>>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>>>> written which corrupts your data.
>>>>>>>
>>>>>>
>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>>>> for CPU access at the time of zeroing.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>>> Liam
>>>>>>>
>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>>> a Linux Foundation Collaborative Project
>>>>>>>
>>>>>
>>>>
>>>
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>
>>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-18 17:18:38

by Andrew Davis

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/17/19 7:11 PM, Liam Mark wrote:
> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/16/19 4:54 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
>>>>> Hi :-)
>>>>>
>>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>
>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>>
>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>>>>>>>
>>>>>>>>>>> table = a->table;
>>>>>>>>>>>
>>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>> - direction))
>>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>>
>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>>> sequence need supporting?
>>>>>>>>
>>>>>>>> Yes, but also I think there are cases where CPU access can happen before
>>>>>>>> in Android, but I will focus on the "later" case for now.
>>>>>>>>
>>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>>>> attached and mapped, right?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>>
>>>>>>>> Camera device records video -> software post processing -> video device
>>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>>
>>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>>> IO-coherent (quite common).
>>>>>>>>
>>>>>>>
>>>>>>> This is the start of the problem, having cached mappings of memory that
>>>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>>>> (snoop filters get messed up).
>>>>>>>
>>>>>>> The solution is to either always have the addresses marked non-coherent
>>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>>>> it need to be dropped, even the kernel logical address area (as painful
>>>>>>> as that would be).
>>>>>
>>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
>>>>> "real" is that? It seems that we aren't really hitting that today on
>>>>> real devices.
>>>>>
>>>>
>>>> Sadly there is at least one real device like this now (TI AM654). We
>>>> spent some time working with the ARM interconnect spec designers to see
>>>> if this was allowed behavior; the final conclusion was that mixing
>>>> coherent and non-coherent accesses is never a good idea. So we have been
>>>> working to try to minimize any cases of mixed attributes [0]: if a
>>>> region is coherent then everyone in the system needs to treat it as such
>>>> and vice-versa, and even clever CMOs that work on other systems won't
>>>> save you here. :(
>>>>
>>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
>>>>
>>>>
>>>>>>>
>>>>>>>> ION buffer is allocated.
>>>>>>>>
>>>>>>>> //Camera device records video
>>>>>>>> dma_buf_attach
>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>
>>>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>>
>>>>>> Actually +Brian this time :)
>>>>>>
>>>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>>>> problem.
>>>>>>>
>>>>>>>> [camera device writes to buffer]
>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>>
>>>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>>> requests access again.
>>>>>
>>>>> We don't have any API to allow the invalidate to happen on CPU access
>>>>> if all devices already detached. We need a struct device pointer to
>>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
>>>>>
>>>>> I had a chat with a few people internally after the previous
>>>>> discussion with Liam. One suggestion was to use
>>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
>>>>> one other device attached (guarantees that we can do an invalidate in
>>>>> the future if begin_cpu_access is called). If the last device
>>>>> detaches, do a sync then.
>>>>>
>>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
>>>>> and use/skip the sync appropriately.
>>>>>
>>>>
>>>> Now that I think this all through I agree this patch is probably wrong.
>>>> The real fix needs to be better handling in the dma_map_sg() to deal
>>>> with the case of the memory not being mapped (what I'm dealing with for
>>>> unmapped heaps), and for cases when the memory in question is not cached
>>>> (Liam's issue I think). In both of these cases dma_map_sg() does the
>>>> wrong thing.
>>>>
>>>>> I did start poking the code to check out how that would look, but then
>>>>> Christmas happened and I'm still catching back up.
>>>>>
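The deferral/tracking scheme Brian describes above could be modeled like so. This is only a userspace sketch of the bookkeeping, not kernel code; all names (the `dbuf` struct, the `on_*` hooks, the `CMO_*` values) are mine and purely illustrative of the proposed state machine:

```c
#include <stdbool.h>

/* Which cache maintenance operation (CMO) a step would need. */
enum cmo { CMO_NONE, CMO_CLEAN, CMO_INVAL };

struct dbuf {
	int attached;   /* devices currently attached */
	bool cpu_dirty; /* CPU wrote since the last clean */
	bool dev_dirty; /* a device wrote; CPU caches may be stale */
};

static void dev_attach(struct dbuf *b) { b->attached++; }
static void dev_detach(struct dbuf *b) { b->attached--; }

/* map_dma_buf: clean only if the CPU actually dirtied the buffer. */
static enum cmo on_map(struct dbuf *b)
{
	enum cmo op = b->cpu_dirty ? CMO_CLEAN : CMO_NONE;
	b->cpu_dirty = false;
	return op;
}

/* unmap_dma_buf: defer the invalidate while another device is still
 * attached (a struct device will still be around for a later
 * begin_cpu_access); if this was the last mapping, invalidate now. */
static enum cmo on_unmap(struct dbuf *b)
{
	b->dev_dirty = true;
	if (b->attached > 1)
		return CMO_NONE;
	b->dev_dirty = false;
	return CMO_INVAL;
}

/* begin_cpu_access: invalidate only if a device write is pending. */
static enum cmo on_begin_cpu_access(struct dbuf *b)
{
	enum cmo op = b->dev_dirty ? CMO_INVAL : CMO_NONE;
	b->dev_dirty = false;
	return op;
}

/* end_cpu_access: remember that the CPU may have dirtied lines. */
static void on_end_cpu_access(struct dbuf *b) { b->cpu_dirty = true; }
```

Walking Liam's camera example through this model, the first map of a freshly allocated buffer needs no clean, the unmap with no other devices attached does the invalidate, and the later CPU post-processing pass marks the buffer dirty so the next map cleans it.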
>>>>>>>
>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>>
>>>>>>>
>>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>>>> for the next attachment, which may change the required backing memory
>>>>>>> completely.
>>>>>>>
>>>>>>> All devices should attach before the first mapping, and only let go
>>>>>>> after the task is complete, otherwise this buffer's data needs to be
>>>>>>> copied off to a different location or the CPU needs to take ownership
>>>>>>> in-between.
>>>>>>>
>>>>>
>>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
>>>>> implementations which actually do that? I hear it quoted a lot,
>>>>> because that's what the docs say - but if the reality doesn't match
>>>>> it, maybe we should change the docs.
>>>>>
>>>>
>>>> Do you mean on the userspace side? I'm not sure, seems like Android
>>>> might be doing this wrong from what I can gather. From kernel side if
>>>> you mean the "de-allocate the backing storage", we will have some cases
>>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
>>>> in ways not specified in the documentation. Changing the docs to force
>>>> the backing memory to always be allocated breaks the central goal of
>>>> having attach and map separate in DMA-BUF.
>>>>
>>>>>>>> //buffer is sent down the pipeline
>>>>>>>>
>>>>>>>> // Userspace software post processing occurs
>>>>>>>> mmap buffer
>>>>>>>
>>>>>>> Perhaps the invalidate should happen here in mmap.
>>>>>>>
>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>>>>>>> devices attached to buffer
>>>>>>>
>>>>>>> And that should be okay, mmap does the sync, and if no devices are
>>>>>>> attached, nothing could have changed the underlying memory in the
>>>>>>> meantime, so DMA_BUF_SYNC_START can safely be a no-op, as it currently is.
>>>>>
>>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
>>>>> Liam was saying that it's too painful for them to do that every time a
>>>>> device unmaps - when in many cases (device->device, no CPU) it's not
>>>>> needed.
>>>>
>>>> Invalidates are painless, at least compared to a real cache flush, just
>>>> set the invalid bit vs actually writing out lines. I thought the issue
>>>> was on the map side.
>>>>
>>>
>>> Invalidates aren't painless for us because we have a coherent system cache
>>> so clean lines get written out.
>>
>> That seems very broken; why would clean lines ever need to be written
>> out? That defeats the whole point of having the invalidate separate from
>> the clean. How do you deal with stale cache lines? I guess in your case this
>> is what forces you to have to use uncached memory for DMA-able memory.
>>
>
> My understanding is that our ARM invalidate is a clean + invalidate. I had
> concerns about the clean lines being written to the system cache as
> part of the 'clean', but the following 'invalidate' would take care of
> actually invalidating the lines (so nothing broken).
> But I am probably wrong on this and it is probably smart enough not to do
> the writing of the clean lines.
>

You are correct that for a lot of ARM cores "invalidate" is always a
"clean + invalidate". At first I thought this was kinda silly as there
is now no way to mark a dirty line invalid without it getting written
out first, but if you think about it, any dirty cache-line can be written
out (cleaned) at any time anyway, so this doesn't actually change system
behavior. You should just not write anything to memory (making the line
dirty) that you don't want eventually written out.

Point two, it's not just smart enough to not write out clean lines, it
is guaranteed by the spec not to write them out. Otherwise, since
cache-lines can be randomly filled, if those same clean lines got
written out on invalidate operations there would be no way to maintain
coherency and things would be written over top of each other all over
the place.

> But regardless, targets supporting a coherent system cache are a legitimate
> configuration, and an invalidate on this configuration does have to go to
> the bus to invalidate the system cache (which isn't free), so I don't think
> you can make the assumption that invalidates are cheap enough that it is okay
> to do them (even if they are not needed) on every dma unmap.
>

Very true, CMOs need to be broadcast to other coherent masters on a
coherent interconnect (and to the interconnect itself if it has a cache
as well (L3)), so not 100% free, but almost: just the near-zero cost of
the cache tag check in hardware. If there are no non-coherent devices
attached then the CMOs are no-ops; if there are, then the data needs to
be written out either way, and doing it on every access, as is done with
uncached memory (minus any write combining), will blow away any savings
made from the one less CMO. Either way you lose with uncached mappings
of memory. If I'm wrong I would love to know.

>>> And these invalidates can occur on fairly large buffers.
>>>
>>> That is why we haven't gone with using cached ION memory and "tracking CPU
>>> access" because it only solves half the problem, ie there isn't a way to
>>> safely skip the invalidate (because we can't read the future).
>>> Our solution was to go with uncached ION memory (when possible), but as
>>> you can see in other discussions upstream support for uncached memory has
>>> its own issues.
>>>
>>
>> Sounds like you need to fix upstream support then, finding a way to drop
>> all cacheable mappings of memory you want to make uncached mappings for
>> seems to be the only solution.
>>
>
> I think we can probably agree that there wouldn't be a good way to remove
> cached mappings without causing an unacceptable performance degradation
> since it would fragment all the nice 1GB kernel mappings we have.
>
> So I am trying to find an alternative solution.
>

I'm not sure there is a better solution. How hard is this solution to
implement anyway? The kernel already has to make gaps and cut up that
nice 1GB mapping when you make a reserved memory space in the lowmem
area, so all the logic is probably already implemented. We just need to
allow it to be hooked into from Ion when doing the uncached mappings.

>>>>>
>>>>>>>
>>>>>>>> [CPU reads/writes to the buffer]
>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>>>>> devices attached to buffer
>>>>>>>> munmap buffer
>>>>>>>>
>>>>>>>> //buffer is sent down the pipeline
>>>>>>>> // Buffer is sent to the video device (which does compression of raw data) and
>>>>>>>> writes to a file
>>>>>>>> dma_buf_attach
>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>> [video device writes to buffer]
>>>>>>>> dma_buf_unmap_attachment
>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
>>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>>>>> maintenance on the buffer?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Because ION no longer provides DMA ready memory.
>>>>>>>> Take the above example.
>>>>>>>>
>>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>>>>> Zeros are written to the cache.
>>>>>>>>
>>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>>>>> The camera devices writes directly to the buffer in DDR.
>>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>>>>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>>>>> written which corrupts your data.
>>>>>>>>
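Liam's corruption scenario above can be sketched with a toy writeback-cache model. This is plain userspace C and purely illustrative (one fake cache line in front of a fake DDR array, all names mine); it only demonstrates why a dirty zero line evicted after a non-coherent device write clobbers the device's data:

```c
#include <string.h>

#define LINE 64

/* Toy model: one cache line of CPU cache in front of DDR. */
static unsigned char ddr[LINE];
static struct { int valid, dirty; unsigned char data[LINE]; } line;

/* CPU zeroes the buffer: the zeros land in the cache as a dirty line. */
static void cpu_zero(void)
{
	memset(line.data, 0, LINE);
	line.valid = 1;
	line.dirty = 1;
}

/* Clean (write back): push dirty data to DDR; the line is no longer dirty. */
static void cache_clean(void)
{
	if (line.valid && line.dirty)
		memcpy(ddr, line.data, LINE);
	line.dirty = 0;
}

/* A non-IO-coherent device writes straight to DDR, bypassing the cache. */
static void device_write(unsigned char v)
{
	memset(ddr, v, LINE);
}

/* A dirty line can be evicted (written back) at any random time. */
static void random_eviction(void)
{
	cache_clean();
	line.valid = 0;
}
```

Without a clean before the device writes, a later eviction of the dirty zero line overwrites the device's data in DDR; with the clean done first, the eviction writes nothing back and the device's data survives.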
>>>>>>>
>>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>>>>> for CPU access at the time of zeroing.
>>>>>>>
>>>>>
>>>>> Actually that should be at the point of the first non-coherent device
>>>>> mapping the buffer right? No point in doing CMO if the future accesses
>>>>> are coherent.
>>>>
>>>> I see your point, as long as the zeroing is guaranteed to be the first
>>>> access to this buffer then it should be safe.
>>>>
>>>> Andrew
>>>>
>>>>>
>>>>> Cheers,
>>>>> -Brian
>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>>> Liam
>>>>>>>>
>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>>>> a Linux Foundation Collaborative Project
>>>>>>>>
>>>>
>>>
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>
>>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2019-01-18 19:55:42

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 11/14] staging: android: ion: Allow heap name to be null

On 1/16/19 9:12 AM, Andrew F. Davis wrote:
> On 1/16/19 9:28 AM, Brian Starkey wrote:
>> Hi Andrew,
>>
>> On Fri, Jan 11, 2019 at 12:05:20PM -0600, Andrew F. Davis wrote:
>>> The heap name can be used for debugging but otherwise does not seem
>>> to be required and no other part of the code will fail if left NULL
>>> except here. We can make it required and check for it at some point,
>>> for now let's just prevent this from causing a NULL pointer exception.
>>
>> I'm not so keen on this one. In the "new" API with heap querying, the
>> name string is the only way to identify the heap. I think Laura
>> mentioned at XDC2017 that it was expected that userspace should use
>> the strings to find the heap they want.
>>
>
> Right now the names are only for debug. I accidentally left the name
> null once and got a kernel crash. This is the only spot where it is
> needed so I fixed it up. The other option is to make the name mandatory
> and properly error out, I don't want to do that right now until the
> below discussion is had to see if names really do matter or not.
>

Yes, the heap names are part of the query API and are the expected
way to identify individual heaps for the API at the moment so having
a null heap name is incorrect. The heap name seemed like the best way
to identify heaps to userspace but if you have an alternative proposal
I'd be interested.
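For reference, heap lookup by name from userspace could be sketched as below. The struct layout mirrors my reading of the staging ION UAPI header (the data that ION_IOC_HEAP_QUERY fills in), so treat it as illustrative rather than authoritative; the lookup helper itself is hypothetical:

```c
#include <string.h>

#define MAX_HEAP_NAME 32

/* Illustrative copy of struct ion_heap_data from the staging ION
 * UAPI header; not the authoritative definition. */
struct ion_heap_data {
	char name[MAX_HEAP_NAME];
	unsigned int type;
	unsigned int heap_id;
	unsigned int reserved0;
	unsigned int reserved1;
	unsigned int reserved2;
};

/* Find a heap by its advertised name in the array returned by
 * ION_IOC_HEAP_QUERY; returns the heap_id, or -1 if not found.
 * A NULL/empty heap name is exactly what this lookup style cannot
 * cope with, which is why the name arguably must be mandatory. */
static int find_heap_by_name(const struct ion_heap_data *heaps,
			     unsigned int cnt, const char *name)
{
	for (unsigned int i = 0; i < cnt; i++)
		if (strncmp(heaps[i].name, name, MAX_HEAP_NAME) == 0)
			return (int)heaps[i].heap_id;
	return -1;
}
```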

Thanks,
Laura

>



2019-01-18 19:57:11

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 12/14] staging: android: ion: Declare helpers for carveout and chunk heaps

On 1/18/19 1:59 AM, Greg Kroah-Hartman wrote:
> On Fri, Jan 11, 2019 at 12:05:21PM -0600, Andrew F. Davis wrote:
>> When enabled, the helper functions for creating carveout and chunk heaps
>> should have declarations in the ION header.
>
> Why? No one calls these from what I can tell.
>
> Which makes me believe we should just delete the
> drivers/staging/android/ion/ion_carveout_heap.c and
> drivers/staging/android/ion/ion_chunk_heap.c files as there are no
> in-tree users?
>
> Any objection to me doing that?
>
> thanks,
>
> greg k-h
>

I'd rather not delete it quite yet. Part of this entire thread is a
discussion on how to let those heaps and associated functions actually
be called in some way in tree. I expect them to either get called
in tree or be replaced.

Thanks,
Laura

2019-01-18 20:21:23

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/16/19 8:05 AM, Andrew F. Davis wrote:
> On 1/15/19 12:58 PM, Laura Abbott wrote:
>> On 1/15/19 9:47 AM, Andrew F. Davis wrote:
>>> On 1/14/19 8:39 PM, Laura Abbott wrote:
>>>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>>>> Hello all,
>>>>>
>>>>> This is a set of (hopefully) non-controversial cleanups for the ION
>>>>> framework and current set of heaps. These were found as I start to
>>>>> familiarize myself with the framework to help in whatever way I
>>>>> can in getting all this up to the standards needed for de-staging.
>>>>>
>>>>> I would like to get some ideas of what is left to work on to get ION
>>>>> out of staging. Has there been some kind of agreement on what ION
>>>>> should
>>>>> eventually end up being? To me it looks like it is being whittled
>>>>> away at
>>>>> to its most core functions. To me that is looking like being a DMA-BUF
>>>>> user-space front end, simply advertising available memory backings in a
>>>>> system and providing allocations as DMA-BUF handles. If this is the
>>>>> case
>>>>> then it looks close to being ready to me at least, but I would love to
>>>>> hear any other opinions and concerns.
>>>>>
>>>>
>>>> Yes, at this point the only functionality that people are really
>>>> depending on is the ability to allocate a dma_buf easily from userspace.
>>>>
>>>>> Back to this patchset, the last patch may be a bit different than the
>>>>> others, it adds an unmapped heaps type and creation helper. I wanted to
>>>>> get this in to show off another heap type and maybe some issues we may
>>>>> have with the current ION framework. The unmapped heap is used when the
>>>>> backing memory should not (or cannot) be touched. Currently this kind
>>>>> of heap is used for firewalled secure memory that can be allocated like
>>>>> normal heap memory but only used by secure devices (OP-TEE, crypto HW,
>>>>> etc). It is basically just copied from the "carveout" heap type with
>>>>> the
>>>>> only difference being it is not mappable to userspace and we do not
>>>>> clear
>>>>> the memory (as we should not map it either). So should this really be a
>>>>> new heap type? Or maybe advertised as a carveout heap but with an
>>>>> additional allocation flag? Perhaps we do away with "types" altogether
>>>>> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>>>>>
>>>>> Maybe more thinking will be needed after all..
>>>>>
>>>>
>>>> So the cleanup looks okay (I need to finish reviewing) but I'm not a
>>>> fan of adding another heaptype without solving the problem of adding
>>>> some sort of devicetree binding or other method of allocating and
>>>> placing Ion heaps. That plus uncached buffers are one of the big
>>>> open problems that need to be solved for destaging Ion. See
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>>
>>>> for some background on that problem.
>>>>
>>>
>>> I'm under the impression that adding heaps like carveouts/chunk will be
>>> rather system specific and so do not lend themselves well to a universal
>>> DT style exporter. For instance a carveout memory space can be reported
>>> by a device at runtime, then the driver managing that device should go
>>> and use the carveout heap helpers to export that heap. If this is the
>>> case then I'm not sure it is a problem for the ION core framework to
>>> solve, but rather the users of it to figure out how best to create the
>>> various heaps. All Ion needs to do is allow exporting and advertising
>>> them IMHO.
>>>
>>
>> I think it is a problem for the Ion core framework to take care of.
>> Ion is useless if you don't actually have the heaps. Nobody has
>> actually gotten a full Ion solution end-to-end with a carveout heap
>> working in mainline because any proposals have been rejected. I think
>> we need at least one example in mainline of how creating a carveout
>> heap would work.
>
> In our evil vendor trees we have several examples. The issue being that
> Ion is still in staging and attempts at generic DT heap definitions
> haven't seemed to go so well. So for now we just keep it specific to our
> platforms until upstream makes a direction decision.
>

Yeah, it's been a bit of a chicken and egg in that this has been
blocking Ion getting out of staging but we don't actually have
in-tree users because it's still in staging.

>>
>>> Thanks for the background thread link, I've been looking for some info
>>> on current status of all this and "ion" is a bit hard to search the
>>> lists for. The core reason for posting this cleanup series is to throw
>>> my hat into the ring of all this Ion work and start getting familiar
>>> with the pending issues. The last two patches are not all that important
>>> to get in right now.
>>>
>>> In that thread you linked above, it seems we may have arrived at a
>>> similar problem for different reasons. I think the root issue is the Ion
>>> core makes too many assumptions about the heap memory. My proposal would
>>> be to allow the heap exporters more control over the DMA-BUF ops, maybe
>>> even going as far as letting them provide their own complete struct
>>> dma_buf_ops.
>>>
>>> Let me give an example where I think this is going to be useful. We have
>>> the classic constraint solving problem on our SoCs. Our SoCs are full of
>>> various coherent and non-coherent devices, some require contiguous
>>> memory allocations, others have in-line IOMMUs so can operate on
>>> non-contiguous, etc..
>>>
>>> DMA-BUF has a solution designed in for this we can use, namely
>>> allocation at map time after all the attachments have come in. The
>>> checking of each attached device to find the right backing memory is
>>> something the DMA-BUF exporter has to do, and so this SoC specific logic
>>> would have to be added to each exporting framework (DRM, V4L2, etc),
>>> unless we have one unified system exporter everyone uses, Ion.
>>>
>>
>> That's how dmabuf is supposed to work in theory but in practice we
>> also have the case of userspace allocates memory, mmaps, and then
>> a device attaches to it. The issue is we end up having to do work
>> and make decisions before all devices are actually attached.
>>
>
> That just seems wrong, DMA-BUF should be used for, well, DMA-able
> buffers.. Userspace should not be using these buffers without devices
> attached, otherwise why not use a regular buffer. If you need to fill
> the buffer then you should attach/map it first so the DMA-BUF exporter
> can pick the appropriate backing memory.
>
> Maybe a couple more rules on the ordering of DMA-BUF operations are
> needed to prevent having to deal with all these non-useful permutations.
>
> Sumit? ^^
>

I'd love to just say "don't do that" but it's existing userspace
behavior and it's really hard to change that.

>>> Then each system can define one (maybe typeless) heap, the correct
>>> backing type is system specific anyway, so let the system specific
>>> backing logic in the unified system exporter heap handle picking that.
>>> To allow that heaps need direct control of dma_buf_ops.
>>>
>>> Direct heap control of dma_buf_ops also fixes the cache/non-cache issue,
>>> and my unmapped memory issue, each heap type handles the quirks of its
>>> backing storage in its own way, instead of trying to find some one size
>>> fits all memory operations like we are doing now.
>>>
>>
>> I don't think this is an issue of one-size fits all. We have flags
>> to differentiate between cached and uncached paths, the issue is
>> that doing the synchronization for uncached buffers is difficult.
>>
>
> It is difficult, hence the suggestion of letting an uncached heap exporter
> do all the heavy work, instead of trying to deal with all these cases in
> the Ion core framework.
>
>> I'm just not sure how an extra set of dma_buf ops actually solves
>> the problem of needing to synchronize alias mappings.
>>
>
> It doesn't solve it, it just moves the work out of the framework. There
> are going to be a lot more interesting problems than this with some
> types of heaps we will have in the future, and dealing with all the logic in
> the framework core is not going to scale.
>
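Concretely, the per-heap ops idea under discussion might look like the sketch below. The struct is a cut-down stand-in for the kernel's struct dma_buf_ops (real signatures differ), and every function and heap name is hypothetical; the point is only the shape of the design, where each heap supplies its own ops table instead of the Ion core sharing one:

```c
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct dma_buf_ops; only the
 * hooks relevant to this discussion are shown, with toy signatures. */
struct dma_buf_ops_sketch {
	int  (*map_dma_buf)(void *buf);
	void (*unmap_dma_buf)(void *buf);
	int  (*mmap)(void *buf);           /* NULL: no userspace mapping */
	int  (*begin_cpu_access)(void *buf);
};

/* An "unmapped" heap: no CPU sync quirks on map/unmap, no mmap at all. */
static int  unmapped_map(void *buf)   { (void)buf; return 0; }
static void unmapped_unmap(void *buf) { (void)buf; }

static const struct dma_buf_ops_sketch unmapped_heap_ops = {
	.map_dma_buf      = unmapped_map,
	.unmap_dma_buf    = unmapped_unmap,
	.mmap             = NULL, /* backing memory must not be touched */
	.begin_cpu_access = NULL, /* CPU access is not supported */
};

/* Each heap hands its own ops to the exporter at export time, so the
 * quirks of its backing storage live in the heap, not the core. */
static const struct dma_buf_ops_sketch *heap_get_ops(int unmapped)
{
	return unmapped ? &unmapped_heap_ops : NULL;
}
```

With this shape, an uncached heap could likewise provide ops that do its alias-mapping synchronization, without the core framework needing a one-size-fits-all path.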

That is a good point. My immediate concern though is getting Ion out
of staging. If the per heap dma_buf ops will help with that I'd
certainly like to see them.

Thanks,
Laura

> Thanks,
> Andrew
>
>> Thanks,
>> Laura
>>
>>> We can provide helpers for the simple heap types still, but with this
>>> much of the heavy lifting moves out of the Ion core framework making it
>>> much more simple, something I think it will need for de-staging.
>>>
>>> Anyway, I might be completely off base in my direction here, just let me
>>> know :)
>>>
>>


2019-01-18 20:33:48

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/17/19 8:13 AM, Andrew F. Davis wrote:
> On 1/16/19 4:48 PM, Liam Mark wrote:
>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>
>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>
>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>>> here.
>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>>> anyway.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>> ---
>>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>>         table = a->table;
>>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>> -            direction))
>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>
>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>>> maintenance.
>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>>> calls
>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>
>>>>>>>
>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>> is attached later after data has already been written. Does that
>>>>>>> sequence need supporting?
>>>>>>
>>>>>> Yes, but also I think there are cases where CPU access can happen
>>>>>> beforehand in Android, but I will focus on the later case for now.
>>>>>>
>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>> attached and mapped, right?
>>>>>>>
>>>>>>
>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>
>>>>>> Camera device records video -> software post processing -> video device
>>>>>> (which does compression of raw data) and writes to a file
>>>>>>
>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>> IO-coherent (quite common).
>>>>>>
>>>>>
>>>>> This is the start of the problem, having cached mappings of memory that
>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>> or another. On top of the speculative cache fills that have to be
>>>>> constantly fought back against with CMOs like below, some coherent
>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>> (snoop filters get messed up).
>>>>>
>>>>> The solution is to either always have the addresses marked non-coherent
>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>> it need to be dropped, even the kernel logical address area (as painful
>>>>> as that would be).
>>>>>
>>>>
>>>> I agree it's broken, hence my desire to remove it :)
>>>>
>>>> The other problem is that uncached buffers are being used for
>>>> performance reason so anything that would involve getting
>>>> rid of the logical address would probably negate any performance
>>>> benefit.
>>>>
>>>
>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>>> adamant that they have valid uses. I'm just not sure performance is one
>>> of them, maybe in the case of software locks between devices or
>>> something where there needs to be a lot of back and forth interleaved
>>> access on small amounts of data?
>>>
>>
>> I wasn't aware that ARM considered this not supported, I thought it was
>> supported but they advised against it because of the potential performance
>> impact.
>>
>
> Not sure what you mean by "this" being not supported, do you mean mixed
> attribute mappings? If so, it will certainly cause problems, and the
> problems will change from platform to platform, avoid at all costs is my
> understanding of ARM's position.
>
>> This is after all supported in the DMA APIs and up until now devices have
>> been successfully commercializing with this configurations, and I think
>> they will continue to commercialize with these configurations for quite a
>> while.
>>
>
> Use of uncached memory mappings is almost always wrong in my experience
> and are used to work around some bug or because the user doesn't want to
> implement proper CMOs. Counter examples welcome.
>
>> It would be really unfortunate if support was removed as I think that
>> would drive clients away from using upstream ION.
>>
>
> I'm not petitioning to remove support, but at the very least let's reverse
> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
> default, to get uncached you should need to add a flag to your
> allocation command pointing out you know what you are doing.
>

I thought about doing that, the problem is it becomes an ABI break for
existing users which I really didn't want to do again. If it
ends up being the last thing we do before moving out of staging,
I'd consider doing it.

>>>>>> ION buffer is allocated.
>>>>>>
>>>>>> //Camera device records video
>>>>>> dma_buf_attach
>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>
>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>> problem.
>>>>>
>>>>>> [camera device writes to buffer]
>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>
>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>> requests access again.
>>>>>
>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>>> down
>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>
>>>>>
>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>> everything as modular as possible and separate the steps, but at this
>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>> for the next attachment, which may change the required backing memory
>>>>> completely.
>>>>>
>>>>> All devices should attach before the first mapping, and only let go
>>>>> after the task is complete, otherwise this buffer's data needs to be
>>>>> copied off to a different location or the CPU needs to take ownership
>>>>> in-between.
>>>>>
>>>>
>>>> Maybe it's broken but it's the status quo and we spent a good
>>>> amount of time at plumbers concluding there isn't a great way
>>>> to fix it :/
>>>>
>>>
>>> Hmm, I guess that doesn't prove there is not a great way to fix it either.. :/
>>>
>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>>> I have a good solution either, I just don't see any way forward without
>>> some use-case getting broken, so better to fix now over later.
>>>
>>
>> I can see the benefits of Android doing things the way they do, I would
>> request that changes we make continue to support Android, or we find a way
>> to convince them to change, as they are the main ION client and I assume
>> other ION clients in the future will want to do this as well.
>>
>
> Android may be the biggest user today (makes sense, Ion came out of the
> Android project), but that can change, and getting changes into Android
> will be easier than the upstream kernel once Ion is out of staging.
>
> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
> chips targeting Android, our core offerings target more traditional
> Linux userspaces, and I'm guessing others will start to do the same as
> ARM tries to push more into desktop, server, and other spaces again.
>
>> I am concerned that if you go with a solution which enforces what you
>> mention above, and bring ION out of staging that way, it will make it that
>> much harder to solve this for Android and therefore harder to get
>> Android clients to move to the upstream ION (and get everybody off their
>> vendor modified Android versions).
>>
>
> That would be an Android problem, reducing functionality in upstream to
> match what some evil vendor trees do to support Android is not the way
> forward on this. At least for us we are going to try to make all our
> software offerings follow proper buffer ownership (including our Android
> offering).
>

I don't think this is reducing functionality, it's about not breaking
what already works. There is some level of Android testing on a mainline
tree (hikey boards). I would say if we can come to an agreement on
a correct API, we could always merge the 'correct' version out of
staging and keep a legacy driver around for some time as a transition.

Thanks,
Laura

2019-01-18 20:45:22

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/18/19 2:31 PM, Laura Abbott wrote:
> On 1/17/19 8:13 AM, Andrew F. Davis wrote:
>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>
>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>
>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>>>> here.
>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>>>> anyway.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>    drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table
>>>>>>>>>> *ion_map_dma_buf(struct
>>>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>>>          table = a->table;
>>>>>>>>>>    -    if (!dma_map_sg(attachment->dev, table->sgl,
>>>>>>>>>> table->nents,
>>>>>>>>>> -            direction))
>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl,
>>>>>>>>>> table->nents,
>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>
>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>>>> maintenance.
>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the
>>>>>>>>> call to
>>>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>>>> calls
>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent
>>>>>>>> device
>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>> sequence need supporting?
>>>>>>>
>>>>>>> Yes, but also I think there are cases where CPU access can happen
>>>>>>> before
>>>>>>> in Android, but I will focus on later for now.
>>>>>>>
>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>> memory until map_dma_buf() time, and that should only happen
>>>>>>>> after all
>>>>>>>> the devices have attached so it can know where to put the
>>>>>>>> buffer. So we
>>>>>>>> shouldn't expect any CPU access to buffers before all the
>>>>>>>> devices are
>>>>>>>> attached and mapped, right?
>>>>>>>>
>>>>>>>
>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>
>>>>>>> Camera device records video -> software post processing -> video
>>>>>>> device
>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>
>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>> IO-coherent (quite common).
>>>>>>>
>>>>>>
>>>>>> This is the start of the problem, having cached mappings of memory
>>>>>> that
>>>>>> is also being accessed non-coherently is going to cause issues one
>>>>>> way
>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>> interconnects behave badly when you mix coherent and non-coherent
>>>>>> access
>>>>>> (snoop filters get messed up).
>>>>>>
>>>>>> The solution is to either always have the addresses marked
>>>>>> non-coherent
>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>> regular system memory allocated at runtime, then all cached
>>>>>> mappings of
>>>>>> it need to be dropped, even the kernel logical address (area as
>>>>>> painful
>>>>>> as that would be).
>>>>>>
>>>>>
>>>>> I agree it's broken, hence my desire to remove it :)
>>>>>
>>>>> The other problem is that uncached buffers are being used for
>>>>> performance reasons, so anything that would involve getting
>>>>> rid of the logical address would probably negate any performance
>>>>> benefit.
>>>>>
>>>>
>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>>>> adamant that they have valid uses. I'm just not sure performance is one
>>>> of them, maybe in the case of software locks between devices or
>>>> something where there needs to be a lot of back and forth interleaved
>>>> access on small amounts of data?
>>>>
>>>
>>> I wasn't aware that ARM considered this not supported, I thought it was
>>> supported but they advised against it because of the potential
>>> performance
>>> impact.
>>>
>>
>> Not sure what you mean by "this" being not supported, do you mean mixed
>> attribute mappings? If so, it will certainly cause problems, and the
>> problems will change from platform to platform, avoid at all costs is my
>> understanding of ARM's position.
>>
>>> This is after all supported in the DMA APIs and up until now devices
>>> have
>>> been successfully commercializing with these configurations, and I think
>>> they will continue to commercialize with these configurations for
>>> quite a
>>> while.
>>>
>>
>> Use of uncached memory mappings is almost always wrong in my experience
>> and are used to work around some bug or because the user doesn't want to
>> implement proper CMOs. Counter examples welcome.
>>
>>> It would be really unfortunate if support was removed as I think that
>>> would drive clients away from using upstream ION.
>>>
>>
>> I'm not petitioning to remove support, but at the very least let's reverse
>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
>> default, to get uncached you should need to add a flag to your
>> allocation command pointing out you know what you are doing.
>>
>
> I thought about doing that, the problem is it becomes an ABI break for
> existing users which I really didn't want to do again. If it
> ends up being the last thing we do before moving out of staging,
> I'd consider doing it.
>
>>>>>>> ION buffer is allocated.
>>>>>>>
>>>>>>> //Camera device records video
>>>>>>> dma_buf_attach
>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>
>>>>>> Why does the buffer need to be cleaned here? I just got through
>>>>>> reading
>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>> suggestion of tracking if the buffer has had CPU access since the
>>>>>> last
>>>>>> time and only flushing the cache if it has. As unmapped heaps
>>>>>> never get
>>>>>> CPU mapped this would never be the case for unmapped heaps, it
>>>>>> solves my
>>>>>> problem.
>>>>>>
>>>>>>> [camera device writes to buffer]
>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>
>>>>>> It doesn't know there will be any further CPU access, it could get
>>>>>> freed
>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>> requests access again.
>>>>>>
>>>>>>> dma_buf_detach  (device cannot stay attached because it is being
>>>>>>> sent
>>>>>>> down
>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>
>>>>>>
>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>> de-allocate the backing storage if it wants, that way it could get
>>>>>> ready
>>>>>> for the next attachment, which may change the required backing memory
>>>>>> completely.
>>>>>>
>>>>>> All devices should attach before the first mapping, and only let go
>>>>>> after the task is complete, otherwise this buffer's data needs to be
>>>>>> copied off
>>>>>> to a different location or the CPU needs to take ownership
>>>>>> in-between.
>>>>>>
>>>>>
>>>>> Maybe it's broken but it's the status quo and we spent a good
>>>>> amount of time at plumbers concluding there isn't a great way
>>>>> to fix it :/
>>>>>
>>>>
>>>> Hmm, guess that doesn't prove there is not a great way to fix it
>>>> either.. :/
>>>>
>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>>>> I have a good solution either, I just don't see any way forward without
>>>> some use-case getting broken, so better to fix now over later.
>>>>
>>>
>>> I can see the benefits of Android doing things the way they do, I would
>>> request that changes we make continue to support Android, or we find
>>> a way
>>> to convince them to change, as they are the main ION client and I assume
>>> other ION clients in the future will want to do this as well.
>>>
>>
>> Android may be the biggest user today (makes sense, Ion came out of the
>> Android project), but that can change, and getting changes into Android
>> will be easier than the upstream kernel once Ion is out of staging.
>>
>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
>> chips targeting Android, our core offerings target more traditional
>> Linux userspaces, and I'm guessing others will start to do the same as
>> ARM tries to push more into desktop, server, and other spaces again.
>>
>>> I am concerned that if you go with a solution which enforces what you
>>> mention above, and bring ION out of staging that way, it will make it
>>> that
>>> much harder to solve this for Android and therefore harder to get
>>> Android clients to move to the upstream ION (and get everybody off their
>>> vendor modified Android versions).
>>>
>>
>> That would be an Android problem, reducing functionality in upstream to
>> match what some evil vendor trees do to support Android is not the way
>> forward on this. At least for us we are going to try to make all our
>> software offerings follow proper buffer ownership (including our Android
>> offering).
>>
>
> I don't think this is reducing functionality, it's about not breaking
> what already works. There is some level of Android testing on a mainline
> tree (hikey boards). I would say if we can come to an agreement on
> a correct API, we could always merge the 'correct' version out of
> staging and keep a legacy driver around for some time as a transition.
>

I'm not sure that is what staging should be for, but I can certainly see
why you would want that (I help maintain our Android offering and every
kernel migration I get to go fix up libion and all its users...).

I'm sure we all know the API will get broken to get this out of staging,
so maybe we need to start a list (or update the TODO) with all the
things we agree need to be changed during the last step before destaging.
Sounds like you agree about the ION_FLAG_CACHED reversal for starters. I
think direct heap-managed dma_buf_ops will be needed.

What's left? Do we have any current proposals for the heap query
floating around that can go up for review?

Thanks,
Andrew

> Thanks,
> Laura

2019-01-18 20:48:54

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/18/19 12:43 PM, Andrew F. Davis wrote:
> On 1/18/19 2:31 PM, Laura Abbott wrote:
>> On 1/17/19 8:13 AM, Andrew F. Davis wrote:
>>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>>
>>>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>
>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>>
>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>>>>> here.
>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>>>>> anyway.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>>    drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table
>>>>>>>>>>> *ion_map_dma_buf(struct
>>>>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>>>>          table = a->table;
>>>>>>>>>>>    -    if (!dma_map_sg(attachment->dev, table->sgl,
>>>>>>>>>>> table->nents,
>>>>>>>>>>> -            direction))
>>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl,
>>>>>>>>>>> table->nents,
>>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>>
>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>>>>> maintenance.
>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the
>>>>>>>>>> call to
>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>>>>> calls
>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent
>>>>>>>>> device
>>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>>> sequence need supporting?
>>>>>>>>
>>>>>>>> Yes, but also I think there are cases where CPU access can happen
>>>>>>>> before
>>>>>>>> in Android, but I will focus on later for now.
>>>>>>>>
>>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>>> memory until map_dma_buf() time, and that should only happen
>>>>>>>>> after all
>>>>>>>>> the devices have attached so it can know where to put the
>>>>>>>>> buffer. So we
>>>>>>>>> shouldn't expect any CPU access to buffers before all the
>>>>>>>>> devices are
>>>>>>>>> attached and mapped, right?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>>
>>>>>>>> Camera device records video -> software post processing -> video
>>>>>>>> device
>>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>>
>>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>>> IO-coherent (quite common).
>>>>>>>>
>>>>>>>
>>>>>>> This is the start of the problem, having cached mappings of memory
>>>>>>> that
>>>>>>> is also being accessed non-coherently is going to cause issues one
>>>>>>> way
>>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>>> interconnects behave badly when you mix coherent and non-coherent
>>>>>>> access
>>>>>>> (snoop filters get messed up).
>>>>>>>
>>>>>>> The solution is to either always have the addresses marked
>>>>>>> non-coherent
>>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>>> regular system memory allocated at runtime, then all cached
>>>>>>> mappings of
>>>>>>> it need to be dropped, even the kernel logical address (area as
>>>>>>> painful
>>>>>>> as that would be).
>>>>>>>
>>>>>>
>>>>>> I agree it's broken, hence my desire to remove it :)
>>>>>>
>>>>>> The other problem is that uncached buffers are being used for
>>>>>> performance reasons, so anything that would involve getting
>>>>>> rid of the logical address would probably negate any performance
>>>>>> benefit.
>>>>>>
>>>>>
>>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>>>>> adamant that they have valid uses. I'm just not sure performance is one
>>>>> of them, maybe in the case of software locks between devices or
>>>>> something where there needs to be a lot of back and forth interleaved
>>>>> access on small amounts of data?
>>>>>
>>>>
>>>> I wasn't aware that ARM considered this not supported, I thought it was
>>>> supported but they advised against it because of the potential
>>>> performance
>>>> impact.
>>>>
>>>
>>> Not sure what you mean by "this" being not supported, do you mean mixed
>>> attribute mappings? If so, it will certainly cause problems, and the
>>> problems will change from platform to platform, avoid at all costs is my
>>> understanding of ARM's position.
>>>
>>>> This is after all supported in the DMA APIs and up until now devices
>>>> have
>>>> been successfully commercializing with these configurations, and I think
>>>> they will continue to commercialize with these configurations for
>>>> quite a
>>>> while.
>>>>
>>>
>>> Use of uncached memory mappings is almost always wrong in my experience
>>> and are used to work around some bug or because the user doesn't want to
>>> implement proper CMOs. Counter examples welcome.
>>>
>>>> It would be really unfortunate if support was removed as I think that
>>>> would drive clients away from using upstream ION.
>>>>
>>>
>>> I'm not petitioning to remove support, but at the very least let's reverse
>>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
>>> default, to get uncached you should need to add a flag to your
>>> allocation command pointing out you know what you are doing.
>>>
>>
>> I thought about doing that, the problem is it becomes an ABI break for
>> existing users which I really didn't want to do again. If it
>> ends up being the last thing we do before moving out of staging,
>> I'd consider doing it.
>>
>>>>>>>> ION buffer is allocated.
>>>>>>>>
>>>>>>>> //Camera device records video
>>>>>>>> dma_buf_attach
>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>
>>>>>>> Why does the buffer need to be cleaned here? I just got through
>>>>>>> reading
>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>>> suggestion of tracking if the buffer has had CPU access since the
>>>>>>> last
>>>>>>> time and only flushing the cache if it has. As unmapped heaps
>>>>>>> never get
>>>>>>> CPU mapped this would never be the case for unmapped heaps, it
>>>>>>> solves my
>>>>>>> problem.
>>>>>>>
>>>>>>>> [camera device writes to buffer]
>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>>
>>>>>>> It doesn't know there will be any further CPU access, it could get
>>>>>>> freed
>>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>>> requests access again.
>>>>>>>
>>>>>>>> dma_buf_detach  (device cannot stay attached because it is being
>>>>>>>> sent
>>>>>>>> down
>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>>
>>>>>>>
>>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>>> de-allocate the backing storage if it wants, that way it could get
>>>>>>> ready
>>>>>>> for the next attachment, which may change the required backing memory
>>>>>>> completely.
>>>>>>>
>>>>>>> All devices should attach before the first mapping, and only let go
>>>>>>> after the task is complete, otherwise this buffer's data needs to be
>>>>>>> copied off
>>>>>>> to a different location or the CPU needs to take ownership
>>>>>>> in-between.
>>>>>>>
>>>>>>
>>>>>> Maybe it's broken but it's the status quo and we spent a good
>>>>>> amount of time at plumbers concluding there isn't a great way
>>>>>> to fix it :/
>>>>>>
>>>>>
>>>>> Hmm, guess that doesn't prove there is not a great way to fix it
>>>>> either.. :/
>>>>>
>>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>>>>> I have a good solution either, I just don't see any way forward without
>>>>> some use-case getting broken, so better to fix now over later.
>>>>>
>>>>
>>>> I can see the benefits of Android doing things the way they do, I would
>>>> request that changes we make continue to support Android, or we find
>>>> a way
>>>> to convince them to change, as they are the main ION client and I assume
>>>> other ION clients in the future will want to do this as well.
>>>>
>>>
>>> Android may be the biggest user today (makes sense, Ion came out of the
>>> Android project), but that can change, and getting changes into Android
>>> will be easier than the upstream kernel once Ion is out of staging.
>>>
>>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
>>> chips targeting Android, our core offerings target more traditional
>>> Linux userspaces, and I'm guessing others will start to do the same as
>>> ARM tries to push more into desktop, server, and other spaces again.
>>>
>>>> I am concerned that if you go with a solution which enforces what you
>>>> mention above, and bring ION out of staging that way, it will make it
>>>> that
>>>> much harder to solve this for Android and therefore harder to get
>>>> Android clients to move to the upstream ION (and get everybody off their
>>>> vendor modified Android versions).
>>>>
>>>
>>> That would be an Android problem, reducing functionality in upstream to
>>> match what some evil vendor trees do to support Android is not the way
>>> forward on this. At least for us we are going to try to make all our
>>> software offerings follow proper buffer ownership (including our Android
>>> offering).
>>>
>>
>> I don't think this is reducing functionality, it's about not breaking
>> what already works. There is some level of Android testing on a mainline
>> tree (hikey boards). I would say if we can come to an agreement on
>> a correct API, we could always merge the 'correct' version out of
>> staging and keep a legacy driver around for some time as a transition.
>>
>
> I'm not sure that is what staging should be for, but I can certainly see
> why you would want that (I help maintain our Android offering and every
> kernel migration I get to go fixup libion and all its users..).
>
> I'm sure we all know the API will get broken to get this out of staging,
> so maybe we need to start a list (or update the TODO) with all the
> things we agree need to be changed during the last step before destaging.
> Sounds like you agree about the ION_FLAG_CACHED reversal for starters. I
> think direct heap managed dma_buf_ops will be needed.
>
> What's left, do we have any current proposals for the heap query
> floating around that can go up for review?
>

I was hoping the last time I broke the API would be the last time
we would need to break it. The query ioctl is already merged
and I haven't seen any other counter proposals around for discussion.
The TODO list could probably use some updating though.

> Thanks,
> Andrew
>
>> Thanks,
>> Laura


2019-01-18 21:46:33

by Liam Mark

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Fri, 18 Jan 2019, Andrew F. Davis wrote:

> On 1/17/19 7:04 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/16/19 4:48 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
> >>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>
> >>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>
> >>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
> >>>>>>>>>> here.
> >>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
> >>>>>>>>>> anyway.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
> >>>>>>>>>> b/drivers/staging/android/ion/ion.c
> >>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
> >>>>>>>>>> dma_buf_attachment *attachment,
> >>>>>>>>>>         table = a->table;
> >>>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>> -            direction))
> >>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>>
> >>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>>>>>>>> maintenance.
> >>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>>> dma_buf_attach then there won't have been a device attached so the
> >>>>>>>>> calls
> >>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>>> is attached later after data has already been written. Does that
> >>>>>>>> sequence need supporting?
> >>>>>>>
> >>>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>>> in Android, but I will focus on later for now.
> >>>>>>>
> >>>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>>> attached and mapped, right?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>>
> >>>>>>> Camera device records video -> software post processing -> video device
> >>>>>>> (who does compression of raw data) and writes to a file
> >>>>>>>
> >>>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>>> IO-coherent (quite common).
> >>>>>>>
> >>>>>>
> >>>>>> This is the start of the problem, having cached mappings of memory that
> >>>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>>> or another. On top of the speculative cache fills that have to be
> >>>>>> constantly fought back against with CMOs like below; some coherent
> >>>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>>> (snoop filters get messed up).
> >>>>>>
> >>>>>> The solution is to either always have the addresses marked non-coherent
> >>>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>>> it need to be dropped, even the kernel logical address (area as painful
> >>>>>> as that would be).
> >>>>>>
> >>>>>
> >>>>> I agree it's broken, hence my desire to remove it :)
> >>>>>
> >>>>> The other problem is that uncached buffers are being used for
> >>>>> performance reasons, so anything that would involve getting
> >>>>> rid of the logical address would probably negate any performance
> >>>>> benefit.
> >>>>>
> >>>>
> >>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> >>>> adamant that they have valid uses. I'm just not sure performance is one
> >>>> of them, maybe in the case of software locks between devices or
> >>>> something where there needs to be a lot of back and forth interleaved
> >>>> access on small amounts of data?
> >>>>
> >>>
> >>> I wasn't aware that ARM considered this not supported, I thought it was
> >>> supported but they advised against it because of the potential performance
> >>> impact.
> >>>
> >>
> >> Not sure what you mean by "this" being not supported, do you mean mixed
> >> attribute mappings? If so, it will certainly cause problems, and the
> >> problems will change from platform to platform, avoid at all costs is my
> >> understanding of ARM's position.
> >>
> >>> This is after all supported in the DMA APIs and up until now devices have
> >>> been successfully commercializing with these configurations, and I think
> >>> they will continue to commercialize with these configurations for quite a
> >>> while.
> >>>
> >>
> >> Use of uncached memory mappings is almost always wrong in my experience
> >> and are used to work around some bug or because the user doesn't want to
> >> implement proper CMOs. Counter examples welcome.
> >>
> >
> > Okay, let me first try to clarify what I am referring to, as perhaps I am
> > misunderstanding the conversation.
> >
> > In this discussion I was originally referring to a use case with cached
> > memory being accessed by a non-IO-coherent device.
> >
> > "In this example assume the buffer is cached and the devices are not
> > IO-coherent (quite common)."
> >
> > which you did not think was supported:
> >
> > "This is the start of the problem, having cached mappings of memory
> > that is also being accessed non-coherently is going to cause issues
> > one way or another.
> > "
> >
> > And I interpreted Laura's comment below as saying she wanted to remove
> > support in ION for cached memory being accessed by non-IO-coherent
> > devices:
> > "I agree it's broken, hence my desire to remove it :)"
> >
> > So I am assuming my understanding above is correct (and that you are not
> > talking about something separate, such as removing uncached ION
> > allocation support).
> >
>
> Ah, I think here is where we diverged, I'm assuming Laura's comment to
> be referencing my issue with uncached mappings being handed out without
> first removing all cached mappings of the same memory. Therefore it is
> uncached heaps that are broken.
>

I am glad that is clarified, but can you then clarify your following
statement for me? I am still not clear on what the problem is.

In response to:
"In this example assume the buffer is cached and the devices are
not IO-coherent (quite common)."

You said:

"This is the start of the problem, having cached mappings of memory that
is also being accessed non-coherently is going to cause issues one way or
another. "

> > Then I guess I am not clear why current uses which use cached memory with
> > non-IO-coherent devices are considered to be working around some bug or
> > are not implementing proper CMOs.
> >
> > They use CPU cached mappings because that is the most effective way to
> > access the memory from the CPU side, and the devices have an uncached
> > IOMMU mapping because they don't support IO-coherency. Currently on the
> > CPU side they do cache maintenance at the time of dma map and dma unmap,
> > so to me they are implementing correct CMOs.
> >
>
> Fully agree here, using cached mappings and performing CMOs when needed
> is the way to go when dealing with memory. IMHO the *only* time when
> uncached mappings are appropriate is for memory mapped I/O (although it
> looks like video memory was often treated as uncached (wc)).
>
> >>> It would be really unfortunate if support was removed as I think that
> >>> would drive clients away from using upstream ION.
> >>>
> >>
> >> I'm not petitioning to remove support, but at the very least let's reverse
> >> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
> >> default; to get uncached memory you should need to add a flag to your
> >> allocation command pointing out you know what you are doing.
> >>
> >
> > You may not be petitioning to remove support for using cached memory with
> > non-IO-coherent devices, but I interpreted Laura's comment as wanting to
> > do so, and I had concerns about that.
> >
>
> What I would like is for the default memory handed out by Ion to be
> normal cacheable memory, just like what is always handed out to user-space.
> DMA-BUF already provides the means to deal with the CMOs required to
> work with non-io-coherent devices so all should be good here.
>
> If you want Ion to give out uncached memory then I think you should need
> to explicitly state so with an allocation flag. And right now the
> uncached memory you will get back may have other cached mappings (kernel
> lowmem mappings) meaning you will have hard to predict results (on ARM
> at least).

Yes, I can understand why it would make sense to default to cached memory.

> I just don't see much use for them (uncached mappings of
> regular memory) right now.
>

I can understand why people don't like ION providing uncached support, but
I have been pushing to keep uncached support in order to keep the
performance of Android ION clients on par with the previous version of ION
until a better solution can be found (either by changing ION or Android).

Basically most ION use cases don't involve any (or very little) CPU access,
so for now by using uncached ION allocations we can avoid the unnecessary
cache maintenance, and we are safe if some CPU access is required. Of
course for use cases involving a lot of CPU access the clients switch over
to using cached buffers.


> >>>>>>> ION buffer is allocated.
> >>>>>>>
> >>>>>>> //Camera device records video
> >>>>>>> dma_buf_attach
> >>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>
> >>>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>>> problem.
> >>>>>>
> >>>>>>> [camera device writes to buffer]
> >>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>>
> >>>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>>> requests access again.
> >>>>>>
> >>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>>>> down
> >>>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>>
> >>>>>>
> >>>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>>> everything as modular as possible and separate the steps, but at this
> >>>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>>> for the next attachment, which may change the required backing memory
> >>>>>> completely.
> >>>>>>
> >>>>>> All devices should attach before the first mapping, and only let go
> >>>>>> after the task is complete, otherwise this buffer's data needs to be copied off
> >>>>>> to a different location or the CPU needs to take ownership in-between.
> >>>>>>
> >>>>>
> >>>>> Maybe it's broken but it's the status quo and we spent a good
> >>>>> amount of time at plumbers concluding there isn't a great way
> >>>>> to fix it :/
> >>>>>
> >>>>
> >>>> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
> >>>>
> >>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
> >>>> I have a good solution either, I just don't see any way forward without
> >>>> some use-case getting broken, so better to fix now over later.
> >>>>
> >>>
> >>> I can see the benefits of Android doing things the way they do, I would
> >>> request that changes we make continue to support Android, or we find a way
> >>> to convince them to change, as they are the main ION client and I assume
> >>> other ION clients in the future will want to do this as well.
> >>>
> >>
> >> Android may be the biggest user today (makes sense, Ion came out of the
> >> Android project), but that can change, and getting changes into Android
> >> will be easier than into the upstream kernel once Ion is out of staging.
> >>
> >> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
> >> chips targeting Android, our core offerings target more traditional
> >> Linux userspaces, and I'm guessing others will start to do the same as
> >> ARM tries to push more into desktop, server, and other spaces again.
> >>
> >>> I am concerned that if you go with a solution which enforces what you
> >>> mention above, and bring ION out of staging that way, it will make it that
> >>> much harder to solve this for Android and therefore harder to get
> >>> Android clients to move to the upstream ION (and get everybody off their
> >>> vendor modified Android versions).
> >>>
> >>
> >> That would be an Android problem, reducing functionality in upstream to
> >> match what some evil vendor trees do to support Android is not the way
> >> forward on this. At least for us we are going to try to make all our
> >> software offerings follow proper buffer ownership (including our Android
> >> offering).
> >>
> >>>>>>> //buffer is sent down the pipeline
> >>>>>>>
> >>>>>>> // Userspace software post processing occurs
> >>>>>>> mmap buffer
> >>>>>>
> >>>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>>
> >>>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>>> devices attached to buffer
> >>>>>>
> >>>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>>> attached nothing could have changed the underlying memory in the
> >>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>>>
> >>>>>>> [CPU reads/writes to the buffer]
> >>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>>> devices attached to buffer
> >>>>>>> munmap buffer
> >>>>>>>
> >>>>>>> //buffer is sent down the pipeline
> >>>>>>> // Buffer is sent to the video device (which does compression of raw data) and
> >>>>>>> writes to a file
> >>>>>>> dma_buf_attach
> >>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>> [video device writes to buffer]
> >>>>>>> dma_buf_unmap_attachment
> >>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>>>> down
> >>>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not
> >>>>>>>>> doing CPU
> >>>>>>>>> access then there is no requirement (that I am aware of) for you to
> >>>>>>>>> call
> >>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
> >>>>>>>>> if this
> >>>>>>>>> buffer is cached and your device is not IO-coherent then the cache
> >>>>>>>>> maintenance
> >>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>>> maintenance on the buffer?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Because ION no longer provides DMA ready memory.
> >>>>>>> Take the above example.
> >>>>>>>
> >>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>>> Zeros are written to the cache.
> >>>>>>>
> >>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>>> The camera device writes directly to the buffer in DDR.
> >>>>>>> Since you didn't clean the buffer a dirty cache line (one of the
> >>>>>>> zeros) is
> >>>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>>> written which corrupts your data.
> >>>>>>>
> >>>>>>
> >>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>>>> for CPU access at the time of zeroing.
> >>>>>>
> >>>>>> Andrew
> >>>>>>
> >>>>>>> Liam
> >>>>>>>
> >>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>>>> a Linux Foundation Collaborative Project
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>> a Linux Foundation Collaborative Project
> >>>
> >>
> >
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > a Linux Foundation Collaborative Project
> >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-19 10:13:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Fri, Jan 18, 2019 at 12:31:34PM -0800, Laura Abbott wrote:
> I thought about doing that, the problem is it becomes an ABI break for
> existing users which I really didn't want to do again. If it
> ends up being the last thing we do before moving out of staging,
> I'd consider doing it.

This is staging code, so any existing users by definition do not matter.
Let's not drag Linux development down over this.

2019-01-21 11:24:36

by Brian Starkey

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

Hi,

Sorry for being a bit sporadic on this. I was out travelling last week
with little time for email.

On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
> On 1/17/19 7:11 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/16/19 4:54 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
> >>>>> Hi :-)
> >>>>>
> >>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>
> >>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>>>> ---
> >>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>>>>>>
> >>>>>>>>>>> table = a->table;
> >>>>>>>>>>>
> >>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>> - direction))
> >>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>>>> is attached later after data has already been written. Does that
> >>>>>>>>> sequence need supporting?
> >>>>>>>>
> >>>>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>>>> in Android, but I will focus on later for now.
> >>>>>>>>
> >>>>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>>>> attached and mapped, right?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>>>
> >>>>>>>> Camera device records video -> software post processing -> video device
> >>>>>>>> (who does compression of raw data) and writes to a file
> >>>>>>>>
> >>>>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>>>> IO-coherent (quite common).
> >>>>>>>>
> >>>>>>>
> >>>>>>> This is the start of the problem, having cached mappings of memory that
> >>>>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>>>> or another. On top of the speculative cache fills that have to be
> >>>>>>> constantly fought back against with CMOs like below; some coherent
> >>>>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>>>> (snoop filters get messed up).
> >>>>>>>
> >>>>>>> The solution is to either always have the addresses marked non-coherent
> >>>>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>>>> it need to be dropped, even the kernel logical address area (as painful
> >>>>>>> as that would be).
> >>>>>
> >>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> >>>>> "real" is that? It seems that we aren't really hitting that today on
> >>>>> real devices.
> >>>>>
> >>>>
> >>>> Sadly there is at least one real device like this now (TI AM654). We
> >>>> spent some time working with the ARM interconnect spec designers to see
> >>>> if this was allowed behavior, final conclusion was mixing coherent and
> >>>> non-coherent accesses is never a good idea.. So we have been working to
> >>>> try to minimize any cases of mixed attributes [0], if a region is
> >>>> coherent then everyone in the system needs to treat it as such and
> >>>> vice versa; even clever CMOs that work on other systems won't save you
> >>>> here. :(
> >>>>
> >>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> >>>>

"Never a good idea" - but I think it should still be well defined by
the ARMv8 ARM (Section B2.8). Does this apply to your system?

"If the mismatched attributes for a memory location all assign the
same shareability attribute to a Location that has a cacheable
attribute, any loss of uniprocessor semantics, ordering, or coherency
within a shareability domain can be avoided by use of software cache
management"

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

If the cache is invalidated when switching between access types,
shouldn't the snoop filters get un-messed-up?

> >>>>
> >>>>>>>
> >>>>>>>> ION buffer is allocated.
> >>>>>>>>
> >>>>>>>> //Camera device records video
> >>>>>>>> dma_buf_attach
> >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>
> >>>>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>>>
> >>>>>> Actually +Brian this time :)
> >>>>>>
> >>>>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>>>> problem.
> >>>>>>>
> >>>>>>>> [camera device writes to buffer]
> >>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>>>
> >>>>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>>>> requests access again.
> >>>>>
> >>>>> We don't have any API to allow the invalidate to happen on CPU access
> >>>>> if all devices already detached. We need a struct device pointer to
> >>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
> >>>>>
> >>>>> I had a chat with a few people internally after the previous
> >>>>> discussion with Liam. One suggestion was to use
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> >>>>> one other device attached (guarantees that we can do an invalidate in
> >>>>> the future if begin_cpu_access is called). If the last device
> >>>>> detaches, do a sync then.
> >>>>>
> >>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
> >>>>> and use/skip the sync appropriately.
> >>>>>
> >>>>
> >>>> Now that I think this all through I agree this patch is probably wrong.
> >>>> The real fix needs to be better handling in the dma_map_sg() to deal
> >>>> with the case of the memory not being mapped (what I'm dealing with for
> >>>> unmapped heaps), and for cases when the memory in question is not cached
> >>>> (Liam's issue I think). For both these cases the dma_map_sg() does the
> >>>> wrong thing.
> >>>>
> >>>>> I did start poking the code to check out how that would look, but then
> >>>>> Christmas happened and I'm still catching back up.
> >>>>>
> >>>>>>>
> >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>>>
> >>>>>>>
> >>>>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>>>> everything as modular as possible and separate the steps, but at this
> >>>>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>>>> for the next attachment, which may change the required backing memory
> >>>>>>> completely.
> >>>>>>>
> >>>>>>> All devices should attach before the first mapping, and only let go
> >>>>>>> after the task is complete, otherwise this buffer's data needs to be copied off
> >>>>>>> to a different location or the CPU needs to take ownership in-between.
> >>>>>>>
> >>>>>
> >>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
> >>>>> implementations which actually do that? I hear it quoted a lot,
> >>>>> because that's what the docs say - but if the reality doesn't match
> >>>>> it, maybe we should change the docs.
> >>>>>
> >>>>
> >>>> Do you mean on the userspace side? I'm not sure, seems like Android
> >>>> might be doing this wrong from what I can gather. From kernel side if
> >>>> you mean the "de-allocate the backing storage", we will have some cases
> >>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> >>>> in ways not specified in the documentation. Changing the docs to force
> >>>> the backing memory to always be allocated breaks the central goal in
> >>>> having attach/map in DMA-BUF separate.

Actually I meant in the kernel, in exporters. I haven't seen anyone
using the API as it was intended (defer allocation until first map,
migrate between different attachments, etc.). Mostly, backing storage
seems to get allocated at the point of export, and device mappings are
often held persistently (e.g. the DRM prime code maps the buffer at
import time, and keeps it mapped: drm_gem_prime_import_dev).

I wasn't aware that CPU access before first device access was
considered an abuse of the API - it seems like a valid thing to want
to do.

> >>>>
> >>>>>>>> //buffer is sent down the pipeline
> >>>>>>>>
> >>>>>>>> // Userspace software post processing occurs
> >>>>>>>> mmap buffer
> >>>>>>>
> >>>>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>>>
> >>>>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>>>> devices attached to buffer
> >>>>>>>
> >>>>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>>>> attached nothing could have changed the underlying memory in the
> >>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>>
> >>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> >>>>> Liam was saying that it's too painful for them to do that every time a
> >>>>> device unmaps - when in many cases (device->device, no CPU) it's not
> >>>>> needed.
> >>>>
> >>>> Invalidates are painless, at least compared to a real cache flush, just
> >>>> set the invalid bit vs actually writing out lines. I thought the issue
> >>>> was on the map side.
> >>>>
> >>>
> >>> Invalidates aren't painless for us because we have a coherent system cache
> >>> so clean lines get written out.
> >>
> >> That seems very broken, why would clean lines ever need to be written
> >> out, that defeats the whole point of having the invalidate separate from
> >> clean. How do you deal with stale cache lines? I guess in your case this
> >> is what forces you to have to use uncached memory for DMA-able memory.
> >>
> >
> > My understanding is that our ARM invalidate is a clean + invalidate, I had
> > concerns about the clean lines being written to the system cache as part
> > of the 'clean', but the following 'invalidate' would take care of actually
> > invalidating the lines (so nothing broken).
> > But I am probably wrong on this and it is probably smart enough not to do
> > the writing of the clean lines.
> >
>
> You are correct that for a lot of ARM cores "invalidate" is always a
> "clean + invalidate". At first I thought this was kinda silly as there
> is now no way to mark a dirty line invalid without it getting written
> out first, but if you think about it any dirty cache-line can be written
> out (cleaned) at anytime anyway, so this doesn't actually change system
> behavior. You should just not write (i.e. dirty) any lines you don't want
> eventually written out to memory.
>
> Point two, it's not just smart enough to not write-out clean lines, it
> is guaranteed not to write them out by the spec. Otherwise since
> cache-lines can be randomly filled if those same clean lines got written
> out on invalidate operations there would be no way to maintain coherency
> and things would be written over top each other all over the place.
>
> > But regardless, targets supporting a coherent system cache are a legitimate
> > configuration, and an invalidate on this configuration does have to go to
> > the bus to invalidate the system cache (which isn't free), so I don't think
> > you can make the assumption that invalidates are cheap enough that it is
> > okay to do them (even if they are not needed) on every dma unmap.
> >
>
> Very true, CMOs need to be broadcast to other coherent masters on a
> coherent interconnect (and the interconnect itself if it has a cache as
> well (L3)), so not 100% free, but almost, just the infinitesimal cost of
> the cache tag check in hardware. If there are no non-coherent devices
> attached then the CMOs are no-ops, if there are then the data needs to
> be written out either way, doing it every access like is done with
> uncached memory (- any write combining) will blow away any saving made
> from the one less CMO. Either way you lose with uncached mappings of
> memory. If I'm wrong I would love to know.
>

From what I understand, the current DMA APIs are not equipped to
handle having coherent and non-coherent devices attached at the same
time. The buffer is either in "CPU land" or "Device land", there's no
smaller granule of "Coherent Device land" or "Non-Coherent Device
land".

I think if there's devices which are making coherent accesses, and
devices which are making non-coherent accesses, then we can't support
them being attached at the same time without some enhancements to the
APIs.

> >>> And these invalidates can occur on fairly large buffers.
> >>>
> >>> That is why we haven't gone with using cached ION memory and "tracking CPU
> >>> access" because it only solves half the problem, ie there isn't a way to
> >>> safely skip the invalidate (because we can't read the future).
> >>> Our solution was to go with uncached ION memory (when possible), but as
> >>> you can see in other discussions upstream support for uncached memory has
> >>> its own issues.
> >>>

@Liam, in your problematic use-cases, are both devices detached when
the buffer moves between them?

1) dev 1 map, access, unmap
2) dev 1 detach
3) (maybe) CPU access
4) dev 2 attach
5) dev 2 map, access

I still think a pretty pragmatic solution is to use
DMA_ATTR_SKIP_CPU_SYNC until the last device detaches. That won't work
if your access sequence looks like above...

...however, if your sequence looks like above, then you probably need
to keep at least one of the devices attached anyway. Otherwise, per
the API, the buffer could get migrated after 2)/before 5). That will
surely hurt a lot more than an invalidate.

> >>
> >> Sounds like you need to fix upstream support then, finding a way to drop
> >> all cacheable mappings of memory you want to make uncached mappings for
> >> seems to be the only solution.
> >>
> >
> > I think we can probably agree that there wouldn't be a good way to remove
> > cached mappings without causing an unacceptable performance degradation,
> > since it would fragment all the nice 1GB kernel mappings we have.
> >
> > So I am trying to find an alternative solution.
> >
>
> I'm not sure there is a better solution. How hard is this solution to
> implement anyway? The kernel already has to make gaps and cut up that
> nice 1GB mapping when you make a reserved memory space in the lowmem
> area, so all the logic is probably already implemented. Just need to
> allow it to be hooked into from Ion when doing the uncached mappings.
>

I haven't looked recently, but I'm not sure the early memblock code
can be reused as-is at runtime. I seem to remember it makes a bunch of
assumptions about the fact that it's running "early".

If CPU uncached mappings of normal system memory is really the way
forward, I could envisage a heap which maintains a pool of chunks of
memory which it removed from the kernel mapping. The pool could grow
(remove more pages from the kernel mapping)/shrink (add them back to
the kernel mapping) as needed.

John Reitan implemented a compound-page heap, which used compaction to
get a pool of 2MB contiguous pages. Something like that would at least
prevent needing full 4kB granularity when removing things from the
kernel mapping.

Even better, could it somehow be restricted to a region which is
already fragmented? (e.g. the one which was used for the default CMA
heap)

Thanks,
-Brian

> >>>>>
> >>>>>>>
> >>>>>>>> [CPU reads/writes to the buffer]
> >>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>>>> devices attached to buffer
> >>>>>>>> munmap buffer
> >>>>>>>>
> >>>>>>>> //buffer is sent down the pipeline
> >>>>>>>> // Buffer is sent to the video device (which does compression of raw data) and
> >>>>>>>> writes to a file
> >>>>>>>> dma_buf_attach
> >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>> [video device writes to buffer]
> >>>>>>>> dma_buf_unmap_attachment
> >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
> >>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>>>> maintenance on the buffer?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Because ION no longer provides DMA ready memory.
> >>>>>>>> Take the above example.
> >>>>>>>>
> >>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>>>> Zeros are written to the cache.
> >>>>>>>>
> >>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>>>> The camera device writes directly to the buffer in DDR.
> >>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >>>>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>>>> written which corrupts your data.
> >>>>>>>>
> >>>>>>>
> >>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>>>>> for CPU access at the time of zeroing.
> >>>>>>>
> >>>>>
> >>>>> Actually that should be at the point of the first non-coherent device
> >>>>> mapping the buffer right? No point in doing CMO if the future accesses
> >>>>> are coherent.
> >>>>
> >>>> I see your point, as long as the zeroing is guaranteed to be the first
> >>>> access to this buffer then it should be safe.
> >>>>
> >>>> Andrew
> >>>>
> >>>>>
> >>>>> Cheers,
> >>>>> -Brian
> >>>>>
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>>> Liam
> >>>>>>>>
> >>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>>>>> a Linux Foundation Collaborative Project
> >>>>>>>>
> >>>>
> >>>
> >>
> >

2019-01-21 14:33:10

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 11/14] staging: android: ion: Allow heap name to be null

On 1/18/19 1:53 PM, Laura Abbott wrote:
> On 1/16/19 9:12 AM, Andrew F. Davis wrote:
>> On 1/16/19 9:28 AM, Brian Starkey wrote:
>>> Hi Andrew,
>>>
>>> On Fri, Jan 11, 2019 at 12:05:20PM -0600, Andrew F. Davis wrote:
>>>> The heap name can be used for debugging but otherwise does not seem
>>>> to be required and no other part of the code will fail if left NULL
>>>> except here. We can make it required and check for it at some point,
>>>> for now let's just prevent this from causing a NULL pointer exception.
>>>
>>> I'm not so keen on this one. In the "new" API with heap querying, the
>>> name string is the only way to identify the heap. I think Laura
>>> mentioned at XDC2017 that it was expected that userspace should use
>>> the strings to find the heap they want.
>>>
>>
>> Right now the names are only for debug. I accidentally left the name
>> null once and got a kernel crash. This is the only spot where it is
>> needed so I fixed it up. The other option is to make the name mandatory
>> and properly error out, I don't want to do that right now until the
>> below discussion is had to see if names really do matter or not.
>>
>
> Yes, the heap names are part of the query API and are the expected
> way to identify individual heaps for the API at the moment so having
> a null heap name is incorrect. The heap name seemed like the best way
> to identify heaps to userspace but if you have an alternative proposal
> I'd be interested.
>

Not sure I have a better proposal right now, I'll re-work this patch to
force heap names to be populated before ion_device_add_heap() instead.

(Do you think that function name is now a misnomer? How would you feel
about renaming it to just ion_add_heap()?)

Andrew

> Thanks,
> Laura
>
>>
>
>

2019-01-21 15:00:37

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap

On 1/18/19 2:19 PM, Laura Abbott wrote:
> On 1/16/19 8:05 AM, Andrew F. Davis wrote:
>> On 1/15/19 12:58 PM, Laura Abbott wrote:
>>> On 1/15/19 9:47 AM, Andrew F. Davis wrote:
>>>> On 1/14/19 8:39 PM, Laura Abbott wrote:
>>>>> On 1/11/19 10:05 AM, Andrew F. Davis wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> This is a set of (hopefully) non-controversial cleanups for the ION
>>>>>> framework and current set of heaps. These were found as I start to
>>>>>> familiarize myself with the framework to help in whatever way I
>>>>>> can in getting all this up to the standards needed for de-staging.
>>>>>>
>>>>>> I would like to get some ideas of what is left to work on to get ION
>>>>>> out of staging. Has there been some kind of agreement on what ION
>>>>>> should
>>>>>> eventually end up being? To me it looks like it is being whittled
>>>>>> away at
>>>>>> to its most core functions. To me that is looking like being a
>>>>>> DMA-BUF
>>>>>> user-space front end, simply advertising available memory backings
>>>>>> in a
>>>>>> system and providing allocations as DMA-BUF handles. If this is the
>>>>>> case
>>>>>> then it looks close to being ready to me at least, but I would
>>>>>> love to
>>>>>> hear any other opinions and concerns.
>>>>>>
>>>>>
>>>>> Yes, at this point the only functionality that people are really
>>>>> depending on is the ability to allocate a dma_buf easily from
>>>>> userspace.
>>>>>
>>>>>> Back to this patchset, the last patch may be a bit different than the
>>>>>> others, it adds an unmapped heaps type and creation helper. I
>>>>>> wanted to
>>>>>> get this in to show off another heap type and maybe some issues we
>>>>>> may
>>>>>> have with the current ION framework. The unmapped heap is used
>>>>>> when the
>>>>>> backing memory should not (or cannot) be touched. Currently this kind
>>>>>> of heap is used for firewalled secure memory that can be allocated
>>>>>> like
>>>>>> normal heap memory but only used by secure devices (OP-TEE, crypto
>>>>>> HW,
>>>>>> etc). It is basically just copied from the "carveout" heap type with
>>>>>> the
>>>>>> only difference being it is not mappable to userspace and we do not
>>>>>> clear
>>>>>> the memory (as we should not map it either). So should this really
>>>>>> be a
>>>>>> new heap type? Or maybe advertised as a carveout heap but with an
>>>>>> additional allocation flag? Perhaps we do away with "types"
>>>>>> altogether
>>>>>> and just have flags, coherent/non-coherent, mapped/unmapped, etc.
>>>>>>
>>>>>> Maybe more thinking will be needed after all..
>>>>>>
>>>>>
>>>>> So the cleanup looks okay (I need to finish reviewing) but I'm not a
>>>>> fan of adding another heaptype without solving the problem of adding
>>>>> some sort of devicetree binding or other method of allocating and
>>>>> placing Ion heaps. That plus uncached buffers are one of the big
>>>>> open problems that need to be solved for destaging Ion. See
>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>>
>>>>>
>>>>>
>>>>> for some background on that problem.
>>>>>
>>>>
>>>> I'm under the impression that adding heaps like carveouts/chunk will be
>>>> rather system specific and so do not lend themselves well to a
>>>> universal
>>>> DT style exporter. For instance a carveout memory space can be reported
>>>> by a device at runtime, then the driver managing that device should go
>>>> and use the carveout heap helpers to export that heap. If this is the
>>>> case then I'm not sure it is a problem for the ION core framework to
>>>> solve, but rather the users of it to figure out how best to create the
>>>> various heaps. All Ion needs to do is allow exporting and advertising
>>>> them IMHO.
>>>>
>>>
>>> I think it is a problem for the Ion core framework to take care of.
>>> Ion is useless if you don't actually have the heaps. Nobody has
>>> actually gotten a full Ion solution end-to-end with a carveout heap
>>> working in mainline because any proposals have been rejected. I think
>>> we need at least one example in mainline of how creating a carveout
>>> heap would work.
>>
>> In our evil vendor trees we have several examples. The issue being that
>> Ion is still staging and attempts for generic DT heap definitions
>> haven't seemed to go so well. So for now we just keep it specific to our
>> platforms until upstream makes a direction decision.
>>
>
> Yeah, it's been a bit of a chicken and egg in that this has been
> blocking Ion getting out of staging but we don't actually have
> in-tree users because it's still in staging.
>

I could post some of our Ion exporter patches anyway, might not have a
chance at getting in while it's still in staging but could show there
are users trying.

Kinda the reason I posted the unmapped heap. The OP-TEE folks have been
using it out-of-tree waiting for Ion destaging, but without more
users/examples upstream it seems to hold back destaging work in the
first place..

>>>
>>>> Thanks for the background thread link, I've been looking for some info
>>>> on current status of all this and "ion" is a bit hard to search the
>>>> lists for. The core reason for posting this cleanup series is to throw
>>>> my hat into the ring of all this Ion work and start getting familiar
>>>> with the pending issues. The last two patches are not all that
>>>> important
>>>> to get in right now.
>>>>
>>>> In that thread you linked above, it seems we may have arrived at a
>>>> similar problem for different reasons. I think the root issue is the
>>>> Ion
>>>> core makes too many assumptions about the heap memory. My proposal
>>>> would
>>>> be to allow the heap exporters more control over the DMA-BUF ops, maybe
>>>> even going as far as letting them provide their own complete struct
>>>> dma_buf_ops.
>>>>
>>>> Let me give an example where I think this is going to be useful. We
>>>> have
>>>> the classic constraint solving problem on our SoCs. Our SoCs are
>>>> full of
>>>> various coherent and non-coherent devices, some require contiguous
>>>> memory allocations, others have in-line IOMMUs so can operate on
>>>> non-contiguous, etc..
>>>>
>>>> DMA-BUF has a solution designed in for this we can use, namely
>>>> allocation at map time after all the attachments have come in. The
>>>> checking of each attached device to find the right backing memory is
>>>> something the DMA-BUF exporter has to do, and so this SoC specific
>>>> logic
>>>> would have to be added to each exporting framework (DRM, V4L2, etc),
>>>> unless we have one unified system exporter everyone uses, Ion.
>>>>
>>>
>>> That's how dmabuf is supposed to work in theory but in practice we
>>> also have the case of userspace allocates memory, mmaps, and then
>>> a device attaches to it. The issue is we end up having to do work
>>> and make decisions before all devices are actually attached.
>>>
>>
>> That just seems wrong, DMA-BUF should be used for, well, DMA-able
>> buffers.. Userspace should not be using these buffers without devices
>> attached, otherwise why not use a regular buffer. If you need to fill
>> the buffer then you should attach/map it first so the DMA-BUF exporter
>> can pick the appropriate backing memory first.
>>
>> Maybe a couple more rules on the ordering of DMA-BUF operations are
>> needed to prevent having to deal with all these non-useful permutations.
>>
>> Sumit? ^^
>>
>
> I'd love to just say "don't do that" but it's existing userspace
> behavior and it's really hard to change that.
>

Very true, we probably shouldn't ban out-of-order DMA-BUF operations
outright, just only guarantee that certain orderings will always work on
all systems. Then userspace can migrate to the safer orderings that are
guaranteed to work.

>>>> Then each system can define one (maybe typeless) heap, the correct
>>>> backing type is system specific anyway, so let the system specific
>>>> backing logic in the unified system exporter heap handle picking that.
>>>> To allow that heaps need direct control of dma_buf_ops.
>>>>
>>>> Direct heap control of dma_buf_ops also fixes the cache/non-cache
>>>> issue,
>>>> and my unmapped memory issue, each heap type handles the quirks of its
>>>> backing storage in its own way, instead of trying to find some one size
>>>> fits all memory operations like we are doing now.
>>>>
>>>
>>> I don't think this is an issue of one-size fits all. We have flags
>>> to differentiate between cached and uncached paths, the issue is
>>> that doing the synchronization for uncached buffers is difficult.
>>>
>>
>> It is difficult, hence why letting an uncached heap exporter do all the
>> heavy work, instead of trying to deal with all these cases in the Ion
>> core framework.
>>
>>> I'm just not sure how an extra set of dma_buf ops actually solves
>>> the problem of needing to synchronize alias mappings.
>>>
>>
>> It doesn't solve it, it just moves the work out of the framework. There
>> are going to be a lot more interesting problems than this with some
>> types of heaps we will have in the future, dealing with all the logic in
>> the framework core is not going to scale.
>>
>
> That is a good point. My immediate concern though is getting Ion out
> of staging. If the per heap dma_buf ops will help with that I'd
> certainly like to see them.
>

It seems to me a lot of the bickering going on is due to everyone
wanting their specific heap's use-cases to be built into the core Ion
framework. This will end with everyone's edge cases being made into
flags for the core to deal with.

If instead each heap exporter gets to deal with the raw DMA-BUF ops then
we can deal with each situation based on heap type. For instance my
unmapped heap case needs to not do the CPU sync on device map. Liam's
un-cached heaps look like they will need their own set of attach/map
constraints. I'm working on a late-allocate heap that will just not work
with the current set of assumptions Ion makes about heaps (pre-allocated
backing memory). All that leads me to think one-size-fits-all DMA-BUF
ops are not going to scale for much longer.

Thanks,
Andrew

> Thanks,
> Laura
>
>> Thanks,
>> Andrew
>>
>>> Thanks,
>>> Laura
>>>
>>>> We can provide helpers for the simple heap types still, but with this
>>>> much of the heavy lifting moves out of the Ion core framework making it
>>>> much more simple, something I think it will need for de-staging.
>>>>
>>>> Anyway, I might be completely off base in my direction here, just
>>>> let me
>>>> know :)
>>>>
>>>
>

2019-01-21 15:16:47

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/18/19 3:43 PM, Liam Mark wrote:
> On Fri, 18 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/17/19 7:04 PM, Liam Mark wrote:
>>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>>>
>>>>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>
>>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
>>>>>>>>>>>> here.
>>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
>>>>>>>>>>>> anyway.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>>>>>>>>>>>> dma_buf_attachment *attachment,
>>>>>>>>>>>>         table = a->table;
>>>>>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>>> -            direction))
>>>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>>>>>>>>>> maintenance.
>>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the
>>>>>>>>>>> calls
>>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>>>> sequence need supporting?
>>>>>>>>>
>>>>>>>>> Yes, but also I think there are cases where CPU access can happen before
>>>>>>>>> in Android, but I will focus on later for now.
>>>>>>>>>
>>>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>>>>> attached and mapped, right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>>>
>>>>>>>>> Camera device records video -> software post processing -> video device
>>>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>>>
>>>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>>>> IO-coherent (quite common).
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is the start of the problem, having cached mappings of memory that
>>>>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>>>>> (snoop filters get messed up).
>>>>>>>>
>>>>>>>> The solution is to either always have the addresses marked non-coherent
>>>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>>>>> it need to be dropped, even the kernel logical address (area as painful
>>>>>>>> as that would be).
>>>>>>>>
>>>>>>>
>>>>>>> I agree it's broken, hence my desire to remove it :)
>>>>>>>
>>>>>>> The other problem is that uncached buffers are being used for
>>>>>>> performance reason so anything that would involve getting
>>>>>>> rid of the logical address would probably negate any performance
>>>>>>> benefit.
>>>>>>>
>>>>>>
>>>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>>>>>> adamant that they have valid uses. I'm just not sure performance is one
>>>>>> of them, maybe in the case of software locks between devices or
>>>>>> something where there needs to be a lot of back and forth interleaved
>>>>>> access on small amounts of data?
>>>>>>
>>>>>
>>>>> I wasn't aware that ARM considered this not supported, I thought it was
>>>>> supported but they advised against it because of the potential performance
>>>>> impact.
>>>>>
>>>>
>>>> Not sure what you mean by "this" being not supported, do you mean mixed
>>>> attribute mappings? If so, it will certainly cause problems, and the
>>>> problems will change from platform to platform, avoid at all costs is my
>>>> understanding of ARM's position.
>>>>
>>>>> This is after all supported in the DMA APIs and up until now devices have
>>>>> been successfully commercializing with this configurations, and I think
>>>>> they will continue to commercialize with these configurations for quite a
>>>>> while.
>>>>>
>>>>
>>>> Use of uncached memory mappings are almost always wrong in my experience
>>>> and are used to work around some bug or because the user doesn't want to
>>>> implement proper CMOs. Counter examples welcome.
>>>>
>>>
>>> Okay, let me first try to clarify what I am referring to, as perhaps I am
>>> misunderstanding the conversation.
>>>
>>> In this discussion I was originally referring to a use case with cached
>>> memory being accessed by a non-IO-coherent device.
>>>
>>> "In this example assume the buffer is cached and the devices are not
>>> IO-coherent (quite common)."
>>>
>>> to which you did not think was supported:
>>>
>>> "This is the start of the problem, having cached mappings of memory
>>> that is also being accessed non-coherently is going to cause issues
>>> one way or another.
>>> "
>>>
>>> And I interpreted Laura's comment below as saying she wanted to remove
>>> support in ION for cached memory being accessed by non-IO-coherent
>>> devices:
>>> "I agree it's broken, hence my desire to remove it :)"
>>>
>>> So assuming my understanding above is correct (and you are not talking
>>> about something separate such as removing uncached ION allocation
>>> support).
>>>
>>
>> Ah, I think here is where we diverged, I'm assuming Laura's comment to
>> be referencing my issue with uncached mappings being handed out without
>> first removing all cached mappings of the same memory. Therefore it is
>> uncached heaps that are broken.
>>
>
> I am glad that is clarified, but can you then clarify for me your
> following statement, I am stil not clear what the problem is then.
>
> In response to:
> "In this example assume the buffer is cached and the devices are
> not IO-coherent (quite common)."
>
> You said:
>
> "This is the start of the problem, having cached mappings of memory that
> is also being accessed non-coherently is going to cause issues one way or
> another. "
>

This was a parallel thought, completely my fault for any confusion here.
I was wanting to point out that there will inherently be some issues
with both situations. Not that it is a blocker, was just on my mind.

>>> Then I guess I am not clear why current uses which use cached memory with
>>> non IO-coherent devices are considered to be working around some bug or
>>> are not implementing proper CMOs.
>>>
>>> They use CPU cached mappings because that is the most effective way to
>>> access the memory from the CPU side and the devices have an uncached
>>> IOMMU mapping because they don't support IO-coherency, and currently in
>>> the CPU they do cache maintenance at the time of dma map and dma unmap so
>>> to me they are implementing correct CMOs.
>>>
>>
>> Fully agree here, using cached mappings and performing CMOs when needed
>> is the way to go when dealing with memory. IMHO the *only* time when
>> uncached mappings are appropriate is for memory mapped I/O (although it
>> looks like video memory was often treated as uncached (wc)).
>>
>>>>> It would be really unfortunate if support was removed as I think that
>>>>> would drive clients away from using upstream ION.
>>>>>
>>>>
>>>> I'm not petitioning to remove support, but at very least lets reverse
>>>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
>>>> default, to get uncached you should need to add a flag to your
>>>> allocation command pointing out you know what you are doing.
>>>>
>>>
>>> You may not be petitioning to remove support for using cached memory with
>>> non io-coherent devices but I interpreted Laura's comment as wanting to do
>>> so, and I had concerns about that.
>>>
>>
>> What I would like is for the default memory handed out by Ion to be
>> normal cacheable memory, just like is always handed out to users-space.
>> DMA-BUF already provides the means to deal with the CMOs required to
>> work with non-io-coherent devices so all should be good here.
>>
>> If you want Ion to give out uncached memory then I think you should need
>> to explicitly state so with an allocation flag. And right now the
>> uncached memory you will get back may have other cached mappings (kernel
>> lowmem mappings) meaning you will have hard to predict results (on ARM
>> at least).
>
> Yes, I can understand why it would make sense to default to cached memory.
>
>> I just don't see much use for them (uncached mappings of
>> regular memory) right now.
>>
>
> I can understand why people don't like ION providing uncached support, but
> I have been pushing to keep un-cached support in order to keep the
> performance of Android ION clients on par with the previous version of ION
> until a better solution can be found (either by changing ION or Android).
>
> Basically most ION use cases don't involve any (or very little) CPU access
> so for now by using uncached ION allocations we can avoid the unnecessary
> cache maintenance and we are safe if some CPU access is required. Of course
> for use cases involving a lot of CPU access the clients switch over to
> using cached buffers.
>

If you have very few CPU accesses then there should be very few CMOs, I
agree. It seems the real problem is that devices are constantly detached
and re-attached each time the buffer is passed along the pipeline. To me
that is broken usage: the devices should all attach, use the buffer,
then all detach. Otherwise after a detach there is no clean way to know
what the right thing to do with the buffer is (CMO or not).

>
>>>>>>>>> ION buffer is allocated.
>>>>>>>>>
>>>>>>>>> //Camera device records video
>>>>>>>>> dma_buf_attach
>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>>
>>>>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>>>>> problem.
>>>>>>>>
>>>>>>>>> [camera device writes to buffer]
>>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>>>
>>>>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>>>> requests access again.
>>>>>>>>
>>>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>>>>>> down
>>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>>>
>>>>>>>>
>>>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>>>> point no one owns this buffers backing memory, not the CPU or any
>>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>>>>> for the next attachment, which may change the required backing memory
>>>>>>>> completely.
>>>>>>>>
>>>>>>>> All devices should attach before the first mapping, and only let go
>>>>>>>> after the task is complete, otherwise this buffers data needs copied off
>>>>>>>> to a different location or the CPU needs to take ownership in-between.
>>>>>>>>
>>>>>>>
>>>>>>> Maybe it's broken but it's the status quo and we spent a good
>>>>>>> amount of time at plumbers concluding there isn't a great way
>>>>>>> to fix it :/
>>>>>>>
>>>>>>
>>>>>> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
>>>>>>
>>>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
>>>>>> I have a good solution either, I just don't see any way forward without
>>>>>> some use-case getting broken, so better to fix now over later.
>>>>>>
>>>>>
>>>>> I can see the benefits of Android doing things the way they do, I would
>>>>> request that changes we make continue to support Android, or we find a way
>>>>> to convince them to change, as they are the main ION client and I assume
>>>>> other ION clients in the future will want to do this as well.
>>>>>
>>>>
>>>> Android may be the biggest user today (makes sense, Ion come out of the
>>>> Android project), but that can change, and getting changes into Android
>>>> will be easier that the upstream kernel once Ion is out of staging.
>>>>
>>>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
>>>> chips targeting Android, our core offerings target more traditional
>>>> Linux userspaces, and I'm guessing others will start to do the same as
>>>> ARM tries to push more into desktop, server, and other spaces again.
>>>>
>>>>> I am concerned that if you go with a solution which enforces what you
>>>>> mention above, and bring ION out of staging that way, it will make it that
>>>>> much harder to solve this for Android and therefore harder to get
>>>>> Android clients to move to the upstream ION (and get everybody off their
>>>>> vendor modified Android versions).
>>>>>
>>>>
>>>> That would be an Android problem, reducing functionality in upstream to
>>>> match what some evil vendor trees do to support Android is not the way
>>>> forward on this. At least for us we are going to try to make all our
>>>> software offerings follow proper buffer ownership (including our Android
>>>> offering).
>>>>
>>>>>>>>> //buffer is sent down the pipeline
>>>>>>>>>
>>>>>>>>> // Userspace software post processing occurs
>>>>>>>>> mmap buffer
>>>>>>>>
>>>>>>>> Perhaps the invalidate should happen here in mmap.
>>>>>>>>
>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
>>>>>>>>> devices attached to buffer
>>>>>>>>
>>>>>>>> And that should be okay, mmap does the sync, and if no devices are
>>>>>>>> attached nothing could have changed the underlying memory in the
>>>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>>>>>>>
>>>>>>>>> [CPU reads/writes to the buffer]
>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>>>>>> devices attached to buffer
>>>>>>>>> munmap buffer
>>>>>>>>>
>>>>>>>>> //buffer is sent down the pipeline
>>>>>>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>>>>>>> writes to a file
>>>>>>>>> dma_buf_attach
>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>>> [video device writes to buffer]
>>>>>>>>> dma_buf_unmap_attachment
>>>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
>>>>>>>>> down
>>>>>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not
>>>>>>>>>>> doing CPU
>>>>>>>>>>> access then there is no requirement (that I am aware of) for you to
>>>>>>>>>>> call
>>>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
>>>>>>>>>>> if this
>>>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache
>>>>>>>>>>> maintenance
>>>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>>>>>> maintenance on the buffer?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Because ION no longer provides DMA ready memory.
>>>>>>>>> Take the above example.
>>>>>>>>>
>>>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>>>>>> Zeros are written to the cache.
>>>>>>>>>
>>>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>>>>>> The camera devices writes directly to the buffer in DDR.
>>>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the
>>>>>>>>> zeros) is
>>>>>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>>>>>> written which corrupts your data.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>>>>>> for CPU access at the time of zeroing.
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>>> Liam
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

2019-01-21 20:06:37

by Liam Mark

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Mon, 21 Jan 2019, Andrew F. Davis wrote:

> On 1/18/19 3:43 PM, Liam Mark wrote:
> > On Fri, 18 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/17/19 7:04 PM, Liam Mark wrote:
> >>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/16/19 4:48 PM, Liam Mark wrote:
> >>>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>>>
> >>>>>> On 1/15/19 1:05 PM, Laura Abbott wrote:
> >>>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>
> >>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance
> >>>>>>>>>>>> here.
> >>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed
> >>>>>>>>>>>> anyway.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>   drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>> b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
> >>>>>>>>>>>> dma_buf_attachment *attachment,
> >>>>>>>>>>>>         table = a->table;
> >>>>>>>>>>>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>>> -            direction))
> >>>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>>>>
> >>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>>>>>>>>>> maintenance.
> >>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the
> >>>>>>>>>>> calls
> >>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>>>>> is attached later after data has already been written. Does that
> >>>>>>>>>> sequence need supporting?
> >>>>>>>>>
> >>>>>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>>>>> in Android, but I will focus on later for now.
> >>>>>>>>>
> >>>>>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>>>>> attached and mapped, right?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>>>>
> >>>>>>>>> Camera device records video -> software post processing -> video device
> >>>>>>>>> (who does compression of raw data) and writes to a file
> >>>>>>>>>
> >>>>>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>>>>> IO-coherent (quite common).
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> This is the start of the problem, having cached mappings of memory that
> >>>>>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>>>>> or another. On top of the speculative cache fills that have to be
> >>>>>>>> constantly fought back against with CMOs like below; some coherent
> >>>>>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>>>>> (snoop filters get messed up).
> >>>>>>>>
> >>>>>>>> The solution is to either always have the addresses marked non-coherent
> >>>>>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>>>>> it need to be dropped, even the kernel logical address (area as painful
> >>>>>>>> as that would be).
> >>>>>>>>
> >>>>>>>
> >>>>>>> I agree it's broken, hence my desire to remove it :)
> >>>>>>>
> >>>>>>> The other problem is that uncached buffers are being used for
> >>>>>>> performance reason so anything that would involve getting
> >>>>>>> rid of the logical address would probably negate any performance
> >>>>>>> benefit.
> >>>>>>>
> >>>>>>
> >>>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> >>>>>> adamant that they have valid uses. I'm just not sure performance is one
> >>>>>> of them, maybe in the case of software locks between devices or
> >>>>>> something where there needs to be a lot of back and forth interleaved
> >>>>>> access on small amounts of data?
> >>>>>>
> >>>>>
> >>>>> I wasn't aware that ARM considered this not supported, I thought it was
> >>>>> supported but they advised against it because of the potential performance
> >>>>> impact.
> >>>>>
> >>>>
> >>>> Not sure what you mean by "this" being not supported, do you mean mixed
> >>>> attribute mappings? If so, it will certainly cause problems, and the
> >>>> problems will change from platform to platform, avoid at all costs is my
> >>>> understanding of ARM's position.
> >>>>
> >>>>> This is after all supported in the DMA APIs and up until now devices have
> >>>>> been successfully commercializing with these configurations, and I think
> >>>>> they will continue to commercialize with these configurations for quite a
> >>>>> while.
> >>>>>
> >>>>
> >>>> Use of uncached memory mappings is almost always wrong in my experience
> >>>> and are used to work around some bug or because the user doesn't want to
> >>>> implement proper CMOs. Counter examples welcome.
> >>>>
> >>>
> >>> Okay, let me first try to clarify what I am referring to, as perhaps I am
> >>> misunderstanding the conversation.
> >>>
> >>> In this discussion I was originally referring to a use case with cached
> >>> memory being accessed by a non-IO-coherent device.
> >>>
> >>> "In this example assume the buffer is cached and the devices are not
> >>> IO-coherent (quite common)."
> >>>
> >>> to which you did not think was supported:
> >>>
> >>> "This is the start of the problem, having cached mappings of memory
> >>> that is also being accessed non-coherently is going to cause issues
> >>> one way or another.
> >>> "
> >>>
> >>> And I interpreted Laura's comment below as saying she wanted to remove
> >>> support in ION for cached memory being accessed by non-IO-coherent
> >>> devices:
> >>> "I agree it's broken, hence my desire to remove it :)"
> >>>
> >>> So assuming my understanding above is correct (and you are not talking
> >>> about something separate such as removing uncached ION allocation
> >>> support).
> >>>
> >>
> >> Ah, I think here is where we diverged, I'm assuming Laura's comment to
> >> be referencing my issue with uncached mappings being handed out without
> >> first removing all cached mappings of the same memory. Therefore it is
> >> uncached heaps that are broken.
> >>
> >
> > I am glad that is clarified, but can you then clarify for me your
> > following statement, I am still not clear what the problem is then.
> >
> > In response to:
> > "In this example assume the buffer is cached and the devices are
> > not IO-coherent (quite common)."
> >
> > You said:
> >
> > "This is the start of the problem, having cached mappings of memory that
> > is also being accessed non-coherently is going to cause issues one way or
> > another. "
> >
>
> This was a parallel thought, completely my fault for any confusion here.
> I was wanting to point out that there will inherently be some issues
> with both situations. Not that it is a blocker, was just on my mind.
>
> >>> Then I guess I am not clear why current uses which use cached memory with
> >>> non IO-coherent devices are considered to be working around some bug or
> >>> are not implementing proper CMOs.
> >>>
> >>> They use CPU cached mappings because that is the most effective way to
> >>> access the memory from the CPU side and the devices have an uncached
> >>> IOMMU mapping because they don't support IO-coherency, and currently on
> >>> the CPU they do cache maintenance at the time of dma map and dma unmap, so
> >>> to me they are implementing correct CMOs.
> >>>
> >>
> >> Fully agree here, using cached mappings and performing CMOs when needed
> >> is the way to go when dealing with memory. IMHO the *only* time when
> >> uncached mappings are appropriate is for memory mapped I/O (although it
> >> looks like video memory was often treated as uncached (wc)).
> >>
> >>>>> It would be really unfortunate if support was removed as I think that
> >>>>> would drive clients away from using upstream ION.
> >>>>>
> >>>>
> >>>> I'm not petitioning to remove support, but at very least lets reverse
> >>>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
> >>>> default, to get uncached you should need to add a flag to your
> >>>> allocation command pointing out you know what you are doing.
> >>>>
> >>>
> >>> You may not be petitioning to remove support for using cached memory with
> >>> non io-coherent devices but I interpreted Laura's comment as wanting to do
> >>> so, and I had concerns about that.
> >>>
> >>
> >> What I would like is for the default memory handed out by Ion to be
> >> normal cacheable memory, just like is always handed out to users-space.
> >> DMA-BUF already provides the means to deal with the CMOs required to
> >> work with non-io-coherent devices so all should be good here.
> >>
> >> If you want Ion to give out uncached memory then I think you should need
> >> to explicitly state so with an allocation flag. And right now the
> >> uncached memory you will get back may have other cached mappings (kernel
> >> lowmem mappings) meaning you will have hard to predict results (on ARM
> >> at least).
> >
> > Yes, I can understand why it would make sense to default to cached memory.
> >
> >> I just don't see much use for them (uncached mappings of
> >> regular memory) right now.
> >>
> >
> > I can understand why people don't like ION providing uncached support, but
> > I have been pushing to keep un-cached support in order to keep the
> > performance of Android ION clients on par with the previous version of ION
> > until a better solution can be found (either by changing ION or Android).
> >
> > Basically most ION use cases don't involve any (or very little) CPU access
> > so for now by using uncached ION allocations we can avoid the unnecessary
> > cache maintenance and we are safe if some CPU access is required. Of course
> > for use cases involving a lot of CPU access the clients switch over to
> > using cached buffers.
> >
>
> If you have very few CPU accesses then there should be very few CMOs I
> agree. It seems the problem is you are constantly detaching and
> re-attaching devices each time the buffer is passed around between each
> device. To me that is broken usage, the devices should all attach, use
> the buffer, then all detach. Otherwise after detach there is no clean
> way to know what is the right thing to do with the buffer (CMO or not).
>

Yes, it would be great if all the devices could attach beforehand and be
kept attached.
Unfortunately it would be difficult in a buffer "pipelining" use case such
as Android's to get all the devices to attach beforehand.
We have spoken to the Google Android team about having some kind of
destructor support added so that devices could stay attached and then when
the use case ends be notified of the end of use case (through the
destructor) so that they could know when to detach.
Unfortunately the Google Android team said this was too difficult to add
as they don't track all the buffers in the system.

And please note that there are other complexities to having all the
devices in the pipeline attached at once, for example the
begin_cpu_access/end_cpu_access calls currently do cache maintenance on
the buffer for each of the attached devices, with lots of attached devices
this would result in a lot of duplicated cache maintenance on the buffer.
You would need some way to optimally apply cache maintenance that
satisfies all the devices.

Also, not only would you need to keep the devices attached, you would need
to keep the buffers dma-mapped (see the thread "dma-buf: add support for
mapping with dma mapping attributes") for the cache maintenance to be
applied correctly.

> >
> >>>>>>>>> ION buffer is allocated.
> >>>>>>>>>
> >>>>>>>>> //Camera device records video
> >>>>>>>>> dma_buf_attach
> >>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>>
> >>>>>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>>>>> problem.
> >>>>>>>>
> >>>>>>>>> [camera device writes to buffer]
> >>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>>>>
> >>>>>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>>>>> requests access again.
> >>>>>>>>
> >>>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>>>>>> down
> >>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>>>>> everything as modular as possible and separate the steps, but at this
> >>>>>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>>>>> for the next attachment, which may change the required backing memory
> >>>>>>>> completely.
> >>>>>>>>
> >>>>>>>> All devices should attach before the first mapping, and only let go
> >>>>>>>> after the task is complete, otherwise this buffer's data needs to be copied
> >>>>>>>> off to a different location or the CPU needs to take ownership in-between.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Maybe it's broken but it's the status quo and we spent a good
> >>>>>>> amount of time at plumbers concluding there isn't a great way
> >>>>>>> to fix it :/
> >>>>>>>
> >>>>>>
> >>>>>> Hmm, guess that doesn't prove there is not a great way to fix it either.. :/
> >>>>>>
> >>>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying
> >>>>>> I have a good solution either, I just don't see any way forward without
> >>>>>> some use-case getting broken, so better to fix now over later.
> >>>>>>
> >>>>>
> >>>>> I can see the benefits of Android doing things the way they do, I would
> >>>>> request that changes we make continue to support Android, or we find a way
> >>>>> to convince them to change, as they are the main ION client and I assume
> >>>>> other ION clients in the future will want to do this as well.
> >>>>>
> >>>>
> >>>> Android may be the biggest user today (makes sense, Ion came out of the
> >>>> Android project), but that can change, and getting changes into Android
> >>>> will be easier than the upstream kernel once Ion is out of staging.
> >>>>
> >>>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile
> >>>> chips targeting Android, our core offerings target more traditional
> >>>> Linux userspaces, and I'm guessing others will start to do the same as
> >>>> ARM tries to push more into desktop, server, and other spaces again.
> >>>>
> >>>>> I am concerned that if you go with a solution which enforces what you
> >>>>> mention above, and bring ION out of staging that way, it will make it that
> >>>>> much harder to solve this for Android and therefore harder to get
> >>>>> Android clients to move to the upstream ION (and get everybody off their
> >>>>> vendor modified Android versions).
> >>>>>
> >>>>
> >>>> That would be an Android problem, reducing functionality in upstream to
> >>>> match what some evil vendor trees do to support Android is not the way
> >>>> forward on this. At least for us we are going to try to make all our
> >>>> software offerings follow proper buffer ownership (including our Android
> >>>> offering).
> >>>>
> >>>>>>>>> //buffer is send down the pipeline
> >>>>>>>>>
> >>>>>>>>> // Userspace software post processing occurs
> >>>>>>>>> mmap buffer
> >>>>>>>>
> >>>>>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>>>>
> >>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>>>>> devices attached to buffer
> >>>>>>>>
> >>>>>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>>>>> attached nothing could have changed the underlying memory in the
> >>>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>>>>>
> >>>>>>>>> [CPU reads/writes to the buffer]
> >>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>>>>> devices attached to buffer
> >>>>>>>>> munmap buffer
> >>>>>>>>>
> >>>>>>>>> //buffer is send down the pipeline
> >>>>>>>>> // Buffer is send to video device (who does compression of raw data) and
> >>>>>>>>> writes to a file
> >>>>>>>>> dma_buf_attach
> >>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>>> [video device writes to buffer]
> >>>>>>>>> dma_buf_unmap_attachment
> >>>>>>>>> dma_buf_detach  (device cannot stay attached because it is being sent
> >>>>>>>>> down
> >>>>>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not
> >>>>>>>>>>> doing CPU
> >>>>>>>>>>> access then there is no requirement (that I am aware of) for you to
> >>>>>>>>>>> call
> >>>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and
> >>>>>>>>>>> if this
> >>>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache
> >>>>>>>>>>> maintenance
> >>>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>>>>> maintenance on the buffer?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Because ION no longer provides DMA ready memory.
> >>>>>>>>> Take the above example.
> >>>>>>>>>
> >>>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>>>>> Zeros are written to the cache.
> >>>>>>>>>
> >>>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>>>>> The camera device writes directly to the buffer in DDR.
> >>>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the
> >>>>>>>>> zeros) is
> >>>>>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>>>>> written which corrupts your data.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>>>>>> for CPU access at the time of zeroing.
> >>>>>>>>
> >>>>>>>> Andrew
> >>>>>>>>
> >>>>>>>>> Liam
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>


2019-01-21 20:12:47

by Liam Mark

Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Fri, 18 Jan 2019, Andrew F. Davis wrote:

> On 1/17/19 7:11 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/16/19 4:54 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
> >>>>> Hi :-)
> >>>>>
> >>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>
> >>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>>>> ---
> >>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>>>>>>
> >>>>>>>>>>> table = a->table;
> >>>>>>>>>>>
> >>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>> - direction))
> >>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>>>> is attached later after data has already been written. Does that
> >>>>>>>>> sequence need supporting?
> >>>>>>>>
> >>>>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>>>> in Android, but I will focus on later for now.
> >>>>>>>>
> >>>>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>>>> attached and mapped, right?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>>>
> >>>>>>>> Camera device records video -> software post processing -> video device
> >>>>>>>> (who does compression of raw data) and writes to a file
> >>>>>>>>
> >>>>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>>>> IO-coherent (quite common).
> >>>>>>>>
> >>>>>>>
> >>>>>>> This is the start of the problem, having cached mappings of memory that
> >>>>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>>>> or another. On top of the speculative cache fills that have to be
> >>>>>>> constantly fought back against with CMOs like below; some coherent
> >>>>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>>>> (snoop filters get messed up).
> >>>>>>>
> >>>>>>> The solution is to either always have the addresses marked non-coherent
> >>>>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>>>> it need to be dropped, even the kernel logical address (area as painful
> >>>>>>> as that would be).
> >>>>>
> >>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> >>>>> "real" is that? It seems that we aren't really hitting that today on
> >>>>> real devices.
> >>>>>
> >>>>
> >>>> Sadly there is at least one real device like this now (TI AM654). We
> >>>> spent some time working with the ARM interconnect spec designers to see
> >>>> if this was allowed behavior, final conclusion was mixing coherent and
> >>>> non-coherent accesses is never a good idea.. So we have been working to
> >>>> try to minimize any cases of mixed attributes [0], if a region is
> >>>> coherent then everyone in the system needs to treat it as such and
> >>>> vice-versa, even clever CMOs that work on other systems won't save you
> >>>> here. :(
> >>>>
> >>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> >>>>
> >>>>
> >>>>>>>
> >>>>>>>> ION buffer is allocated.
> >>>>>>>>
> >>>>>>>> //Camera device records video
> >>>>>>>> dma_buf_attach
> >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>
> >>>>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>>>
> >>>>>> Actually +Brian this time :)
> >>>>>>
> >>>>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>>>> problem.
> >>>>>>>
> >>>>>>>> [camera device writes to buffer]
> >>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>>>
> >>>>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>>>> requests access again.
> >>>>>
> >>>>> We don't have any API to allow the invalidate to happen on CPU access
> >>>>> if all devices already detached. We need a struct device pointer to
> >>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
> >>>>>
> >>>>> I had a chat with a few people internally after the previous
> >>>>> discussion with Liam. One suggestion was to use
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> >>>>> one other device attached (guarantees that we can do an invalidate in
> >>>>> the future if begin_cpu_access is called). If the last device
> >>>>> detaches, do a sync then.
> >>>>>
> >>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
> >>>>> and use/skip the sync appropriately.
> >>>>>
> >>>>
> >>>> Now that I think this all through I agree this patch is probably wrong.
> >>>> The real fix needs to be better handling in the dma_map_sg() to deal
> >>>> with the case of the memory not being mapped (what I'm dealing with for
> >>>> unmapped heaps), and for cases when the memory in question is not cached
> >>>> (Liam's issue I think). For both these cases the dma_map_sg() does the
> >>>> wrong thing.
> >>>>
> >>>>> I did start poking the code to check out how that would look, but then
> >>>>> Christmas happened and I'm still catching back up.
> >>>>>
> >>>>>>>
> >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>>>
> >>>>>>>
> >>>>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>>>> everything as modular as possible and separate the steps, but at this
> >>>>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>>>> for the next attachment, which may change the required backing memory
> >>>>>>> completely.
> >>>>>>>
> >>>>>>> All devices should attach before the first mapping, and only let go
> >>>>>>> after the task is complete, otherwise this buffer's data needs to be copied
> >>>>>>> off to a different location or the CPU needs to take ownership in-between.
> >>>>>>>
> >>>>>
> >>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
> >>>>> implementations which actually do that? I hear it quoted a lot,
> >>>>> because that's what the docs say - but if the reality doesn't match
> >>>>> it, maybe we should change the docs.
> >>>>>
> >>>>
> >>>> Do you mean on the userspace side? I'm not sure, seems like Android
> >>>> might be doing this wrong from what I can gather. From kernel side if
> >>>> you mean the "de-allocate the backing storage", we will have some cases
> >>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> >>>> in ways not specified in the documentation. Changing the docs to force
> >>>> the backing memory to always be allocated breaks the central goal in
> >>>> having attach/map in DMA-BUF separate.
> >>>>
> >>>>>>>> //buffer is send down the pipeline
> >>>>>>>>
> >>>>>>>> // Userspace software post processing occurs
> >>>>>>>> mmap buffer
> >>>>>>>
> >>>>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>>>
> >>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>>>> devices attached to buffer
> >>>>>>>
> >>>>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>>>> attached nothing could have changed the underlying memory in the
> >>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>>
> >>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> >>>>> Liam was saying that it's too painful for them to do that every time a
> >>>>> device unmaps - when in many cases (device->device, no CPU) it's not
> >>>>> needed.
> >>>>
> >>>> Invalidates are painless, at least compared to a real cache flush, just
> >>>> set the invalid bit vs actually writing out lines. I thought the issue
> >>>> was on the map side.
> >>>>
> >>>
> >>> Invalidates aren't painless for us because we have a coherent system cache
> >>> so clean lines get written out.
> >>
> >> That seems very broken, why would clean lines ever need to be written
> >> out, that defeats the whole point of having the invalidate separate from
> >> clean. How do you deal with stale cache lines? I guess in your case this
> >> is what forces you to have to use uncached memory for DMA-able memory.
> >>
> >
> > My understanding is that our ARM invalidate is a clean + invalidate; I had
> > concerns about the clean lines being written to the system cache as
> > part of the 'clean', but the following 'invalidate' would take care of
> > actually invalidating the lines (so nothing broken).
> > But I am probably wrong on this and it is probably smart enough not to do
> > the writing of the clean lines.
> >
>
> You are correct that for a lot of ARM cores "invalidate" is always a
> "clean + invalidate". At first I thought this was kinda silly as there
> is now no way to mark a dirty line invalid without it getting written
> out first, but if you think about it any dirty cache-line can be written
> out (cleaned) at anytime anyway, so this doesn't actually change system
> behavior. You should just not write to memory (make the line dirty)
> anything you don't want eventually written out.
>
> Point two, it's not just smart enough to not write-out clean lines, it
> is guaranteed not to write them out by the spec. Otherwise since
> cache-lines can be randomly filled if those same clean lines got written
> out on invalidate operations there would be no way to maintain coherency
> and things would be written over top each other all over the place.
>
> > But regardless, targets supporting a coherent system cache are a legitimate
> > configuration, and an invalidate on this configuration does have to go to
> > the bus to invalidate the system cache (which isn't free), so I don't think
> > you can make the assumption that invalidates are cheap so that it is okay
> > to do them (even if they are not needed) on every dma unmap.
> >
>
> Very true, CMOs need to be broadcast to other coherent masters on a
> coherent interconnect (and the interconnect itself if it has a cache as
> well (L3)), so not 100% free, but almost, just the infinitesimal cost of
> the cache tag check in hardware. If there are no non-coherent devices
> attached then the CMOs are no-ops; if there are, then the data needs to
> be written out either way, and doing it on every access, as is done with
> uncached memory (minus any write combining), will blow away any saving made
> from the one less CMO. Either way you lose with uncached mappings of
> memory. If I'm wrong I would love to know.
>

I would need to think about this more before replying.

> >>> And these invalidates can occur on fairly large buffers.
> >>>
> >>> That is why we haven't gone with using cached ION memory and "tracking CPU
> >>> access" because it only solves half the problem, i.e. there isn't a way to
> >>> safely skip the invalidate (because we can't read the future).
> >>> Our solution was to go with uncached ION memory (when possible), but as
> >>> you can see in other discussions upstream support for uncached memory has
> >>> its own issues.
> >>>
> >>
> >> Sounds like you need to fix upstream support then, finding a way to drop
> >> all cacheable mappings of memory you want to make uncached mappings for
> >> seems to be the only solution.
> >>
> >
> > I think we can probably agree that there wouldn't be a good way to remove
> > cached mappings without causing an unacceptable performance degradation
> > since it would fragment all the nice 1GB kernel mappings we have.
> >
> > So I am trying to find an alternative solution.
> >
>
> I'm not sure there is a better solution. How hard is this solution to
> implement anyway? The kernel already has to make gaps and cut up that
> nice 1GB mapping when you make a reserved memory space in the lowmem
> area, so all the logic is probably already implemented. Just need to
> allow it to be hooked into from Ion when doing the uncached mappings.
>

Even before attempting to implement it may be best to first run an
experiment where block mappings are disabled for the kernel mappings in
order to simulate a system where a lot of memory is allocated with
uncached mappings (such as could happen when using ION with Android). Then
try running any important benchmarks; my expectation is
that the performance impact of losing these kernel mappings will be
unacceptable.

It was a couple years ago, but I remember when I played around with the
kernel block mappings it was impacting some of our benchmarks.

> >>>>>
> >>>>>>>
> >>>>>>>> [CPU reads/writes to the buffer]
> >>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>>>> devices attached to buffer
> >>>>>>>> munmap buffer
> >>>>>>>>
> >>>>>>>> //buffer is sent down the pipeline
> >>>>>>>> // Buffer is sent to video device (who does compression of raw data) and
> >>>>>>>> writes to a file
> >>>>>>>> dma_buf_attach
> >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>> [video device writes to buffer]
> >>>>>>>> dma_buf_unmap_attachment
> >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
> >>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>>>> maintenance on the buffer?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Because ION no longer provides DMA ready memory.
> >>>>>>>> Take the above example.
> >>>>>>>>
> >>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>>>> Zeros are written to the cache.
> >>>>>>>>
> >>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>>>> The camera devices writes directly to the buffer in DDR.
> >>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >>>>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>>>> written which corrupts your data.
> >>>>>>>>
> >>>>>>>
> >>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
> >>>>>>> for CPU access at the time of zeroing.
> >>>>>>>
> >>>>>
> >>>>> Actually that should be at the point of the first non-coherent device
> >>>>> mapping the buffer right? No point in doing CMO if the future accesses
> >>>>> are coherent.
> >>>>
> >>>> I see your point, as long as the zeroing is guaranteed to be the first
> >>>> access to this buffer then it should be safe.
> >>>>
> >>>> Andrew
> >>>>
> >>>>>
> >>>>> Cheers,
> >>>>> -Brian
> >>>>>
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>>> Liam
> >>>>>>>>
> >>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>>>>> a Linux Foundation Collaborative Project
> >>>>>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-21 21:24:49

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/21/19 5:22 AM, Brian Starkey wrote:
> Hi,
>
> Sorry for being a bit sporadic on this. I was out travelling last week
> with little time for email.
>
> On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
>> On 1/17/19 7:11 PM, Liam Mark wrote:
>>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/16/19 4:54 PM, Liam Mark wrote:
>>>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>>>
>>>>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
>>>>>>> Hi :-)
>>>>>>>
>>>>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>>>>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>>
>>>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
>>>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
>>>>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
>>>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>>>>>>>>>>>>>
>>>>>>>>>>>>> table = a->table;
>>>>>>>>>>>>>
>>>>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>>>> - direction))
>>>>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>>>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
>>>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
>>>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
>>>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That should be okay though, if you have no attachments (or all
>>>>>>>>>>> attachments are IO-coherent) then there is no need for cache
>>>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>>>>>>>>>> is attached later after data has already been written. Does that
>>>>>>>>>>> sequence need supporting?
>>>>>>>>>>
>>>>>>>>>> Yes, but also I think there are cases where CPU access can happen before
>>>>>>>>>> in Android, but I will focus on later for now.
>>>>>>>>>>
>>>>>>>>>>> DMA-BUF doesn't have to allocate the backing
>>>>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
>>>>>>>>>>> the devices have attached so it can know where to put the buffer. So we
>>>>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
>>>>>>>>>>> attached and mapped, right?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is an example where CPU access can happen later in Android.
>>>>>>>>>>
>>>>>>>>>> Camera device records video -> software post processing -> video device
>>>>>>>>>> (who does compression of raw data) and writes to a file
>>>>>>>>>>
>>>>>>>>>> In this example assume the buffer is cached and the devices are not
>>>>>>>>>> IO-coherent (quite common).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is the start of the problem, having cached mappings of memory that
>>>>>>>>> is also being accessed non-coherently is going to cause issues one way
>>>>>>>>> or another. On top of the speculative cache fills that have to be
>>>>>>>>> constantly fought back against with CMOs like below; some coherent
>>>>>>>>> interconnects behave badly when you mix coherent and non-coherent access
>>>>>>>>> (snoop filters get messed up).
>>>>>>>>>
>>>>>>>>> The solution is to either always have the addresses marked non-coherent
>>>>>>>>> (like device memory, no-map carveouts), or if you really want to use
>>>>>>>>> regular system memory allocated at runtime, then all cached mappings of
>>>>>>>>> it need to be dropped, even the kernel logical address area (as painful
>>>>>>>>> as that would be).
>>>>>>>
>>>>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
>>>>>>> "real" is that? It seems that we aren't really hitting that today on
>>>>>>> real devices.
>>>>>>>
>>>>>>
>>>>>> Sadly there is at least one real device like this now (TI AM654). We
>>>>>> spent some time working with the ARM interconnect spec designers to see
>>>>>> if this was allowed behavior, final conclusion was mixing coherent and
>>>>>> non-coherent accesses is never a good idea.. So we have been working to
>>>>>> try to minimize any cases of mixed attributes [0], if a region is
>>>>>> coherent then everyone in the system needs to treat it as such and
>>>>>> vice-versa, even clever CMO that work on other systems wont save you
>>>>>> here. :(
>>>>>>
>>>>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
>>>>>>
>
> "Never a good idea" - but I think it should still be well defined by
> the ARMv8 ARM (Section B2.8). Does this apply to your system?
>
> "If the mismatched attributes for a memory location all assign the
> same shareability attribute to a Location that has a cacheable
> attribute, any loss of uniprocessor semantics, ordering, or coherency
> within a shareability domain can be avoided by use of software cache
> management"
>
> https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
>
> If the cache is invalidated when switching between access types,
> shouldn't the snoop filters get un-messed-up?
>

The details of the issue are this: our coherent interconnect (MSMC) has
a snoop filter (for those following along at home, it's a list of which
cache lines are currently inside each connected master so snoop requests
can be filtered for masters that won't care). When a "NoSnoop" (non-cached
or non-shareable) transaction is received for a location from any master
it assumes that location cannot be in the cache of *any* master (as the
correct cache-line state transition a given core will take for that line
is not defined by the ARM spec), so it drops all records of that line. The
only way to recover from this is for every master to invalidate the line
and pick it back up again so the snoop filter can re-learn who really has
it again. An invalidate on one core also doesn't propagate to the other
cores, as those requests are also blocked by the now confused snoop
filter, so each and every core must do it manually.

It behaves much more like the later passage in the ARMv8 ARM (Section B2.8):

"If the mismatched attributes for a Location mean that multiple
cacheable accesses to the Location might be made with different
shareability attributes, then uniprocessor semantics, ordering, and
coherency are guaranteed only if:
• Each PE that accesses the Location with a cacheable attribute performs
a clean and invalidate of the Location before and after accessing that
Location.
• A DMB barrier with scope that covers the full shareability of the
accesses is placed between any accesses to the same memory Location that
use different attributes."

>>>>>>
>>>>>>>>>
>>>>>>>>>> ION buffer is allocated.
>>>>>>>>>>
>>>>>>>>>> //Camera device records video
>>>>>>>>>> dma_buf_attach
>>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>>>
>>>>>>>>> Why does the buffer need to be cleaned here? I just got through reading
>>>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
>>>>>>>>
>>>>>>>> Actually +Brian this time :)
>>>>>>>>
>>>>>>>>> suggestion of tracking if the buffer has had CPU access since the last
>>>>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
>>>>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>>> [camera device writes to buffer]
>>>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>>>>>>>>
>>>>>>>>> It doesn't know there will be any further CPU access, it could get freed
>>>>>>>>> after this for all we know, the invalidate can be saved until the CPU
>>>>>>>>> requests access again.
>>>>>>>
>>>>>>> We don't have any API to allow the invalidate to happen on CPU access
>>>>>>> if all devices already detached. We need a struct device pointer to
>>>>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
>>>>>>>
>>>>>>> I had a chat with a few people internally after the previous
>>>>>>> discussion with Liam. One suggestion was to use
>>>>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
>>>>>>> one other device attached (guarantees that we can do an invalidate in
>>>>>>> the future if begin_cpu_access is called). If the last device
>>>>>>> detaches, do a sync then.
>>>>>>>
>>>>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
>>>>>>> and use/skip the sync appropriately.
>>>>>>>
>>>>>>
>>>>>> Now that I think this all through I agree this patch is probably wrong.
>>>>>> The real fix needs to be better handling in the dma_map_sg() to deal
>>>>>> with the case of the memory not being mapped (what I'm dealing with for
>>>>>> unmapped heaps), and for cases when the memory in question is not cached
>>>>>> (Liam's issue I think). For both these cases the dma_map_sg() does the
>>>>>> wrong thing.
>>>>>>
>>>>>>> I did start poking the code to check out how that would look, but then
>>>>>>> Christmas happened and I'm still catching back up.
>>>>>>>
>>>>>>>>>
>>>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This seems like a broken use-case, I understand the desire to keep
>>>>>>>>> everything as modular as possible and separate the steps, but at this
>>>>>>>>> point no one owns this buffer's backing memory, not the CPU or any
>>>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
>>>>>>>>> de-allocate the backing storage if it wants, that way it could get ready
>>>>>>>>> for the next attachment, which may change the required backing memory
>>>>>>>>> completely.
>>>>>>>>>
>>>>>>>>> All devices should attach before the first mapping, and only let go
>>>>>>>>> after the task is complete, otherwise this buffer's data needs to be
>>>>>>>>> copied off to a different location or the CPU needs to take ownership
>>>>>>>>> in-between.
>>>>>>>>>
>>>>>>>
>>>>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
>>>>>>> implementations which actually do that? I hear it quoted a lot,
>>>>>>> because that's what the docs say - but if the reality doesn't match
>>>>>>> it, maybe we should change the docs.
>>>>>>>
>>>>>>
>>>>>> Do you mean on the userspace side? I'm not sure, seems like Android
>>>>>> might be doing this wrong from what I can gather. From kernel side if
>>>>>> you mean the "de-allocate the backing storage", we will have some cases
>>>>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
>>>>>> in ways not specified in the documentation. Changing the docs to force
>>>>>> the backing memory to always be allocated breaks the central goal in
>>>>>> having attach/map in DMA-BUF separate.
>
> Actually I meant in the kernel, in exporters. I haven't seen anyone
> using the API as it was intended (defer allocation until first map,
> migrate between different attachments, etc.). Mostly, backing storage
> seems to get allocated at the point of export, and device mappings are
> often held persistently (e.g. the DRM prime code maps the buffer at
> import time, and keeps it mapped: drm_gem_prime_import_dev).
>

I haven't either, which is a shame as it allows for some really useful
management strategies for shared memory resources. I'm working on one
such case right now, maybe I'll get to be the first to upstream one :)

> I wasn't aware that CPU access before first device access was
> considered an abuse of the API - it seems like a valid thing to want
> to do.
>

That's just it, I don't know if it is an abuse of the API, I'm trying to get
some clarity on that. If we do want to allow early CPU access then that
seems to be in contrast to the idea of deferred allocation until first
device map, what is supposed to be backing the buffer if no devices have
attached or mapped yet? Just some system memory followed by migration on
the first attach to the proper backing? Seems too time wasteful to be
a valid use.

Maybe it should be up to the exporter if early CPU access is allowed?

I'm hoping someone with authority over the DMA-BUF framework can clarify
original intentions here.

>>>>>>
>>>>>>>>>> //buffer is send down the pipeline
>>>>>>>>>>
>>>>>>>>>> // Userspace software post processing occurs
>>>>>>>>>> mmap buffer
>>>>>>>>>
>>>>>>>>> Perhaps the invalidate should happen here in mmap.
>>>>>>>>>
>>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
>>>>>>>>>> devices attached to buffer
>>>>>>>>>
>>>>>>>>> And that should be okay, mmap does the sync, and if no devices are
>>>>>>>>> attached nothing could have changed the underlying memory in the
>>>>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
>>>>>>>
>>>>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
>>>>>>> Liam was saying that it's too painful for them to do that every time a
>>>>>>> device unmaps - when in many cases (device->device, no CPU) it's not
>>>>>>> needed.
>>>>>>
>>>>>> Invalidates are painless, at least compared to a real cache flush, just
>>>>>> set the invalid bit vs actually writing out lines. I thought the issue
>>>>>> was on the map side.
>>>>>>
>>>>>
>>>>> Invalidates aren't painless for us because we have a coherent system cache
>>>>> so clean lines get written out.
>>>>
>>>> That seems very broken, why would clean lines ever need to be written
>>>> out, that defeats the whole point of having the invalidate separate from
>>>> clean. How do you deal with stale cache lines? I guess in your case this
>>>> is what forces you to have to use uncached memory for DMA-able memory.
>>>>
>>>
>>> My understanding is that our ARM invalidate is a clean + invalidate, I had
>>> concerns about the clean lines being written to the system cache as
>>> part of the 'clean', but the following 'invalidate' would take care of
>>> actually invalidating the lines (so nothing broken).
>>> But I am probably wrong on this and it is probably smart enough not to do
>>> the writing of the clean lines.
>>>
>>
>> You are correct that for a lot of ARM cores "invalidate" is always a
>> "clean + invalidate". At first I thought this was kinda silly as there
>> is now no way to mark a dirty line invalid without it getting written
>> out first, but if you think about it any dirty cache-line can be written
>> out (cleaned) at anytime anyway, so this doesn't actually change system
>> behavior. You should just not write (dirty) anything to memory that you
>> don't want eventually written out.
>>
>> Point two, it's not just smart enough to not write-out clean lines, it
>> is guaranteed not to write them out by the spec. Otherwise since
>> cache-lines can be randomly filled if those same clean lines got written
>> out on invalidate operations there would be no way to maintain coherency
>> and things would be written over top each other all over the place.
>>
>>> But regardless, targets supporting a coherent system cache is a legitimate
>>> configuration and an invalidate on this configuration does have to go to
>>> the bus to invalidate the system cache (which isn't free) so I don't think
>>> you can make the assumption that invalidates are cheap so that it is okay
>>> to do them (even if they are not needed) on every dma unmap.
>>>
>>
>> Very true, CMOs need to be broadcast to other coherent masters on a
>> coherent interconnect (and the interconnect itself if it has a cache as
>> well (L3)), so not 100% free, but almost, just the infinitesimal cost of
>> the cache tag check in hardware. If there are no non-coherent devices
>> attached then the CMOs are no-ops; if there are, then the data needs to
>> be written out either way, and doing it on every access, as is done with
>> uncached memory (minus any write combining), will blow away any saving made
>> from the one less CMO. Either way you lose with uncached mappings of
>> memory. If I'm wrong I would love to know.
>>
>
> From what I understand, the current DMA APIs are not equipped to
> handle having coherent and non-coherent devices attached at the same
> time. The buffer is either in "CPU land" or "Device land", there's no
> smaller granule of "Coherent Device land" or "Non-Coherent Device
> land".
>
> I think if there's devices which are making coherent accesses, and
> devices which are making non-coherent accesses, then we can't support
> them being attached at the same time without some enhancements to the
> APIs.
>

I think you are right, we only handle sync to/from the CPU out to
"Device land". To sync from device to device I'm not sure there is
anything right now, they all have to be able to talk to each other
without any maintenance from the host CPU.

This will probably lead to some interesting cases like in OpenVX where a
key selling point is keeping the host out of the loop and letting the remote
devices do all the sharing between themselves.

>>>>> And these invalidates can occur on fairly large buffers.
>>>>>
>>>>> That is why we haven't gone with using cached ION memory and "tracking CPU
>>>>> access" because it only solves half the problem, i.e. there isn't a way to
>>>>> safely skip the invalidate (because we can't read the future).
>>>>> Our solution was to go with uncached ION memory (when possible), but as
>>>>> you can see in other discussions upstream support for uncached memory has
>>>>> its own issues.
>>>>>
>
> @Liam, in your problematic use-cases, are both devices detached when
> the buffer moves between them?
>
> 1) dev 1 map, access, unmap
> 2) dev 1 detach
> 3) (maybe) CPU access
> 4) dev 2 attach
> 5) dev 2 map, access
>
> I still think a pretty pragmatic solution is to use
> DMA_ATTR_SKIP_CPU_SYNC until the last device detaches. That won't work
> if your access sequence looks like above...
>
> ...however, if your sequence looks like above, then you probably need
> to keep at least one of the devices attached anyway. Otherwise, per
> the API, the buffer could get migrated after 2)/before 5). That will
> surely hurt a lot more than an invalidate.
>
>>>>
>>>> Sounds like you need to fix upstream support then, finding a way to drop
>>>> all cacheable mappings of memory you want to make uncached mappings for
>>>> seems to be the only solution.
>>>>
>>>
>>> I think we can probably agree that there wouldn't be a good way to remove
>>> cached mappings without causing an unacceptable performance degradation
>>> since it would fragment all the nice 1GB kernel mappings we have.
>>>
>>> So I am trying to find an alternative solution.
>>>
>>
>> I'm not sure there is a better solution. How hard is this solution to
>> implement anyway? The kernel already has to make gaps and cut up that
>> nice 1GB mapping when you make a reserved memory space in the lowmem
>> area, so all the logic is probably already implemented. Just need to
>> allow it to be hooked into from Ion when doing the uncached mappings.
>>
>
> I haven't looked recently, but I'm not sure the early memblock code
> can be reused as-is at runtime. I seem to remember it makes a bunch of
> assumptions about the fact that it's running "early".
>
> If CPU uncached mappings of normal system memory is really the way
> forward, I could envisage a heap which maintains a pool of chunks of
> memory which it removed from the kernel mapping. The pool could grow
> (remove more pages from the kernel mapping)/shrink (add them back to
> the kernel mapping) as needed.
>
> John Reitan implemented a compound-page heap, which used compaction to
> get a pool of 2MB contiguous pages. Something like that would at least
> prevent needing full 4kB granularity when removing things from the
> kernel mapping.
>
> Even better, could it somehow be restricted to a region which is
> already fragmented? (e.g. the one which was used for the default CMA
> heap)
>
> Thanks,
> -Brian
>
>>>>>>>
>>>>>>>>>
>>>>>>>>>> [CPU reads/writes to the buffer]
>>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
>>>>>>>>>> devices attached to buffer
>>>>>>>>>> munmap buffer
>>>>>>>>>>
>>>>>>>>>> //buffer is sent down the pipeline
>>>>>>>>>> // Buffer is sent to video device (who does compression of raw data) and
>>>>>>>>>> writes to a file
>>>>>>>>>> dma_buf_attach
>>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
>>>>>>>>>> [video device writes to buffer]
>>>>>>>>>> dma_buf_unmap_attachment
>>>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
>>>>>>>>>> the pipeline and Video doesn't know the end of the use case)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
>>>>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
>>>>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
>>>>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
>>>>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
>>>>>>>>>>> maintenance on the buffer?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Because ION no longer provides DMA ready memory.
>>>>>>>>>> Take the above example.
>>>>>>>>>>
>>>>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
>>>>>>>>>> Zeros are written to the cache.
>>>>>>>>>>
>>>>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
>>>>>>>>>> The camera devices writes directly to the buffer in DDR.
>>>>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
>>>>>>>>>> evicted from the cache, this zero overwrites data the camera device has
>>>>>>>>>> written which corrupts your data.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The zeroing *is* a CPU access, therefore it should handle the needed CMO
>>>>>>>>> for CPU access at the time of zeroing.
>>>>>>>>>
>>>>>>>
>>>>>>> Actually that should be at the point of the first non-coherent device
>>>>>>> mapping the buffer right? No point in doing CMO if the future accesses
>>>>>>> are coherent.
>>>>>>
>>>>>> I see your point, as long as the zeroing is guaranteed to be the first
>>>>>> access to this buffer then it should be safe.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -Brian
>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>>> Liam
>>>>>>>>>>
>>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>>>>>> a Linux Foundation Collaborative Project
>>>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>

2019-01-22 17:35:31

by Sumit Semwal

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

Hello everyone,

Sincere apologies for chiming in a bit late here; I was off due to
some health issues.

Also, adding Daniel Vetter to the mix, since he has been one of the
core guys who shaped up dma-buf as it is today.

On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis <[email protected]> wrote:
>
> On 1/21/19 5:22 AM, Brian Starkey wrote:
> > Hi,
> >
> > Sorry for being a bit sporadic on this. I was out travelling last week
> > with little time for email.
> >
> > On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
> >> On 1/17/19 7:11 PM, Liam Mark wrote:
> >>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/16/19 4:54 PM, Liam Mark wrote:
> >>>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>>>
> >>>>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
> >>>>>>> Hi :-)
> >>>>>>>
> >>>>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >>>>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>>>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>>>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>>>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>>>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> >>>>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> >>>>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>>>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> >>>>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> table = a->table;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>>>> - direction))
> >>>>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>>>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>>>>>>>>>>
> >>>>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> >>>>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >>>>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> >>>>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
> >>>>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> That should be okay though, if you have no attachments (or all
> >>>>>>>>>>> attachments are IO-coherent) then there is no need for cache
> >>>>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>>>>>>>>>> is attached later after data has already been written. Does that
> >>>>>>>>>>> sequence need supporting?
> >>>>>>>>>>
> >>>>>>>>>> Yes, but also I think there are cases where CPU access can happen before
> >>>>>>>>>> in Android, but I will focus on later for now.
> >>>>>>>>>>
> >>>>>>>>>>> DMA-BUF doesn't have to allocate the backing
> >>>>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> >>>>>>>>>>> the devices have attached so it can know where to put the buffer. So we
> >>>>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> >>>>>>>>>>> attached and mapped, right?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Here is an example where CPU access can happen later in Android.
> >>>>>>>>>>
> >>>>>>>>>> Camera device records video -> software post processing -> video device
> >>>>>>>>>> (who does compression of raw data) and writes to a file
> >>>>>>>>>>
> >>>>>>>>>> In this example assume the buffer is cached and the devices are not
> >>>>>>>>>> IO-coherent (quite common).
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is the start of the problem, having cached mappings of memory that
> >>>>>>>>> is also being accessed non-coherently is going to cause issues one way
> >>>>>>>>> or another. On top of the speculative cache fills that have to be
> >>>>>>>>> constantly fought back against with CMOs like below; some coherent
> >>>>>>>>> interconnects behave badly when you mix coherent and non-coherent access
> >>>>>>>>> (snoop filters get messed up).
> >>>>>>>>>
> >>>>>>>>> The solution is to either always have the addresses marked non-coherent
> >>>>>>>>> (like device memory, no-map carveouts), or if you really want to use
> >>>>>>>>> regular system memory allocated at runtime, then all cached mappings of
> >>>>>>>>> it need to be dropped, even the kernel logical address area (as painful
> >>>>>>>>> as that would be).
> >>>>>>>
> >>>>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> >>>>>>> "real" is that? It seems that we aren't really hitting that today on
> >>>>>>> real devices.
> >>>>>>>
> >>>>>>
> >>>>>> Sadly there is at least one real device like this now (TI AM654). We
> >>>>>> spent some time working with the ARM interconnect spec designers to see
> >>>>>> if this was allowed behavior, final conclusion was mixing coherent and
> >>>>>> non-coherent accesses is never a good idea.. So we have been working to
> >>>>>> try to minimize any cases of mixed attributes [0], if a region is
> >>>>>> coherent then everyone in the system needs to treat it as such and
> >>>>>> vice-versa, even clever CMOs that work on other systems won't save you
> >>>>>> here. :(
> >>>>>>
> >>>>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> >>>>>>
> >
> > "Never a good idea" - but I think it should still be well defined by
> > the ARMv8 ARM (Section B2.8). Does this apply to your system?
> >
> > "If the mismatched attributes for a memory location all assign the
> > same shareability attribute to a Location that has a cacheable
> > attribute, any loss of uniprocessor semantics, ordering, or coherency
> > within a shareability domain can be avoided by use of software cache
> > management"
> >
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> >
> > If the cache is invalidated when switching between access types,
> > shouldn't the snoop filters get un-messed-up?
> >
>
> The details of the issue are this, our coherent interconnect (MSMC) has
> a snoop filter (for those following along at home it's a list of which
> cache lines are currently inside each connected master so snoop requests
> can be filtered for masters that won't care). When a "NoSnoop" (non-cached
> or non-shareable) transaction is received for a location from any master
> it assumes that location cannot be in the cache of *any* master (as the
> correct cache-line state transition a given core will take for that line
> is not defined by the ARM spec), so it drops all records of that line. The
> only way to recover from this is for every master to invalidate the line
> and pick it back up again so the snoop filter can re-learn who really has
> it again. An invalidate on one core also doesn't propagate to the different
> cores, as those requests are also blocked by the now confused snoop
> filter, so each and every core must manually do it..
>
> It behaves much more like the later passage in the ARMv8 ARM (Section B2.8):
>
> "If the mismatched attributes for a Location mean that multiple
> cacheable accesses to the Location might be made with different
> shareability attributes, then uniprocessor semantics, ordering, and
> coherency are guaranteed only if:
> • Each PE that accesses the Location with a cacheable attribute performs
> a clean and invalidate of the Location before and after accessing that
> Location.
> • A DMB barrier with scope that covers the full shareability of the
> accesses is placed between any accesses to the same memory Location that
> use different attributes."
>
> >>>>>>
> >>>>>>>>>
> >>>>>>>>>> ION buffer is allocated.
> >>>>>>>>>>
> >>>>>>>>>> //Camera device records video
> >>>>>>>>>> dma_buf_attach
> >>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>>>
> >>>>>>>>> Why does the buffer need to be cleaned here? I just got through reading
> >>>>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> >>>>>>>>
> >>>>>>>> Actually +Brian this time :)
> >>>>>>>>
> >>>>>>>>> suggestion of tracking if the buffer has had CPU access since the last
> >>>>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> >>>>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> >>>>>>>>> problem.
> >>>>>>>>>
> >>>>>>>>>> [camera device writes to buffer]
> >>>>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> >>>>>>>>>
> >>>>>>>>> It doesn't know there will be any further CPU access, it could get freed
> >>>>>>>>> after this for all we know, the invalidate can be saved until the CPU
> >>>>>>>>> requests access again.
> >>>>>>>
> >>>>>>> We don't have any API to allow the invalidate to happen on CPU access
> >>>>>>> if all devices already detached. We need a struct device pointer to
> >>>>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
> >>>>>>>
> >>>>>>> I had a chat with a few people internally after the previous
> >>>>>>> discussion with Liam. One suggestion was to use
> >>>>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> >>>>>>> one other device attached (guarantees that we can do an invalidate in
> >>>>>>> the future if begin_cpu_access is called). If the last device
> >>>>>>> detaches, do a sync then.
> >>>>>>>
> >>>>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
> >>>>>>> and use/skip the sync appropriately.
> >>>>>>>
> >>>>>>
> >>>>>> Now that I think this all through I agree this patch is probably wrong.
> >>>>>> The real fix needs to be better handling in the dma_map_sg() to deal
> >>>>>> with the case of the memory not being mapped (what I'm dealing with for
> >>>>>> unmapped heaps), and for cases when the memory in question is not cached
> >>>>>> (Liam's issue I think). For both these cases the dma_map_sg() does the
> >>>>>> wrong thing.
> >>>>>>
> >>>>>>> I did start poking the code to check out how that would look, but then
> >>>>>>> Christmas happened and I'm still catching back up.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>>>> the pipeline and Camera doesn't know the end of the use case)
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This seems like a broken use-case, I understand the desire to keep
> >>>>>>>>> everything as modular as possible and separate the steps, but at this
> >>>>>>>>> point no one owns this buffer's backing memory, not the CPU or any
> >>>>>>>>> device. I would go as far as to say DMA-BUF should be free now to
> >>>>>>>>> de-allocate the backing storage if it wants, that way it could get ready
> >>>>>>>>> for the next attachment, which may change the required backing memory
> >>>>>>>>> completely.
> >>>>>>>>>
> >>>>>>>>> All devices should attach before the first mapping, and only let go
> >>>>>>>>> after the task is complete, otherwise this buffer's data needs to be
> >>>>>>>>> copied off to a different location or the CPU needs to take ownership
> >>>>>>>>> in-between.
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
> >>>>>>> implementations which actually do that? I hear it quoted a lot,
> >>>>>>> because that's what the docs say - but if the reality doesn't match
> >>>>>>> it, maybe we should change the docs.
> >>>>>>>
> >>>>>>
> >>>>>> Do you mean on the userspace side? I'm not sure, seems like Android
> >>>>>> might be doing this wrong from what I can gather. From the kernel side,
> >>>>>> if you mean the "de-allocate the backing storage", we will have some cases
> >>>>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> >>>>>> in ways not specified in the documentation. Changing the docs to force
> >>>>>> the backing memory to always be allocated breaks the central goal in
> >>>>>> having attach/map in DMA-BUF separate.
> >
> > Actually I meant in the kernel, in exporters. I haven't seen anyone
> > using the API as it was intended (defer allocation until first map,
> > migrate between different attachments, etc.). Mostly, backing storage
> > seems to get allocated at the point of export, and device mappings are
> > often held persistently (e.g. the DRM prime code maps the buffer at
> > import time, and keeps it mapped: drm_gem_prime_import_dev).
> >
>
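The snoop-filter failure mode described above can be sketched as a toy model. The names and structures here (struct snoop_filter, snooped_access, nosnoop_access) are purely illustrative and are not the MSMC hardware interface:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_MASTERS 4

/* Toy model of a snoop filter: it tracks which masters may hold a
 * given cache line, so snoops are only forwarded to those masters. */
struct snoop_filter {
    bool may_hold[NUM_MASTERS];
};

/* A cacheable (snooped) access records the master in the filter. */
static void snooped_access(struct snoop_filter *f, int master)
{
    f->may_hold[master] = true;
}

/* A NoSnoop transaction makes the filter assume the line cannot be in
 * any cache, so it drops all records of the line: the failure mode. */
static void nosnoop_access(struct snoop_filter *f)
{
    for (int i = 0; i < NUM_MASTERS; i++)
        f->may_hold[i] = false;
}

/* After the records are dropped, a coherent write by another master is
 * no longer snooped into this master's cache, leaving it stale. */
static bool snoop_reaches(const struct snoop_filter *f, int master)
{
    return f->may_hold[master];
}
```

This is why, in the description above, every master has to invalidate the line and fetch it again before the filter can re-learn who actually holds it.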

So I suppose some clarification on the 'intended use' part of dma-buf
about deferred allocation is due, so here it is: (Daniel, please feel
free to chime in with your opinion here)

- dma-buf was of course designed as a framework to help intelligent
exporters defer allocation until first map, and be able to migrate
backing storage if required, etc. At the same time, it is not a
_requirement_ for any exporter, so exporters so far have just used it
as a convenient mechanism for zero-copy.
- ION is one of the few dma-buf exporters in the kernel, which satisfies a
certain set of expectations from its users.
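That deferred-allocation design point can be sketched as a toy model. The names below (struct model_buf, model_export, model_attach, model_map) are invented stand-ins for an exporter's export, attach, and map_dma_buf steps, not the real dma-buf API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred-allocation exporter state. */
struct model_buf {
    size_t size;
    void *backing;      /* stays NULL until the first map */
    int attach_count;
};

/* Export creates only metadata; no backing storage is allocated yet. */
static struct model_buf *model_export(size_t size)
{
    struct model_buf *b = calloc(1, sizeof(*b));
    if (b)
        b->size = size;
    return b;
}

/* Attach just registers interest; constraints could be gathered here. */
static void model_attach(struct model_buf *b)
{
    b->attach_count++;
}

/* First map allocates, once the set of attached devices is known. */
static void *model_map(struct model_buf *b)
{
    if (!b->backing)
        b->backing = malloc(b->size);
    return b->backing;
}
```

In this model the placement decision is delayed until model_map(), which is exactly the freedom the attach/map split is meant to give a real exporter.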

> I haven't either, which is a shame as it allows for some really useful
> management strategies for shared memory resources. I'm working on one
> such case right now, maybe I'll get to be the first to upstream one :)
>
That will be a really good thing! Though perhaps we ought to think about
whether ION is the right place for what you're trying to do, or whether you
should have a device-specific exporter, available to users via dma-buf APIs?

> > I wasn't aware that CPU access before first device access was
> > considered an abuse of the API - it seems like a valid thing to want
> > to do.
> >
>
> That's just it, I don't know if it is an abuse of the API, I'm trying to get
> some clarity on that. If we do want to allow early CPU access then that
> seems to be in contrast to the idea of deferred allocation until first
> device map, what is supposed to be backing the buffer if no devices have
> attached or mapped yet? Just some system memory followed by migration on
> the first attach to the proper backing? Seems too time wasteful to
> have a valid use.
>
> Maybe it should be up to the exporter if early CPU access is allowed?
>
> I'm hoping someone with authority over the DMA-BUF framework can clarify
> the original intentions here.

I don't think dma-buf as a framework stops early CPU access, and the
exporter can definitely decide on that by implementing
begin_cpu_access / end_cpu_access operations to not allow early CPU
access, if it so desires.
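As a minimal sketch of that last point (purely illustrative userspace code, not kernel code; the toy_ names are invented), an exporter that wants to forbid early CPU access could simply fail its begin_cpu_access hook until a device has attached:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical exporter-private state; not the real struct dma_buf. */
struct toy_buf {
    int attach_count;
};

/* Policy sketch: refuse CPU access while no device is attached,
 * mirroring an exporter that chooses to disallow early CPU access. */
static int toy_begin_cpu_access(struct toy_buf *b)
{
    if (b->attach_count == 0)
        return -ENODEV;  /* no device attached yet: refuse */
    return 0;            /* attached: CPU access may proceed */
}
```

The point being that the policy lives entirely in the exporter; the framework itself imposes neither choice.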


>
> >>>>>>
> >>>>>>>>>> //buffer is sent down the pipeline
> >>>>>>>>>>
> >>>>>>>>>> // Userspace software post processing occurs
> >>>>>>>>>> mmap buffer
> >>>>>>>>>
> >>>>>>>>> Perhaps the invalidate should happen here in mmap.
> >>>>>>>>>
> >>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_START // No CMO since no
> >>>>>>>>>> devices attached to buffer
> >>>>>>>>>
> >>>>>>>>> And that should be okay, mmap does the sync, and if no devices are
> >>>>>>>>> attached nothing could have changed the underlying memory in the
> >>>>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> >>>>>>>
> >>>>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> >>>>>>> Liam was saying that it's too painful for them to do that every time a
> >>>>>>> device unmaps - when in many cases (device->device, no CPU) it's not
> >>>>>>> needed.
> >>>>>>
> >>>>>> Invalidates are painless, at least compared to a real cache flush, just
> >>>>>> set the invalid bit vs actually writing out lines. I thought the issue
> >>>>>> was on the map side.
> >>>>>>
> >>>>>
> >>>>> Invalidates aren't painless for us because we have a coherent system cache
> >>>>> so clean lines get written out.
> >>>>
> >>>> That seems very broken, why would clean lines ever need to be written
> >>>> out, that defeats the whole point of having the invalidate separate from
> >>>> clean. How do you deal with stale cache lines? I guess in your case this
> >>>> is what forces you to have to use uncached memory for DMA-able memory.
> >>>>
> >>>
> >>> My understanding is that our ARM invalidate is a clean + invalidate, I had
> >>> concerns about the clean lines being written to the system cache as
> >>> part of the 'clean', but the following 'invalidate' would take care of
> >>> actually invalidating the lines (so nothing broken).
> >>> But I am probably wrong on this and it is probably smart enough not to do
> >>> the writing of the clean lines.
> >>>
> >>
> >> You are correct that for a lot of ARM cores "invalidate" is always a
> >> "clean + invalidate". At first I thought this was kinda silly as there
> >> is now no way to mark a dirty line invalid without it getting written
> >> out first, but if you think about it any dirty cache-line can be written
> >> out (cleaned) at anytime anyway, so this doesn't actually change system
> >> behavior. You should just not write to memory (make the line dirty)
> >> anything you don't want eventually written out.
> >>
> >> Point two, it's not just smart enough to not write-out clean lines, it
> >> is guaranteed not to write them out by the spec. Otherwise since
> >> cache-lines can be randomly filled if those same clean lines got written
> >> out on invalidate operations there would be no way to maintain coherency
> >> and things would be written over top each other all over the place.
> >>
> >>> But regardless, targets supporting a coherent system cache is a legitimate
> >>> configuration and an invalidate on this configuration does have to go to
> >>> the bus to invalidate the system cache (which isn't free) so I don't think
> >>> you can make the assumption that invalidates are cheap so that it is okay
> >>> to do them (even if they are not needed) on every dma unmap.
> >>>
> >>
> >> Very true, CMOs need to be broadcast to other coherent masters on a
> >> coherent interconnect (and the interconnect itself if it has a cache as
> >> well (L3)), so not 100% free, but almost, just the infinitesimal cost of
> >> the cache tag check in hardware. If there are no non-coherent devices
> >> attached then the CMOs are no-ops, if there are then the data needs to
> >> be written out either way, doing it every access like is done with
> >> uncached memory (- any write combining) will blow away any saving made
> >> from the one less CMO. Either way you lose with uncached mappings of
> >> memory. If I'm wrong I would love to know.
> >>
> >
> > From what I understand, the current DMA APIs are not equipped to
> > handle having coherent and non-coherent devices attached at the same
> > time. The buffer is either in "CPU land" or "Device land", there's no
> > smaller granule of "Coherent Device land" or "Non-Coherent Device
> > land".
> >
> > I think if there's devices which are making coherent accesses, and
> > devices which are making non-coherent accesses, then we can't support
> > them being attached at the same time without some enhancements to the
> > APIs.
> >
>
> I think you are right, we only handle sync to/from the CPU out to
> "Device land". To sync from device to device I'm not sure there is
> anything right now, they all have to be able to talk to each other
> without any maintenance from the host CPU.
>
> This will probably lead to some interesting cases like in OpenVX where a
> key selling point is keeping the host out of the loop and let the remote
> devices do all the sharing between themselves.
>
> >>>>> And these invalidates can occur on fairly large buffers.
> >>>>>
> >>>>> That is why we haven't gone with using cached ION memory and "tracking CPU
> >>>>> access" because it only solves half the problem, ie there isn't a way to
> >>>>> safely skip the invalidate (because we can't read the future).
> >>>>> Our solution was to go with uncached ION memory (when possible), but as
> >>>>> you can see in other discussions upstream support for uncached memory has
> >>>>> its own issues.
> >>>>>
> >
> > @Liam, in your problematic use-cases, are both devices detached when
> > the buffer moves between them?
> >
> > 1) dev 1 map, access, unmap
> > 2) dev 1 detach
> > 3) (maybe) CPU access
> > 4) dev 2 attach
> > 5) dev 2 map, access
> >
> > I still think a pretty pragmatic solution is to use
> > DMA_ATTR_SKIP_CPU_SYNC until the last device detaches. That won't work
> > if your access sequence looks like above...
> >
> > ...however, if your sequence looks like above, then you probably need
> > to keep at least one of the devices attached anyway. Otherwise, per
> > the API, the buffer could get migrated after 2)/before 5). That will
> > surely hurt a lot more than an invalidate.
> >
> >>>>
> >>>> Sounds like you need to fix upstream support then, finding a way to drop
> >>>> all cacheable mappings of memory you want to make uncached mappings for
> >>>> seems to be the only solution.
> >>>>
> >>>
> >>> I think we can probably agree that there wouldn't be a good way to remove
> >>> cached mappings without causing an unacceptable performance degradation
> >>> since it would fragment all the nice 1GB kernel mappings we have.
> >>>
> >>> So I am trying to find an alternative solution.
> >>>
> >>
> >> I'm not sure there is a better solution. How hard is this solution to
> >> implement anyway? The kernel already has to make gaps and cut up that
> >> nice 1GB mapping when you make a reserved memory space in the lowmem
> >> area, so all the logic is probably already implemented. Just need to
> >> allow it to be hooked into from Ion when doing the uncached mappings.
> >>
> >
> > I haven't looked recently, but I'm not sure the early memblock code
> > can be reused as-is at runtime. I seem to remember it makes a bunch of
> > assumptions about the fact that it's running "early".
> >
> > If CPU uncached mappings of normal system memory is really the way
> > forward, I could envisage a heap which maintains a pool of chunks of
> > memory which it removed from the kernel mapping. The pool could grow
> > (remove more pages from the kernel mapping)/shrink (add them back to
> > the kernel mapping) as needed.
> >
> > John Reitan implemented a compound-page heap, which used compaction to
> > get a pool of 2MB contiguous pages. Something like that would at least
> > prevent needing full 4kB granularity when removing things from the
> > kernel mapping.
> >
> > Even better, could it somehow be restricted to a region which is
> > already fragmented? (e.g. the one which was used for the default CMA
> > heap)
> >
> > Thanks,
> > -Brian
> >
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>> [CPU reads/writes to the buffer]
> >>>>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> >>>>>>>>>> devices attached to buffer
> >>>>>>>>>> munmap buffer
> >>>>>>>>>>
> >>>>>>>>>> //buffer is sent down the pipeline
> >>>>>>>>>> // Buffer is sent to video device (who does compression of raw data) and
> >>>>>>>>>> writes to a file
> >>>>>>>>>> dma_buf_attach
> >>>>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> >>>>>>>>>> [video device writes to buffer]
> >>>>>>>>>> dma_buf_unmap_attachment
> >>>>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> >>>>>>>>>> the pipeline and Video doesn't know the end of the use case)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> >>>>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
> >>>>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> >>>>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> >>>>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> >>>>>>>>>>> maintenance on the buffer?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Because ION no longer provides DMA ready memory.
> >>>>>>>>>> Take the above example.
> >>>>>>>>>>
> >>>>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> >>>>>>>>>> Zeros are written to the cache.
> >>>>>>>>>>
> >>>>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> >>>>>>>>>> The camera devices writes directly to the buffer in DDR.
> >>>>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> >>>>>>>>>> evicted from the cache, this zero overwrites data the camera device has
> >>>>>>>>>> written which corrupts your data.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The zeroing *is* a CPU access, therefor it should handle the needed CMO
> >>>>>>>>> for CPU access at the time of zeroing.
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Actually that should be at the point of the first non-coherent device
> >>>>>>> mapping the buffer right? No point in doing CMO if the future accesses
> >>>>>>> are coherent.
> >>>>>>
> >>>>>> I see your point, as long as the zeroing is guaranteed to be the first
> >>>>>> access to this buffer then it should be safe.
> >>>>>>
> >>>>>> Andrew
> >>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> -Brian
> >>>>>>>
> >>>>>>>>> Andrew
> >>>>>>>>>
> >>>>>>>>>> Liam
> >>>>>>>>>>
> >>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>>>>>>>>> a Linux Foundation Collaborative Project
> >>>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>

Best,
Sumit.

2019-01-22 22:58:40

by Liam Mark

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Mon, 21 Jan 2019, Brian Starkey wrote:

> Hi,
>
> Sorry for being a bit sporadic on this. I was out travelling last week
> with little time for email.
>
> On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
> > On 1/17/19 7:11 PM, Liam Mark wrote:
> > > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> > >
> > >> On 1/16/19 4:54 PM, Liam Mark wrote:
> > >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> > >>>
> > >>>> On 1/16/19 9:19 AM, Brian Starkey wrote:
> > >>>>> Hi :-)
> > >>>>>
> > >>>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> > >>>>>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> > >>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote:
> > >>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> > >>>>>>>>
> > >>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote:
> > >>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> > >>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with
> > >>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Andrew F. Davis <[email protected]>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>> drivers/staging/android/ion/ion.c | 7 ++++---
> > >>>>>>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> > >>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644
> > >>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c
> > >>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c
> > >>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > >>>>>>>>>>>
> > >>>>>>>>>>> table = a->table;
> > >>>>>>>>>>>
> > >>>>>>>>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > >>>>>>>>>>> - direction))
> > >>>>>>>>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> > >>>>>>>>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
> > >>>>>>>>>>
> > >>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons.
> > >>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> > >>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the call to
> > >>>>>>>>>> dma_buf_attach then there won't have been a device attached so the calls
> > >>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> That should be okay though, if you have no attachments (or all
> > >>>>>>>>> attachments are IO-coherent) then there is no need for cache
> > >>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent device
> > >>>>>>>>> is attached later after data has already been written. Does that
> > >>>>>>>>> sequence need supporting?
> > >>>>>>>>
> > >>>>>>>> Yes, but also I think there are cases where CPU access can happen before
> > >>>>>>>> in Android, but I will focus on later for now.
> > >>>>>>>>
> > >>>>>>>>> DMA-BUF doesn't have to allocate the backing
> > >>>>>>>>> memory until map_dma_buf() time, and that should only happen after all
> > >>>>>>>>> the devices have attached so it can know where to put the buffer. So we
> > >>>>>>>>> shouldn't expect any CPU access to buffers before all the devices are
> > >>>>>>>>> attached and mapped, right?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Here is an example where CPU access can happen later in Android.
> > >>>>>>>>
> > >>>>>>>> Camera device records video -> software post processing -> video device
> > >>>>>>>> (who does compression of raw data) and writes to a file
> > >>>>>>>>
> > >>>>>>>> In this example assume the buffer is cached and the devices are not
> > >>>>>>>> IO-coherent (quite common).
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> This is the start of the problem, having cached mappings of memory that
> > >>>>>>> is also being accessed non-coherently is going to cause issues one way
> > >>>>>>> or another. On top of the speculative cache fills that have to be
> > >>>>>>> constantly fought back against with CMOs like below; some coherent
> > >>>>>>> interconnects behave badly when you mix coherent and non-coherent access
> > >>>>>>> (snoop filters get messed up).
> > >>>>>>>
> > >>>>>>> The solution is to either always have the addresses marked non-coherent
> > >>>>>>> (like device memory, no-map carveouts), or if you really want to use
> > >>>>>>> regular system memory allocated at runtime, then all cached mappings of
> > >>>>>>> it need to be dropped, even the kernel logical address (area as painful
> > >>>>>>> as that would be).
> > >>>>>
> > >>>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> > >>>>> "real" is that? It seems that we aren't really hitting that today on
> > >>>>> real devices.
> > >>>>>
> > >>>>
> > >>>> Sadly there is at least one real device like this now (TI AM654). We
> > >>>> spent some time working with the ARM interconnect spec designers to see
> > >>>> if this was allowed behavior, final conclusion was mixing coherent and
> > >>>> non-coherent accesses is never a good idea.. So we have been working to
> > >>>> try to minimize any cases of mixed attributes [0], if a region is
> > >>>> coherent then everyone in the system needs to treat it as such and
> > >>>> vice-versa, even clever CMOs that work on other systems won't save you
> > >>>> here. :(
> > >>>>
> > >>>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> > >>>>
>
> "Never a good idea" - but I think it should still be well defined by
> the ARMv8 ARM (Section B2.8). Does this apply to your system?
>
> "If the mismatched attributes for a memory location all assign the
> same shareability attribute to a Location that has a cacheable
> attribute, any loss of uniprocessor semantics, ordering, or coherency
> within a shareability domain can be avoided by use of software cache
> management"
>
> https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
>
> If the cache is invalidated when switching between access types,
> shouldn't the snoop filters get un-messed-up?
>
> > >>>>
> > >>>>>>>
> > >>>>>>>> ION buffer is allocated.
> > >>>>>>>>
> > >>>>>>>> //Camera device records video
> > >>>>>>>> dma_buf_attach
> > >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> > >>>>>>>
> > >>>>>>> Why does the buffer need to be cleaned here? I just got through reading
> > >>>>>>> the thread linked by Laura in the other reply. I do like +Brian's
> > >>>>>>
> > >>>>>> Actually +Brian this time :)
> > >>>>>>
> > >>>>>>> suggestion of tracking if the buffer has had CPU access since the last
> > >>>>>>> time and only flushing the cache if it has. As unmapped heaps never get
> > >>>>>>> CPU mapped this would never be the case for unmapped heaps, it solves my
> > >>>>>>> problem.
> > >>>>>>>
> > >>>>>>>> [camera device writes to buffer]
> > >>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> > >>>>>>>
> > >>>>>>> It doesn't know there will be any further CPU access, it could get freed
> > >>>>>>> after this for all we know, the invalidate can be saved until the CPU
> > >>>>>>> requests access again.
> > >>>>>
> > >>>>> We don't have any API to allow the invalidate to happen on CPU access
> > >>>>> if all devices already detached. We need a struct device pointer to
> > >>>>> give to the DMA API, otherwise on arm64 there'll be no invalidate.
> > >>>>>
> > >>>>> I had a chat with a few people internally after the previous
> > >>>>> discussion with Liam. One suggestion was to use
> > >>>>> DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least
> > >>>>> one other device attached (guarantees that we can do an invalidate in
> > >>>>> the future if begin_cpu_access is called). If the last device
> > >>>>> detaches, do a sync then.
> > >>>>>
> > >>>>> Conversely, in map_dma_buf, we would track if there was any CPU access
> > >>>>> and use/skip the sync appropriately.
> > >>>>>
> > >>>>
> > >>>> Now that I think this all through I agree this patch is probably wrong.
> > >>>> The real fix needs to be better handling in the dma_map_sg() to deal
> > >>>> with the case of the memory not being mapped (what I'm dealing with for
> > >>>> unmapped heaps), and for cases when the memory in question is not cached
> > >>>> (Liam's issue I think). For both these cases the dma_map_sg() does the
> > >>>> wrong thing.
> > >>>>
> > >>>>> I did start poking the code to check out how that would look, but then
> > >>>>> Christmas happened and I'm still catching back up.
> > >>>>>
> > >>>>>>>
> > >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> > >>>>>>>> the pipeline and Camera doesn't know the end of the use case)
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> This seems like a broken use-case, I understand the desire to keep
> > >>>>>>> everything as modular as possible and separate the steps, but at this
> > >>>>>>> point no one owns this buffer's backing memory, not the CPU or any
> > >>>>>>> device. I would go as far as to say DMA-BUF should be free now to
> > >>>>>>> de-allocate the backing storage if it wants, that way it could get ready
> > >>>>>>> for the next attachment, which may change the required backing memory
> > >>>>>>> completely.
> > >>>>>>>
> > >>>>>>> All devices should attach before the first mapping, and only let go
> > >>>>>>> after the task is complete; otherwise this buffer's data needs to be
> > >>>>>>> copied off to a different location or the CPU needs to take ownership
> > >>>>>>> in-between.
> > >>>>>>>
> > >>>>>
> > >>>>> Yeah.. that's certainly the theory. Are there any DMA-BUF
> > >>>>> implementations which actually do that? I hear it quoted a lot,
> > >>>>> because that's what the docs say - but if the reality doesn't match
> > >>>>> it, maybe we should change the docs.
> > >>>>>
> > >>>>
> > >>>> Do you mean on the userspace side? I'm not sure, seems like Android
> > >>>> might be doing this wrong from what I can gather. From kernel side if
> > >>>> you mean the "de-allocate the backing storage", we will have some cases
> > >>>> like this soon, so I want to make sure userspace is not abusing DMA-BUF
> > >>>> in ways not specified in the documentation. Changing the docs to force
> > >>>> the backing memory to always be allocated breaks the central goal in
> > >>>> having attach/map in DMA-BUF separate.
>
> Actually I meant in the kernel, in exporters. I haven't seen anyone
> using the API as it was intended (defer allocation until first map,
> migrate between different attachments, etc.). Mostly, backing storage
> seems to get allocated at the point of export, and device mappings are
> often held persistently (e.g. the DRM prime code maps the buffer at
> import time, and keeps it mapped: drm_gem_prime_import_dev).
>
> I wasn't aware that CPU access before first device access was
> considered an abuse of the API - it seems like a valid thing to want
> to do.
>
> > >>>>
> > >>>>>>>> //buffer is sent down the pipeline
> > >>>>>>>>
> > >>>>>>>> // Userspace software post-processing occurs
> > >>>>>>>> mmap buffer
> > >>>>>>>
> > >>>>>>> Perhaps the invalidate should happen here in mmap.
> > >>>>>>>
> > >>>>>>>> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no
> > >>>>>>>> devices attached to buffer
> > >>>>>>>
> > >>>>>>> And that should be okay, mmap does the sync, and if no devices are
> > >>>>>>> attached nothing could have changed the underlying memory in the
> > >>>>>>> mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are.
> > >>>>>
> > >>>>> Yeah, that's true - so long as you did an invalidate in unmap_dma_buf.
> > >>>>> Liam was saying that it's too painful for them to do that every time a
> > >>>>> device unmaps - when in many cases (device->device, no CPU) it's not
> > >>>>> needed.
> > >>>>
> > >>>> Invalidates are painless, at least compared to a real cache flush, just
> > >>>> set the invalid bit vs actually writing out lines. I thought the issue
> > >>>> was on the map side.
> > >>>>
> > >>>
> > >>> Invalidates aren't painless for us because we have a coherent system cache
> > >>> so clean lines get written out.
> > >>
> > >> That seems very broken, why would clean lines ever need to be written
> > >> out, that defeats the whole point of having the invalidate separate from
> > >> clean. How do you deal with stale cache lines? I guess in your case this
> > >> is what forces you to have to use uncached memory for DMA-able memory.
> > >>
> > >
> > > My understanding is that our ARM invalidate is a clean + invalidate, I had
> > > concerns about the clean lines being written to the system cache as
> > > part of the 'clean', but the following 'invalidate' would take care of
> > > actually invalidating the lines (so nothing broken).
> > > But I am probably wrong on this and it is probably smart enough not to do
> > > the writing of the clean lines.
> > >
> >
> > You are correct that for a lot of ARM cores "invalidate" is always a
> > "clean + invalidate". At first I thought this was kinda silly as there
> > is now no way to mark a dirty line invalid without it getting written
> > out first, but if you think about it any dirty cache-line can be written
> > out (cleaned) at anytime anyway, so this doesn't actually change system
> > behavior. You should just not write anything to memory (making the line
> > dirty) that you don't want eventually written out.
> >
> > Point two: it's not just smart enough not to write out clean lines, it
> > is guaranteed by the spec not to write them out. Otherwise, since
> > cache-lines can be randomly filled, if those same clean lines got written
> > out on invalidate operations there would be no way to maintain coherency,
> > and things would be written over top of each other all over the place.
> >
> > > But regardless, targets supporting a coherent system cache is a legitimate
> > > configuration, and an invalidate on this configuration does have to go to
> > > the bus to invalidate the system cache (which isn't free), so I don't think
> > > you can make the assumption that invalidates are cheap so that it is okay
> > > to do them (even if they are not needed) on every dma unmap.
> > >
> >
> > Very true, CMOs need to be broadcast to other coherent masters on a
> > coherent interconnect (and the interconnect itself if it has a cache as
> > well (L3)), so not 100% free, but almost, just the infinitesimal cost of
> > the cache tag check in hardware. If there are no non-coherent devices
> > attached then the CMOs are no-ops, if there are then the data needs to
> > be written out either way, doing it every access like is done with
> > uncached memory (- any write combining) will blow away any saving made
> > from the one less CMO. Either way you lose with uncached mappings of
> > memory. If I'm wrong I would love to know.
> >
>
> From what I understand, the current DMA APIs are not equipped to
> handle having coherent and non-coherent devices attached at the same
> time. The buffer is either in "CPU land" or "Device land", there's no
> smaller granule of "Coherent Device land" or "Non-Coherent Device
> land".
>
> I think if there's devices which are making coherent accesses, and
> devices which are making non-coherent accesses, then we can't support
> them being attached at the same time without some enhancements to the
> APIs.
>
> > >>> And these invalidates can occur on fairly large buffers.
> > >>>
> > >>> That is why we haven't gone with using cached ION memory and "tracking CPU
> > >>> access", because it only solves half the problem, i.e. there isn't a way to
> > >>> safely skip the invalidate (because we can't read the future).
> > >>> Our solution was to go with uncached ION memory (when possible), but as
> > >>> you can see in other discussions upstream support for uncached memory has
> > >>> its own issues.
> > >>>
>
> @Liam, in your problematic use-cases, are both devices detached when
> the buffer moves between them?
>
> 1) dev 1 map, access, unmap
> 2) dev 1 detach
> 3) (maybe) CPU access
> 4) dev 2 attach
> 5) dev 2 map, access
>
> I still think a pretty pragmatic solution is to use
> DMA_ATTR_SKIP_CPU_SYNC until the last device detaches. That won't work
> if your access sequence looks like above...
>

Yes, the pipelining case in Android looks like the above.

> ...however, if your sequence looks like above, then you probably need
> to keep at least one of the devices attached anyway.

It would be nice if a device could be kept attached, but that doesn't
always work very well for a pipelining case, as there isn't necessarily a
good place to know when to detach it.


> Otherwise, per
> the API, the buffer could get migrated after 2)/before 5). That will
> surely hurt a lot more than an invalidate.
>

You are right, in theory it could, but in practice it isn't getting
migrated.
I understand the theoretical case, but it doesn't seem clean either to
force clients to stay attached when there isn't a nice way of knowing
when to have them detach.


> > >>
> > >> Sounds like you need to fix upstream support then, finding a way to drop
> > >> all cacheable mappings of memory you want to make uncached mappings for
> > >> seems to be the only solution.
> > >>
> > >
> > > I think we can probably agree that there wouldn't be a good way to remove
> > > cached mappings without causing an unacceptable performance degradation,
> > > since it would fragment all the nice 1GB kernel mappings we have.
> > >
> > > So I am trying to find an alternative solution.
> > >
> >
> > I'm not sure there is a better solution. How hard is this solution to
> > implement anyway? The kernel already has to make gaps and cut up that
> > nice 1GB mapping when you make a reserved memory space in the lowmem
> > area, so all the logic is probably already implemented. Just need to
> > allow it to be hooked into from Ion when doing the uncached mappings.
> >
>
> I haven't looked recently, but I'm not sure the early memblock code
> can be reused as-is at runtime. I seem to remember it makes a bunch of
> assumptions about the fact that it's running "early".
>
> If CPU uncached mappings of normal system memory is really the way
> forward, I could envisage a heap which maintains a pool of chunks of
> memory which it removed from the kernel mapping. The pool could grow
> (remove more pages from the kernel mapping)/shrink (add them back to
> the kernel mapping) as needed.
>
> John Reitan implemented a compound-page heap, which used compaction to
> get a pool of 2MB contiguous pages. Something like that would at least
> prevent needing full 4kB granularity when removing things from the
> kernel mapping.
>
> Even better, could it somehow be restricted to a region which is
> already fragmented? (e.g. the one which was used for the default CMA
> heap)
>
> Thanks,
> -Brian
>
> > >>>>>
> > >>>>>>>
> > >>>>>>>> [CPU reads/writes to the buffer]
> > >>>>>>>> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no
> > >>>>>>>> devices attached to buffer
> > >>>>>>>> munmap buffer
> > >>>>>>>>
> > >>>>>>>> //buffer is sent down the pipeline
> > >>>>>>>> // Buffer is sent to video device (which does compression of raw data) and
> > >>>>>>>> writes to a file
> > >>>>>>>> dma_buf_attach
> > >>>>>>>> dma_map_attachment (buffer needs to be cleaned)
> > >>>>>>>> [video device writes to buffer]
> > >>>>>>>> dma_buf_unmap_attachment
> > >>>>>>>> dma_buf_detach (device cannot stay attached because it is being sent down
> > >>>>>>>> the pipeline and Video doesn't know the end of the use case)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU
> > >>>>>>>>>> access then there is no requirement (that I am aware of) for you to call
> > >>>>>>>>>> {begin,end}_cpu_access before passing the buffer to the device and if this
> > >>>>>>>>>> buffer is cached and your device is not IO-coherent then the cache maintenance
> > >>>>>>>>>> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> If I am not doing any CPU access then why do I need CPU cache
> > >>>>>>>>> maintenance on the buffer?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Because ION no longer provides DMA ready memory.
> > >>>>>>>> Take the above example.
> > >>>>>>>>
> > >>>>>>>> ION allocates memory from buddy allocator and requests zeroing.
> > >>>>>>>> Zeros are written to the cache.
> > >>>>>>>>
> > >>>>>>>> You pass the buffer to the camera device which is not IO-coherent.
> > >>>>>>>> The camera devices writes directly to the buffer in DDR.
> > >>>>>>>> Since you didn't clean the buffer a dirty cache line (one of the zeros) is
> > >>>>>>>> evicted from the cache, this zero overwrites data the camera device has
> > >>>>>>>> written which corrupts your data.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> The zeroing *is* a CPU access, therefor it should handle the needed CMO
> > >>>>>>> for CPU access at the time of zeroing.
> > >>>>>>>
> > >>>>>
> > >>>>> Actually that should be at the point of the first non-coherent device
> > >>>>> mapping the buffer right? No point in doing CMO if the future accesses
> > >>>>> are coherent.
> > >>>>
> > >>>> I see your point, as long as the zeroing is guaranteed to be the first
> > >>>> access to this buffer then it should be safe.
> > >>>>
> > >>>> Andrew
> > >>>>
> > >>>>>
> > >>>>> Cheers,
> > >>>>> -Brian
> > >>>>>
> > >>>>>>> Andrew
> > >>>>>>>
> > >>>>>>>> Liam
> > >>>>>>>>
> > >>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > >>>>>>>> a Linux Foundation Collaborative Project
> > >>>>>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> > >
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2019-01-23 16:55:35

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/22/19 11:33 AM, Sumit Semwal wrote:
> Hello everyone,
>
> Sincere apologies for chiming in a bit late here, but was off due to
> some health issues.
>

Hope you are feeling better friend :)

Looks like this email was a bit broken and you replied again; the
responses are a little different in each email, so I'd like to respond
to bits of both. I'll fix up the formatting.

> Also, adding Daniel Vetter to the mix, since he has been one of the
> core guys who shaped up dma-buf as it is today.
>
> On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis <[email protected]> wrote:
>>
>> On 1/21/19 5:22 AM, Brian Starkey wrote:

[snip]

>>>
>>> Actually I meant in the kernel, in exporters. I haven't seen anyone
>>> using the API as it was intended (defer allocation until first map,
>>> migrate between different attachments, etc.). Mostly, backing storage
>>> seems to get allocated at the point of export, and device mappings are
>>> often held persistently (e.g. the DRM prime code maps the buffer at
>>> import time, and keeps it mapped: drm_gem_prime_import_dev).
>>>
>>
>
> So I suppose some clarification on the 'intended use' part of dma-buf
> about deferred allocation is due, so here it is: (Daniel, please feel
> free to chime in with your opinion here)
>
> - dma-buf was of course designed as a framework to help intelligent
> exporters to defer allocation until first map, and be able to migrate
> backing storage if required etc. At the same time, it is not a
> _requirement_ from any exporter, so exporters so far have just used it
> as a convenient mechanism for zero-copy.
> - ION is one of the few dma-buf exporters in kernel, which satisfies a
> certain set of expectations from its users.
>

The issue here is that Ion blocks the ability to late allocate: it
expects its heaps to have the memory ready at allocation time. My point
being, if DMA-BUF's intended design was to allow this, then Ion should
respect that and also allow the same from its heap exporters.
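As a sketch of what respecting late allocation could look like, a heap could hand back the buffer handle immediately and only commit backing memory on first map (a userspace model with invented names; these are not the real Ion or DMA-BUF interfaces):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical deferred-allocation buffer: the exporter hands out the
 * handle at "allocation" time but commits backing pages on first map,
 * once all attachments are known. */
struct lazy_buf {
	size_t size;
	void *backing;   /* NULL until first map */
};

static struct lazy_buf *lazy_alloc(size_t size)
{
	struct lazy_buf *b = calloc(1, sizeof(*b));

	if (b)
		b->size = size;
	return b;        /* no pages committed yet */
}

static void *lazy_map(struct lazy_buf *b)
{
	if (!b->backing)             /* first map: allocate for real */
		b->backing = calloc(1, b->size);
	return b->backing;
}
```

The real exporter would do the commit in its map_dma_buf() callback, after all attach() calls have been seen, so it knows the devices' placement requirements before picking where the pages come from.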

>> I haven't either, which is a shame as it allows for some really useful
>> management strategies for shared memory resources. I'm working on one
>> such case right now, maybe I'll get to be the first to upstream one :)
>>
> That will be a really good thing! Though perhaps we ought to think if
> for what you're trying to do, is ION the right place, or should you
> have a device-specific exporter, available to users via dma-buf apis?
>

I'm starting to question whether Ion is the right place myself...

At a conceptual level I don't believe userspace should be picking the
backing memory type. This is because the right type of backing memory
for a task will change from system to system. The kernel should abstract
away these hardware differences from userspace as much as it can to
allow portable code.

For instance a device may need a contiguous buffer on one system but the
same device on another may have some IOMMU. So which type of memory do
we allocate? Same issue for cacheability and other properties.

What we need is a central allocator with full system knowledge to do the
choosing for us. It seems many agree with the above and I take
inspiration from your cenalloc patchset. The thing I'm not sure about is
letting the device drivers set their constraints, because they also
don't have the full system integration details. For cases where devices
are behind an IOMMU it is easy enough for the device to know, but what
about when we have external MMUs out on the bus for anyone to use (I'm
guessing you remember TILER..).

I would propose the central allocator keep per-system knowledge (or
fetch it from DT, or if this is considered policy then userspace) which
it can use to directly check the attached devices and pick the right memory.

Anyway, the central system allocator could handle 90% of the cases I can
think of, and this is where Ion comes back in: the other cases would
still require the program to manually pick the right memory (maybe for
performance reasons, etc.).
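A minimal model of that device-constraint matching might look like this (a userspace sketch; the capability flags and memory kinds are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device constraint flags the central allocator would
 * track from system knowledge (DT, platform table), not driver claims. */
enum dev_caps {
	DEV_HAS_IOMMU = 1 << 0,   /* can scatter-gather via an IOMMU */
	DEV_COHERENT  = 1 << 1,   /* IO-coherent with CPU caches */
};

enum mem_kind {
	MEM_SCATTERED_CACHED,
	MEM_CONTIG_CACHED,
	MEM_CONTIG_UNCACHED,
};

/* Pick the least restrictive memory type every attached device can use. */
static enum mem_kind pick_memory(const unsigned int *caps, int ndevs)
{
	bool all_iommu = true, all_coherent = true;
	int i;

	for (i = 0; i < ndevs; i++) {
		if (!(caps[i] & DEV_HAS_IOMMU))
			all_iommu = false;
		if (!(caps[i] & DEV_COHERENT))
			all_coherent = false;
	}

	if (all_iommu && all_coherent)
		return MEM_SCATTERED_CACHED;
	if (all_coherent)
		return MEM_CONTIG_CACHED;
	return MEM_CONTIG_UNCACHED;
}
```

The interesting design question is where the caps[] values come from: in the scheme above they are per-system integration knowledge, rather than something each driver asserts about itself.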

So my vision is to have Ion as the main front-end for DMA-BUF
allocations, and expose the central allocator through it (maybe as a
default heap type that can be populated on a per-system basis), but also
have other individual heap types exported for the edge cases where
manual selection is needed, like we do now.

This is why Ion should allow direct control of the dma_buf_ops from the
heaps, so we can build central allocators as Ion heaps.
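Concretely, the core would then just forward the DMA-BUF callbacks to whatever ops table the heap registered (a userspace model with invented names; the real dma_buf_ops callbacks have different signatures):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-heap ops table: the heap, not the Ion core,
 * supplies the DMA-BUF callbacks. */
struct heap_dmabuf_ops {
	int (*begin_cpu_access)(void *priv);
	int (*end_cpu_access)(void *priv);
};

struct ion_heap_model {
	const char *name;
	const struct heap_dmabuf_ops *ops;  /* heap-owned, may differ per heap */
	void *priv;
};

/* The core only advertises heaps and forwards; it no longer decides
 * how (or whether) cache syncing happens. */
static int core_begin_cpu_access(struct ion_heap_model *h)
{
	if (h->ops && h->ops->begin_cpu_access)
		return h->ops->begin_cpu_access(h->priv);
	return 0;  /* e.g. an unmapped heap: nothing to sync */
}

/* Example heap-supplied callback: counts syncs through priv. */
static int cached_heap_sync(void *priv)
{
	int *count = priv;

	return ++(*count);
}
```

An unmapped heap would simply register no sync callbacks at all, and the core would have nothing left to decide.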

If I'm off into the weeds here and you have some other ideas I'm all ears.

Andrew

>>> I wasn't aware that CPU access before first device access was
>>> considered an abuse of the API - it seems like a valid thing to want
>>> to do.
>>>
>>
>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>> some clarity on that. If we do want to allow early CPU access then that
>> seems to be in contrast to the idea of deferred allocation until first
>> device map, what is supposed to backing the buffer if no devices have
>> attached or mapped yet? Just some system memory followed by migration on
>> the first attach to the proper backing? Seems too time-wasteful to
>> have a valid use.
>>
>> Maybe it should be up to the exporter if early CPU access is allowed?
>>
>> I'm hoping someone with authority over the DMA-BUF framework can clarify
>> original intentions here.
>
> I don't think dma-buf as a framework stops early CPU access, and the
> exporter can definitely decide on that by implementing
> begin_cpu_access / end_cpu_access operations to not allow early CPU
> access, if it so desires.
>

2019-01-23 17:10:49

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/22/19 9:23 PM, Sumit Semwal wrote:
> Hello everyone,
>
> (Thanks to Dan for letting me know my last email got corrupted :/ -
> resending it here)
>

Hmm, this one seems a bit messed up also (Thunderbird doesn't seem to
like it at least).

[snip]

> - from dma-buf PoV, ION is an exporter of dma-buf buffers, for its users
> that have specific requirements.
>

This is what I'm hoping to change up a little bit: Ion shouldn't be the
exporter, its heaps should be the exporters (managing the dma_buf_ops);
Ion would only advertise the available heaps and allow allocating
DMA-BUFs from them.

IMO that would clear up the other discussions going on right now about
how Ion should handle different dma-buf syncing tasks: it simply
wouldn't :). Plus the Ion core gets slimmed down, maybe even enough for
de-staging...

>> I haven't either, which is a shame as it allows for some really useful
>> management strategies for shared memory resources. I'm working on one
>> such case right now, maybe I'll get to be the first to upstream one :)
>>
> Yes, it would, and great that you're looking to be the first one to do it :)
>
>> > I wasn't aware that CPU access before first device access was
>> > considered an abuse of the API - it seems like a valid thing to want
>> > to do.
>> >
>>
>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>> some clarity on that. If we do want to allow early CPU access then that
>> seems to be in contrast to the idea of deferred allocation until first
>> device map, what is supposed to backing the buffer if no devices have
>> attached or mapped yet? Just some system memory followed by migration on
>> the first attach to the proper backing? Seems too time-wasteful to
>> have a valid use.
>>
>> Maybe it should be up to the exporter if early CPU access is allowed?
>>
>> I'm hoping someone with authority over the DMA-BUF framework can clarify
>> original intentions here.
>>
>
> I suppose dma-buf as a framework can't know or decide what the exporter
> wants or can do - whether the exporter wants to use it for 'only
> zero-copy', or do some intelligent things behind the scene, I think
> should be best left to the exporter.
>
> Hope this helps,

Yes, these inputs are very helpful, thanks,
Andrew

> Sumit.
>

2019-01-23 17:14:17

by Brian Starkey

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

Hi Andrew,

On Wed, Jan 23, 2019 at 10:51:24AM -0600, Andrew F. Davis wrote:
> On 1/22/19 11:33 AM, Sumit Semwal wrote:
> > Hello everyone,
> >
> > Sincere apologies for chiming in a bit late here, but was off due to
> > some health issues.
> >
>
> Hope you are feeling better friend :)
>
> Looks like this email was a bit broken and you replied again, the
> responses are a little different in each email, so I'd like to respond
> to bits of both, I'll fix up the formatting.
>
> > Also, adding Daniel Vetter to the mix, since he has been one of the
> > core guys who shaped up dma-buf as it is today.
> >
> > On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis <[email protected]> wrote:
> >>
> >> On 1/21/19 5:22 AM, Brian Starkey wrote:
>
> [snip]
>
> >>>
> >>> Actually I meant in the kernel, in exporters. I haven't seen anyone
> >>> using the API as it was intended (defer allocation until first map,
> >>> migrate between different attachments, etc.). Mostly, backing storage
> >>> seems to get allocated at the point of export, and device mappings are
> >>> often held persistently (e.g. the DRM prime code maps the buffer at
> >>> import time, and keeps it mapped: drm_gem_prime_import_dev).
> >>>
> >>
> >
> > So I suppose some clarification on the 'intended use' part of dma-buf
> > about deferred allocation is due, so here it is: (Daniel, please feel
> > free to chime in with your opinion here)
> >
> > - dma-buf was of course designed as a framework to help intelligent
> > exporters to defer allocation until first map, and be able to migrate
> > backing storage if required etc. At the same time, it is not a
> > _requirement_ from any exporter, so exporters so far have just used it
> > as a convenient mechanism for zero-copy.
> > - ION is one of the few dma-buf exporters in kernel, which satisfies a
> > certain set of expectations from its users.
> >
>
> The issue here is that Ion is blocking the ability to late allocate, it
> expects its heaps to have the memory ready at allocation time. My point
> being if the DMA-BUFs intended design was to allow this then Ion should
> respect that and also allow the same from its heap exporters.
>
> >> I haven't either, which is a shame as it allows for some really useful
> >> management strategies for shared memory resources. I'm working on one
> >> such case right now, maybe I'll get to be the first to upstream one :)
> >>
> > That will be a really good thing! Though perhaps we ought to think if
> > for what you're trying to do, is ION the right place, or should you
> > have a device-specific exporter, available to users via dma-buf apis?
> >
>
> I'm starting to question if Ion is the right place myself..
>
> At a conceptual level I don't believe userspace should be picking the
> backing memory type. This is because the right type of backing memory
> for a task will change from system to system. The kernel should abstract
> away these hardware differences from userspace as much as it can to
> allow portable code.
>
> For instance a device may need a contiguous buffer on one system but the
> same device on another may have some IOMMU. So which type of memory do
> we allocate? Same issue for cacheability and other properties.
>
> What we need is a central allocator with full system knowledge to do the
> choosing for us. It seems many agree with the above and I take
> inspiration from your cenalloc patchset. The thing I'm not sure about is
> letting the device drivers set their constraints, because they also
> don't have the full system integration details. For cases where devices
> are behind an IOMMU it is easy enough for the device to know, but what
> about when we have external MMUs out on the bus for anyone to use (I'm
> guessing you remember TILER..).
>
> I would propose the central allocator keep per-system knowledge (or
> fetch it from DT, or if this is considered policy then userspace) which
> it can use to directly check the attached devices and pick the right memory.
>
> Anyway the central system allocator could handle 90% of cases I can
> think of, and this is where Ion comes back in: the other cases would
> still require the program to manually pick the right memory (maybe for
> performance reasons, etc.).
>
> So my vision is to have Ion as the main front-end for DMA-BUF
> allocations, and expose the central allocator through it (maybe as a
> default heap type that can be populated on a per-system basis), but also
> have other individual heap types exported for the edge cases where
> manual selection is needed like we do now.
>
> Hence why Ion should allow direct control of the dma_buf_ops from the
> heaps, so we can build central allocators as Ion heaps.
>
> If I'm off into the weeds here and you have some other ideas I'm all ears.
>

This is a topic I've gone around a few times. The crux of it is, as
you know, a central allocator is Really Hard. I don't know what you've
seen/done so far in this area, so please forgive me if this is old hat
to you.

Android's platform-specific Gralloc module actually does a pretty good
job, and because it's platform-specific, it can be simple. That does
have a certain appeal to it over something generic but complex.

It seems that generic gets insanely complicated really quickly -
picking out "compatible" is hard enough, but improving that to pick
out "optimal" is ... well, I've not seen any good proposals for that.

In case you didn't come across it already, the effort which seems to
have gained the most "air-time" recently is
https://github.com/cubanismo/allocator, which is still a userspace
module (perhaps some concepts from there could go into the kernel?),
but makes some attempts at generic constraint solving. It's also not
really moving anywhere at the moment.

Cheers,
-Brian

> Andrew
>
> >>> I wasn't aware that CPU access before first device access was
> >>> considered an abuse of the API - it seems like a valid thing to want
> >>> to do.
> >>>
> >>
> >> That's just it, I don't know if it is an abuse of API, I'm trying to get
> >> some clarity on that. If we do want to allow early CPU access then that
> >> seems to be in contrast to the idea of deferred allocation until first
> >> device map, what is supposed to be backing the buffer if no devices have
> >> attached or mapped yet? Just some system memory followed by migration on
> >> the first attach to the proper backing? Seems too time wasteful to
> >> have a valid use.
> >>
> >> Maybe it should be up to the exporter if early CPU access is allowed?
> >>
> >> I'm hoping someone with authority over the DMA-BUF framework can clarify
> >> original intentions here.
> >
> > I don't think dma-buf as a framework stops early CPU access, and the
> > exporter can definitely decide on that by implementing
> > begin_cpu_access / end_cpu_access operations to not allow early CPU
> > access, if it so desires.
> >

2019-01-24 16:06:41

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/23/19 11:11 AM, Brian Starkey wrote:
> Hi Andrew,
>
> On Wed, Jan 23, 2019 at 10:51:24AM -0600, Andrew F. Davis wrote:
>> On 1/22/19 11:33 AM, Sumit Semwal wrote:
>>> Hello everyone,
>>>
>>> Sincere apologies for chiming in a bit late here, but was off due to
>>> some health issues.
>>>
>>
>> Hope you are feeling better friend :)
>>
>> Looks like this email was a bit broken and you replied again, the
>> responses are a little different in each email, so I'd like to respond
>> to bits of both, I'll fix up the formatting.
>>
>>> Also, adding Daniel Vetter to the mix, since he has been one of the
>>> core guys who shaped up dma-buf as it is today.
>>>
>>> On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis <[email protected]> wrote:
>>>>
>>>> On 1/21/19 5:22 AM, Brian Starkey wrote:
>>
>> [snip]
>>
>>>>>
>>>>> Actually I meant in the kernel, in exporters. I haven't seen anyone
>>>>> using the API as it was intended (defer allocation until first map,
>>>>> migrate between different attachments, etc.). Mostly, backing storage
>>>>> seems to get allocated at the point of export, and device mappings are
>>>>> often held persistently (e.g. the DRM prime code maps the buffer at
>>>>> import time, and keeps it mapped: drm_gem_prime_import_dev).
>>>>>
>>>>
>>>
>>> So I suppose some clarification on the 'intended use' part of dma-buf
>>> about deferred allocation is due, so here it is: (Daniel, please feel
>>> free to chime in with your opinion here)
>>>
>>> - dma-buf was of course designed as a framework to help intelligent
>>> exporters to defer allocation until first map, and be able to migrate
>>> backing storage if required etc. At the same time, it is not a
>>> _requirement_ from any exporter, so exporters so far have just used it
>>> as a convenient mechanism for zero-copy.
>>> - ION is one of the few dma-buf exporters in kernel, which satisfies a
>>> certain set of expectations from its users.
>>>
>>
>> The issue here is that Ion is blocking the ability to late allocate; it
>> expects its heaps to have the memory ready at allocation time. My point
>> being that if DMA-BUF's intended design was to allow this, then Ion should
>> respect that and also allow the same from its heap exporters.
>>
>>>> I haven't either, which is a shame as it allows for some really useful
>>>> management strategies for shared memory resources. I'm working on one
>>>> such case right now, maybe I'll get to be the first to upstream one :)
>>>>
>>> That will be a really good thing! Though perhaps we ought to think if
>>> for what you're trying to do, is ION the right place, or should you
>>> have a device-specific exporter, available to users via dma-buf apis?
>>>
>>
>> I'm starting to question if Ion is the right place myself..
>>
>> At a conceptual level I don't believe userspace should be picking the
>> backing memory type. This is because the right type of backing memory
>> for a task will change from system to system. The kernel should abstract
>> away these hardware differences from userspace as much as it can to
>> allow portable code.
>>
>> For instance a device may need a contiguous buffer on one system but the
>> same device on another may have some IOMMU. So which type of memory do
>> we allocate? Same issue for cacheability and other properties.
>>
>> What we need is a central allocator with full system knowledge to do the
>> choosing for us. It seems many agree with the above and I take
>> inspiration from your cenalloc patchset. The thing I'm not sure about is
>> letting the device drivers set their constraints, because they also
>> don't have the full system integration details. For cases where devices
>> are behind an IOMMU it is easy enough for the device to know, but what
>> about when we have external MMUs out on the bus for anyone to use (I'm
>> guessing you remember TILER..).
>>
>> I would propose the central allocator keep per-system knowledge (or
>> fetch it from DT, or if this is considered policy then userspace) which
>> it can use to directly check the attached devices and pick the right memory.
>>
>> Anyway the central system allocator could handle 90% of cases I can
>> think of, and this is where Ion comes back in: the other cases would
>> still require the program to manually pick the right memory (maybe for
>> performance reasons, etc.).
>>
>> So my vision is to have Ion as the main front-end for DMA-BUF
>> allocations, and expose the central allocator through it (maybe as a
>> default heap type that can be populated on a per-system basis), but also
>> have other individual heap types exported for the edge cases where
>> manual selection is needed like we do now.
>>
>> Hence why Ion should allow direct control of the dma_buf_ops from the
>> heaps, so we can build central allocators as Ion heaps.
>>
>> If I'm off into the weeds here and you have some other ideas I'm all ears.
>>
>
> This is a topic I've gone around a few times. The crux of it is, as
> you know, a central allocator is Really Hard. I don't know what you've
> seen/done so far in this area, so please forgive me if this is old hat
> to you.
>

I'm very new to all this, so any pointers to history in this area are
appreciated.

> Android's platform-specific Gralloc module actually does a pretty good
> job, and because it's platform-specific, it can be simple. That does
> have a certain appeal to it over something generic but complex.
>
> It seems that generic gets insanely complicated really quickly -
> picking out "compatible" is hard enough, but improving that to pick
> out "optimal" is ... well, I've not seen any good proposals for that.
>

I quickly found out the extent of how complicated this can get. I
originally wanted a generic central allocator, but to be complete it
would have to handle every one of everyone's uses, so that's just not
going to happen. So I refocused on simply a central allocator; it didn't
need to be generic and could be made specifically for each given system
of ours. Even that quickly ran into objections when proposed internally;
there will always be edge use-cases that require manual selection of
allocation type (due to performance and resource partitioning). Now I'm
at an optional central allocator plus the standard allocator heaps
provided by Ion, with the system-specific central allocator also
exported by Ion for simplicity.

So all I want is to find "compatible" when needed, and leave "optimal"
selection up to the applications that need it.

> In case you didn't come across it already, the effort which seems to
> have gained the most "air-time" recently is
> https://github.com/cubanismo/allocator, which is still a userspace
> module (perhaps some concepts from there could go into the kernel?),
> but makes some attempts at generic constraint solving. It's also not
> really moving anywhere at the moment.
>

Very interesting, I'm going to have to stare at this for a bit.

Thanks,
Andrew

> Cheers,
> -Brian
>
>> Andrew
>>
>>>>> I wasn't aware that CPU access before first device access was
>>>>> considered an abuse of the API - it seems like a valid thing to want
>>>>> to do.
>>>>>
>>>>
>>>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>>>> some clarity on that. If we do want to allow early CPU access then that
>>>> seems to be in contrast to the idea of deferred allocation until first
>>>> device map, what is supposed to be backing the buffer if no devices have
>>>> attached or mapped yet? Just some system memory followed by migration on
>>>> the first attach to the proper backing? Seems too time wasteful to
>>>> have a valid use.
>>>>
>>>> Maybe it should be up to the exporter if early CPU access is allowed?
>>>>
>>>> I'm hoping someone with authority over the DMA-BUF framework can clarify
>>>> original intentions here.
>>>
>>> I don't think dma-buf as a framework stops early CPU access, and the
>>> exporter can definitely decide on that by implementing
>>> begin_cpu_access / end_cpu_access operations to not allow early CPU
>>> access, if it so desires.
>>>

2019-01-24 16:45:32

by Brian Starkey

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On Thu, Jan 24, 2019 at 10:04:46AM -0600, Andrew F. Davis wrote:
> On 1/23/19 11:11 AM, Brian Starkey wrote:

[snip]

>
> I'm very new to all this, so any pointers to history in this area are
> appreciated.
>

[snip]

>
> > In case you didn't come across it already, the effort which seems to
> > have gained the most "air-time" recently is
> > https://github.com/cubanismo/allocator, which is still a userspace
> > module (perhaps some concepts from there could go into the kernel?),
> > but makes some attempts at generic constraint solving. It's also not
> > really moving anywhere at the moment.
> >
>
> Very interesting, I'm going to have to stare at this for a bit.

In which case, some reading material that might be of interest :-)

https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf
https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html

-Brian


2019-02-19 21:40:07

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

On 1/24/19 8:44 AM, Brian Starkey wrote:
> On Thu, Jan 24, 2019 at 10:04:46AM -0600, Andrew F. Davis wrote:
>> On 1/23/19 11:11 AM, Brian Starkey wrote:
>
> [snip]
>
>>
>> I'm very new to all this, so any pointers to history in this area are
>> appreciated.
>>
>
> [snip]
>
>>
>>> In case you didn't come across it already, the effort which seems to
>>> have gained the most "air-time" recently is
>>> https://github.com/cubanismo/allocator, which is still a userspace
>>> module (perhaps some concepts from there could go into the kernel?),
>>> but makes some attempts at generic constraint solving. It's also not
>>> really moving anywhere at the moment.
>>>
>>
>> Very interesting, I'm going to have to stare at this for a bit.
>
> In which case, some reading material that might be of interest :-)
>
> https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf
> https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
> https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
>
> -Brian
>

In some respects this is more a question of "what is the purpose
of Ion". Once upon a time, Ion was going to do constraint solving
but that never really happened and I figured Ion would get deprecated.
People keep coming out of the woodwork with new uses for Ion so
it's stuck around. This is why I've primarily focused on Ion as a
framework for exposing available memory types to userspace and leave
the constraint solving to someone else, since that's what most
users seem to want out of Ion ("I know I want memory type X please
give it to me"). That's not to say that this was a perfect or
even the correct approach, just what made the most sense based
on users.

Thanks,
Laura