2021-11-15 14:20:56

by Paul Cercueil

Subject: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

Hi Jonathan,

This patchset introduces a new userspace interface based on DMABUF
objects, to complement the existing fileio based API.

The advantage of this DMABUF based interface over the fileio
interface is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly useful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

The first few patches [01/15] to [03/15] are not directly related, but
they reduce the size of the patches that introduce the new API.

Patches [04/15] to [06/15] add write() support to the buffer-dma
implementation of the buffer API, continuing the work started by
Mihail Chindris.

Patches [07/15] to [12/15] introduce the new DMABUF based API.

Patches [13/15] and [14/15] add support for cyclic buffers, only through
the new API. A cyclic buffer will be repeated on the output until the
buffer is disabled.

Patch [15/15] adds documentation about the new API.

For now, the API allows you to allocate DMABUF objects and mmap() them
to read or write the samples. It does not yet allow importing DMABUFs
owned by other subsystems, but that should eventually be possible once
the plumbing is wired up.

This patchset is inspired by the "mmap interface" that was previously
submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
great if I could get a review from you guys. Alexandru's commit was
signed with his @analog.com address, but he doesn't work at ADI anymore,
so I believe I'll need him to sign off again with a new email address.

Cheers,
-Paul

Alexandru Ardelean (1):
iio: buffer-dma: split iio_dma_buffer_fileio_free() function

Paul Cercueil (14):
iio: buffer-dma: Get rid of incoming/outgoing queues
iio: buffer-dma: Remove unused iio_buffer_block struct
iio: buffer-dma: Use round_down() instead of rounddown()
iio: buffer-dma: Enable buffer write support
iio: buffer-dmaengine: Support specifying buffer direction
iio: buffer-dmaengine: Enable write support
iio: core: Add new DMABUF interface infrastructure
iio: buffer-dma: Use DMABUFs instead of custom solution
iio: buffer-dma: Implement new DMABUF based userspace API
iio: buffer-dma: Boost performance using write-combine cache setting
iio: buffer-dmaengine: Support new DMABUF based userspace API
iio: core: Add support for cyclic buffers
iio: buffer-dmaengine: Add support for cyclic buffers
Documentation: iio: Document high-speed DMABUF based API

Documentation/driver-api/dma-buf.rst | 2 +
Documentation/iio/dmabuf_api.rst | 94 +++
Documentation/iio/index.rst | 2 +
drivers/iio/adc/adi-axi-adc.c | 3 +-
drivers/iio/buffer/industrialio-buffer-dma.c | 670 ++++++++++++++----
.../buffer/industrialio-buffer-dmaengine.c | 42 +-
drivers/iio/industrialio-buffer.c | 49 ++
include/linux/iio/buffer-dma.h | 43 +-
include/linux/iio/buffer-dmaengine.h | 5 +-
include/linux/iio/buffer_impl.h | 8 +
include/uapi/linux/iio/buffer.h | 30 +
11 files changed, 783 insertions(+), 165 deletions(-)
create mode 100644 Documentation/iio/dmabuf_api.rst

--
2.33.0



2021-11-15 14:21:30

by Paul Cercueil

Subject: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

The buffer-dma code was using two queues, incoming and outgoing, to
manage the state of the blocks in use.

While this works fine, it adds some complexity to the code, especially
since the code only manages two blocks. It is much easier to just check
each block's state directly, and to keep a counter for the next block
to dequeue.
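
Condensed, the new dequeue path boils down to the following (a
simplified sketch of the logic in the diff below, with the list_lock
handling omitted):

static struct iio_dma_buffer_block *
iio_dma_buffer_dequeue(struct iio_dma_buffer_queue *queue)
{
	unsigned int idx = queue->fileio.next_dequeue;
	struct iio_dma_buffer_block *block = queue->fileio.blocks[idx];

	/* Only hand out the block once the DMA core marked it as done */
	if (block->state != IIO_BLOCK_STATE_DONE)
		return NULL;

	block->state = IIO_BLOCK_STATE_DEQUEUED;
	queue->fileio.next_dequeue = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);

	return block;
}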

Since the new DMABUF based API wouldn't use these incoming and outgoing
queues anyway, getting rid of them now makes the upcoming changes
simpler.

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
include/linux/iio/buffer-dma.h | 7 +-
2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index d348af8b9705..abac88f20104 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(

static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
{
- struct iio_dma_buffer_queue *queue = block->queue;
-
- /*
- * The buffer has already been freed by the application, just drop the
- * reference.
- */
- if (block->state != IIO_BLOCK_STATE_DEAD) {
+ if (block->state != IIO_BLOCK_STATE_DEAD)
block->state = IIO_BLOCK_STATE_DONE;
- list_add_tail(&block->head, &queue->outgoing);
- }
}

/**
@@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
* dead. This means we can reset the lists without having to fear
* corrution.
*/
- INIT_LIST_HEAD(&queue->outgoing);
spin_unlock_irq(&queue->list_lock);

- INIT_LIST_HEAD(&queue->incoming);
-
for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
if (queue->fileio.blocks[i]) {
block = queue->fileio.blocks[i];
@@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
}

block->state = IIO_BLOCK_STATE_QUEUED;
- list_add_tail(&block->head, &queue->incoming);
}

out_unlock:
@@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
struct iio_dev *indio_dev)
{
struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
- struct iio_dma_buffer_block *block, *_block;
+ struct iio_dma_buffer_block *block;
+ unsigned int i;

mutex_lock(&queue->lock);
queue->active = true;
- list_for_each_entry_safe(block, _block, &queue->incoming, head) {
- list_del(&block->head);
- iio_dma_buffer_submit_block(queue, block);
+ queue->fileio.next_dequeue = 0;
+
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ block = queue->fileio.blocks[i];
+
+ if (block->state == IIO_BLOCK_STATE_QUEUED)
+ iio_dma_buffer_submit_block(queue, block);
}
mutex_unlock(&queue->lock);

@@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
struct iio_dma_buffer_block *block)
{
- if (block->state == IIO_BLOCK_STATE_DEAD) {
+ if (block->state == IIO_BLOCK_STATE_DEAD)
iio_buffer_block_put(block);
- } else if (queue->active) {
+ else if (queue->active)
iio_dma_buffer_submit_block(queue, block);
- } else {
+ else
block->state = IIO_BLOCK_STATE_QUEUED;
- list_add_tail(&block->head, &queue->incoming);
- }
}

static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
struct iio_dma_buffer_queue *queue)
{
struct iio_dma_buffer_block *block;
+ unsigned int idx;

spin_lock_irq(&queue->list_lock);
- block = list_first_entry_or_null(&queue->outgoing, struct
- iio_dma_buffer_block, head);
- if (block != NULL) {
- list_del(&block->head);
+
+ idx = queue->fileio.next_dequeue;
+ block = queue->fileio.blocks[idx];
+
+ if (block->state == IIO_BLOCK_STATE_DONE) {
block->state = IIO_BLOCK_STATE_DEQUEUED;
+ idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
+ queue->fileio.next_dequeue = idx;
+ } else {
+ block = NULL;
}
+
spin_unlock_irq(&queue->list_lock);

return block;
@@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
struct iio_dma_buffer_block *block;
size_t data_available = 0;
+ unsigned int i;

/*
* For counting the available bytes we'll use the size of the block not
@@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
data_available += queue->fileio.active_block->size;

spin_lock_irq(&queue->list_lock);
- list_for_each_entry(block, &queue->outgoing, head)
- data_available += block->size;
+
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ block = queue->fileio.blocks[i];
+
+ if (block != queue->fileio.active_block
+ && block->state == IIO_BLOCK_STATE_DONE)
+ data_available += block->size;
+ }
+
spin_unlock_irq(&queue->list_lock);
mutex_unlock(&queue->lock);

@@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
queue->dev = dev;
queue->ops = ops;

- INIT_LIST_HEAD(&queue->incoming);
- INIT_LIST_HEAD(&queue->outgoing);
-
mutex_init(&queue->lock);
spin_lock_init(&queue->list_lock);

@@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
continue;
queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
}
- INIT_LIST_HEAD(&queue->outgoing);
spin_unlock_irq(&queue->list_lock);

- INIT_LIST_HEAD(&queue->incoming);
-
for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
if (!queue->fileio.blocks[i])
continue;
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index ff15c61bf319..d4ed5ff39d44 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
* @active_block: Block being used in read()
* @pos: Read offset in the active block
* @block_size: Size of each block
+ * @next_dequeue: index of next block that will be dequeued
*/
struct iio_dma_buffer_queue_fileio {
struct iio_dma_buffer_block *blocks[2];
struct iio_dma_buffer_block *active_block;
size_t pos;
size_t block_size;
+
+ unsigned int next_dequeue;
};

/**
@@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
* atomic context as well as blocks on those lists. This is the outgoing queue
* list and typically also a list of active blocks in the part that handles
* the DMA controller
- * @incoming: List of buffers on the incoming queue
- * @outgoing: List of buffers on the outgoing queue
* @active: Whether the buffer is currently active
* @fileio: FileIO state
*/
@@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {

struct mutex lock;
spinlock_t list_lock;
- struct list_head incoming;
- struct list_head outgoing;

bool active;

--
2.33.0


2021-11-15 14:22:16

by Paul Cercueil

Subject: [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct

This structure was never used anywhere, so it can safely be dropped.

It will later be re-introduced as a different structure in a
different header.

Signed-off-by: Paul Cercueil <[email protected]>
---
include/linux/iio/buffer-dma.h | 5 -----
1 file changed, 5 deletions(-)

diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index d4ed5ff39d44..a65a005c4a19 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -17,11 +17,6 @@ struct iio_dma_buffer_queue;
struct iio_dma_buffer_ops;
struct device;

-struct iio_buffer_block {
- u32 size;
- u32 bytes_used;
-};
-
/**
* enum iio_block_state - State of a struct iio_dma_buffer_block
* @IIO_BLOCK_STATE_DEQUEUED: Block is not queued
--
2.33.0


2021-11-15 14:23:47

by Paul Cercueil

Subject: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()

We know that the buffer's alignment will always be a power of two;
therefore, we can use the faster round_down() macro.
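
For reference, the two macros are defined roughly as follows in the
kernel headers (simplified here): round_down() works with a bit mask
and therefore requires the alignment to be a power of two, while
rounddown() uses a modulo and works for any alignment.

#define rounddown(x, y)  ({ typeof(x) __x = (x); __x - (__x % (y)); })
#define round_down(x, y) ((x) & ~((typeof(x))((y) - 1)))

/* For a power-of-two alignment both give the same result, e.g.
 * round_down(1000, 64) == rounddown(1000, 64) == 960.
 */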

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 1ac94c4e9792..f8ce26a24c57 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
dma_cookie_t cookie;

block->bytes_used = min(block->size, dmaengine_buffer->max_size);
- block->bytes_used = rounddown(block->bytes_used,
+ block->bytes_used = round_down(block->bytes_used,
dmaengine_buffer->align);

desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
--
2.33.0


2021-11-15 14:24:47

by Paul Cercueil

Subject: [PATCH 04/15] iio: buffer-dma: Enable buffer write support

Adding write support to the buffer-dma code is easy: the write()
function basically needs to do the exact same thing as the read()
function, i.e. dequeue a block, read or write the data, and enqueue the
block once it has been entirely processed.

Therefore, iio_buffer_dma_read() and the new iio_buffer_dma_write() now
both call a common helper, iio_buffer_dma_io(), which performs this
task.

The .space_available() callback can return the exact same value as the
.data_available() callback for input buffers, since in both cases we
count the exact same thing (the number of bytes in each available
block).

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
include/linux/iio/buffer-dma.h | 7 ++
2 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index abac88f20104..eeeed6b2e0cf 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
}

block->size = size;
- block->state = IIO_BLOCK_STATE_DEQUEUED;
+ block->bytes_used = size;
+ block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
INIT_LIST_HEAD(&block->head);
kref_init(&block->kref);
@@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
block->state = IIO_BLOCK_STATE_DONE;
}

+static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
+{
+ __poll_t flags;
+
+ if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+ flags = EPOLLIN | EPOLLRDNORM;
+ else
+ flags = EPOLLOUT | EPOLLWRNORM;
+
+ wake_up_interruptible_poll(&queue->buffer.pollq, flags);
+}
+
/**
* iio_dma_buffer_block_done() - Indicate that a block has been completed
* @block: The completed block
@@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
spin_unlock_irqrestore(&queue->list_lock, flags);

iio_buffer_block_put_atomic(block);
- wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
+ iio_dma_buffer_queue_wake(queue);
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);

@@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
}
spin_unlock_irqrestore(&queue->list_lock, flags);

- wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
+ iio_dma_buffer_queue_wake(queue);
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);

@@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
queue->fileio.blocks[i] = block;
}

- block->state = IIO_BLOCK_STATE_QUEUED;
+ if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+ block->state = IIO_BLOCK_STATE_QUEUED;
}

out_unlock:
@@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
return block;
}

-/**
- * iio_dma_buffer_read() - DMA buffer read callback
- * @buffer: Buffer to read form
- * @n: Number of bytes to read
- * @user_buffer: Userspace buffer to copy the data to
- *
- * Should be used as the read callback for iio_buffer_access_ops
- * struct for DMA buffers.
- */
-int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
- char __user *user_buffer)
+static int iio_dma_buffer_io(struct iio_buffer *buffer,
+ size_t n, char __user *user_buffer, bool is_write)
{
struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
struct iio_dma_buffer_block *block;
+ void *addr;
int ret;

if (n < buffer->bytes_per_datum)
@@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
n = rounddown(n, buffer->bytes_per_datum);
if (n > block->bytes_used - queue->fileio.pos)
n = block->bytes_used - queue->fileio.pos;
+ addr = block->vaddr + queue->fileio.pos;

- if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
+ if (is_write)
+ ret = !!copy_from_user(addr, user_buffer, n);
+ else
+ ret = !!copy_to_user(user_buffer, addr, n);
+ if (ret) {
ret = -EFAULT;
goto out_unlock;
}
@@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,

if (queue->fileio.pos == block->bytes_used) {
queue->fileio.active_block = NULL;
+ block->bytes_used = block->size;
iio_dma_buffer_enqueue(queue, block);
}

@@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,

return ret;
}
+
+/**
+ * iio_dma_buffer_read() - DMA buffer read callback
+ * @buffer: Buffer to read form
+ * @n: Number of bytes to read
+ * @user_buffer: Userspace buffer to copy the data to
+ *
+ * Should be used as the read callback for iio_buffer_access_ops
+ * struct for DMA buffers.
+ */
+int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
+ char __user *user_buffer)
+{
+ return iio_dma_buffer_io(buffer, n, user_buffer, false);
+}
EXPORT_SYMBOL_GPL(iio_dma_buffer_read);

+/**
+ * iio_dma_buffer_write() - DMA buffer write callback
+ * @buffer: Buffer to read form
+ * @n: Number of bytes to read
+ * @user_buffer: Userspace buffer to copy the data from
+ *
+ * Should be used as the write callback for iio_buffer_access_ops
+ * struct for DMA buffers.
+ */
+int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
+ const char __user *user_buffer)
+{
+ return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
+
/**
* iio_dma_buffer_data_available() - DMA buffer data_available callback
* @buf: Buffer to check for data availability
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index a65a005c4a19..09c07d5563c0 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
struct iio_dev *indio_dev);
int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
char __user *user_buffer);
+int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
+ const char __user *user_buffer);
size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
@@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);

+static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
+{
+ return iio_dma_buffer_data_available(buffer);
+}
+
#endif
--
2.33.0


2021-11-15 14:24:52

by Paul Cercueil

Subject: [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction

Update the devm_iio_dmaengine_buffer_setup() function to support
specifying the buffer direction.

Update the iio_dmaengine_buffer_submit() function to handle input
buffers as well as output buffers.
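
For illustration, an output-capable driver (hypothetical here, not part
of this series) would now request its buffer like this:

	/* Hypothetical DAC-style driver: request an output (TX) DMA buffer */
	ret = devm_iio_dmaengine_buffer_setup(indio_dev->dev.parent,
					      indio_dev, "tx",
					      IIO_BUFFER_DIRECTION_OUT);
	if (ret)
		return ret;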

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/adc/adi-axi-adc.c | 3 ++-
.../buffer/industrialio-buffer-dmaengine.c | 24 +++++++++++++++----
include/linux/iio/buffer-dmaengine.h | 5 +++-
3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
index a73e3c2d212f..0a6f2c32b1b9 100644
--- a/drivers/iio/adc/adi-axi-adc.c
+++ b/drivers/iio/adc/adi-axi-adc.c
@@ -113,7 +113,8 @@ static int adi_axi_adc_config_dma_buffer(struct device *dev,
dma_name = "rx";

return devm_iio_dmaengine_buffer_setup(indio_dev->dev.parent,
- indio_dev, dma_name);
+ indio_dev, dma_name,
+ IIO_BUFFER_DIRECTION_IN);
}

static int adi_axi_adc_read_raw(struct iio_dev *indio_dev,
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index f8ce26a24c57..ac26b04aa4a9 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -64,14 +64,25 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
struct dmaengine_buffer *dmaengine_buffer =
iio_buffer_to_dmaengine_buffer(&queue->buffer);
struct dma_async_tx_descriptor *desc;
+ enum dma_transfer_direction dma_dir;
+ size_t max_size;
dma_cookie_t cookie;

- block->bytes_used = min(block->size, dmaengine_buffer->max_size);
- block->bytes_used = round_down(block->bytes_used,
- dmaengine_buffer->align);
+ max_size = min(block->size, dmaengine_buffer->max_size);
+ max_size = round_down(max_size, dmaengine_buffer->align);
+
+ if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+ block->bytes_used = max_size;
+ dma_dir = DMA_DEV_TO_MEM;
+ } else {
+ dma_dir = DMA_MEM_TO_DEV;
+ }
+
+ if (!block->bytes_used || block->bytes_used > max_size)
+ return -EINVAL;

desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
- block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
+ block->phys_addr, block->bytes_used, dma_dir,
DMA_PREP_INTERRUPT);
if (!desc)
return -ENOMEM;
@@ -275,7 +286,8 @@ static struct iio_buffer *devm_iio_dmaengine_buffer_alloc(struct device *dev,
*/
int devm_iio_dmaengine_buffer_setup(struct device *dev,
struct iio_dev *indio_dev,
- const char *channel)
+ const char *channel,
+ enum iio_buffer_direction dir)
{
struct iio_buffer *buffer;

@@ -286,6 +298,8 @@ int devm_iio_dmaengine_buffer_setup(struct device *dev,

indio_dev->modes |= INDIO_BUFFER_HARDWARE;

+ buffer->direction = dir;
+
return iio_device_attach_buffer(indio_dev, buffer);
}
EXPORT_SYMBOL_GPL(devm_iio_dmaengine_buffer_setup);
diff --git a/include/linux/iio/buffer-dmaengine.h b/include/linux/iio/buffer-dmaengine.h
index 5c355be89814..538d0479cdd6 100644
--- a/include/linux/iio/buffer-dmaengine.h
+++ b/include/linux/iio/buffer-dmaengine.h
@@ -7,11 +7,14 @@
#ifndef __IIO_DMAENGINE_H__
#define __IIO_DMAENGINE_H__

+#include <linux/iio/buffer.h>
+
struct iio_dev;
struct device;

int devm_iio_dmaengine_buffer_setup(struct device *dev,
struct iio_dev *indio_dev,
- const char *channel);
+ const char *channel,
+ enum iio_buffer_direction dir);

#endif
--
2.33.0


2021-11-15 14:25:09

by Paul Cercueil

Subject: [PATCH 06/15] iio: buffer-dmaengine: Enable write support

Use the iio_dma_buffer_write() and iio_dma_buffer_space_available()
functions provided by the buffer-dma core, to enable write support in
the buffer-dmaengine code.

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index ac26b04aa4a9..5cde8fd81c7f 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -123,12 +123,14 @@ static void iio_dmaengine_buffer_release(struct iio_buffer *buf)

static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
.read = iio_dma_buffer_read,
+ .write = iio_dma_buffer_write,
.set_bytes_per_datum = iio_dma_buffer_set_bytes_per_datum,
.set_length = iio_dma_buffer_set_length,
.request_update = iio_dma_buffer_request_update,
.enable = iio_dma_buffer_enable,
.disable = iio_dma_buffer_disable,
.data_available = iio_dma_buffer_data_available,
+ .space_available = iio_dma_buffer_space_available,
.release = iio_dmaengine_buffer_release,

.modes = INDIO_BUFFER_HARDWARE,
--
2.33.0


2021-11-15 14:25:47

by Paul Cercueil

Subject: [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure

Add the necessary infrastructure to the IIO core to support a new DMABUF
based interface.

The advantage of this new DMABUF based interface over the read()
interface is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly useful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

The data in this new DMABUF interface is managed at the granularity of
DMABUF objects. Reducing the granularity from byte level to block level
is done to reduce the userspace/kernel synchronization overhead, since
performing a syscall for each byte at a few Mbps is simply not
feasible.

This of course leads to slightly increased latency. For this reason an
application can choose the size of the DMABUFs as well as how many it
allocates. E.g. two DMABUFs would be a traditional double-buffering
scheme, but using a higher number might be necessary to avoid
underflow/overflow situations in the presence of scheduling latencies.

As part of the interface, 2 new IOCTLs have been added:

IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *):
Each call will allocate a new DMABUF object. The return value (if not
a negative errno code) will be the file descriptor of the new DMABUF.

IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
Place the DMABUF object into the queue of blocks pending hardware
processing.

These two IOCTLs have to be performed on the IIO buffer's file
descriptor (either opened from the corresponding /dev/iio:deviceX, or
obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl).

For userspace to access the data stored in a block, the block must be
mapped into the process's address space. This is done by calling mmap()
on the DMABUF's file descriptor.

Before accessing the data through the map, you must use the
DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
DMA_BUF_SYNC_START flag, to make sure that the data is available.
This call may block until the hardware is done with this block. Once
you are done reading or writing the data, you must use this ioctl again
with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
kernel's queue.

If you need to know when the hardware is done with a DMABUF, you can
poll its file descriptor for the EPOLLOUT event.

Finally, to destroy a DMABUF object, simply call close() on its file
descriptor.

A typical workflow for the new interface is:

    for block in blocks:
        DMABUF_ALLOC block
        mmap block

    enable buffer

    while !done
        for block in blocks:
            DMABUF_ENQUEUE block

            DMABUF_SYNC_START block
            process data
            DMABUF_SYNC_END block

    disable buffer

    for block in blocks:
        close block
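
Below is a rough userspace sketch of that workflow in C, for an input
(capture) buffer. The device path, the number of blocks, the block size
and the loop count are made-up example values, and all error handling is
omitted:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-buf.h>
#include <linux/iio/buffer.h>

#define NB_BLOCKS	4
#define BLOCK_SIZE	(1024 * 1024)

int main(void)
{
	struct iio_dmabuf_alloc_req req = { .size = BLOCK_SIZE };
	struct iio_dmabuf dmabuf = { .flags = 0 }; /* bytes_used == 0: full buffer */
	struct dma_buf_sync sync;
	int buf_fd, fds[NB_BLOCKS];
	void *addr[NB_BLOCKS];
	unsigned int i, loop;

	buf_fd = open("/dev/iio:device0", O_RDWR);

	for (i = 0; i < NB_BLOCKS; i++) {
		/* The ALLOC ioctl returns the new DMABUF's file descriptor */
		fds[i] = ioctl(buf_fd, IIO_BUFFER_DMABUF_ALLOC_IOCTL, &req);
		addr[i] = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fds[i], 0);
	}

	/* Enable the IIO buffer here, e.g. via its sysfs 'enable' attribute */

	for (loop = 0; loop < 100; loop++) {
		for (i = 0; i < NB_BLOCKS; i++) {
			dmabuf.fd = fds[i];
			ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &dmabuf);

			/* May block until the hardware is done with the block */
			sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
			ioctl(fds[i], DMA_BUF_IOCTL_SYNC, &sync);

			/* ... process BLOCK_SIZE bytes at addr[i] ... */

			sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
			ioctl(fds[i], DMA_BUF_IOCTL_SYNC, &sync);
		}
	}

	/* Disable the buffer, then close() each DMABUF fd to free it */
	for (i = 0; i < NB_BLOCKS; i++)
		close(fds[i]);
	close(buf_fd);

	return 0;
}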

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/industrialio-buffer.c | 44 +++++++++++++++++++++++++++++++
include/linux/iio/buffer_impl.h | 8 ++++++
include/uapi/linux/iio/buffer.h | 29 ++++++++++++++++++++
3 files changed, 81 insertions(+)

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index e180728914c0..30910e6c2346 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -17,6 +17,7 @@
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/slab.h>
+#include <linux/mm.h>
#include <linux/poll.h>
#include <linux/sched/signal.h>

@@ -1585,12 +1586,55 @@ static long iio_device_buffer_getfd(struct iio_dev *indio_dev, unsigned long arg
return ret;
}

+static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf __user *user_buf)
+{
+ struct iio_dmabuf dmabuf;
+
+ if (!buffer->access->enqueue_dmabuf)
+ return -EPERM;
+
+ if (copy_from_user(&dmabuf, user_buf, sizeof(dmabuf)))
+ return -EFAULT;
+
+ if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
+ return -EINVAL;
+
+ return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
+}
+
+static int iio_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf_alloc_req __user *user_req)
+{
+ struct iio_dmabuf_alloc_req req;
+
+ if (!buffer->access->alloc_dmabuf)
+ return -EPERM;
+
+ if (copy_from_user(&req, user_req, sizeof(req)))
+ return -EFAULT;
+
+ if (req.resv)
+ return -EINVAL;
+
+ return buffer->access->alloc_dmabuf(buffer, &req);
+}
+
static long iio_device_buffer_ioctl(struct iio_dev *indio_dev, struct file *filp,
unsigned int cmd, unsigned long arg)
{
+ struct iio_dev_buffer_pair *ib = filp->private_data;
+ struct iio_buffer *buffer = ib->buffer;
+ void __user *_arg = (void __user *)arg;
+
switch (cmd) {
case IIO_BUFFER_GET_FD_IOCTL:
return iio_device_buffer_getfd(indio_dev, arg);
+ case IIO_BUFFER_DMABUF_ALLOC_IOCTL:
+ return iio_buffer_alloc_dmabuf(buffer, _arg);
+ case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
+ /* TODO: support non-blocking enqueue operation */
+ return iio_buffer_enqueue_dmabuf(buffer, _arg);
default:
return IIO_IOCTL_UNHANDLED;
}
diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
index e2ca8ea23e19..728541bc2c63 100644
--- a/include/linux/iio/buffer_impl.h
+++ b/include/linux/iio/buffer_impl.h
@@ -39,6 +39,9 @@ struct iio_buffer;
* device stops sampling. Calles are balanced with @enable.
* @release: called when the last reference to the buffer is dropped,
* should free all resources allocated by the buffer.
+ * @alloc_dmabuf: called from userspace via ioctl to allocate one DMABUF.
+ * @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF
+ * object to this buffer. Requires a valid DMABUF fd.
* @modes: Supported operating modes by this buffer type
* @flags: A bitmask combination of INDIO_BUFFER_FLAG_*
*
@@ -68,6 +71,11 @@ struct iio_buffer_access_funcs {

void (*release)(struct iio_buffer *buffer);

+ int (*alloc_dmabuf)(struct iio_buffer *buffer,
+ struct iio_dmabuf_alloc_req *req);
+ int (*enqueue_dmabuf)(struct iio_buffer *buffer,
+ struct iio_dmabuf *block);
+
unsigned int modes;
unsigned int flags;
};
diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
index 13939032b3f6..e4621b926262 100644
--- a/include/uapi/linux/iio/buffer.h
+++ b/include/uapi/linux/iio/buffer.h
@@ -5,6 +5,35 @@
#ifndef _UAPI_IIO_BUFFER_H_
#define _UAPI_IIO_BUFFER_H_

+#include <linux/types.h>
+
+#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000000
+
+/**
+ * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
+ * @size: the size of a single DMABUF
+ * @resv: reserved
+ */
+struct iio_dmabuf_alloc_req {
+ __u64 size;
+ __u64 resv;
+};
+
+/**
+ * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
+ * @fd: file descriptor of the DMABUF object
+ * @flags: one or more IIO_BUFFER_DMABUF_* flags
+ * @bytes_used: number of bytes used in this DMABUF for the data transfer.
+ * If zero, the full buffer is used.
+ */
+struct iio_dmabuf {
+ __u32 fd;
+ __u32 flags;
+ __u64 bytes_used;
+};
+
#define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int)
+#define IIO_BUFFER_DMABUF_ALLOC_IOCTL _IOW('i', 0x92, struct iio_dmabuf_alloc_req)
+#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x93, struct iio_dmabuf)

#endif /* _UAPI_IIO_BUFFER_H_ */
--
2.33.0


2021-11-15 14:26:09

by Paul Cercueil

Subject: [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function

From: Alexandru Ardelean <[email protected]>

Part of the logic in iio_dma_buffer_exit() is required for the upcoming
change that adds mmap support to IIO buffers.
Split that logic into a separate function, which will be re-used later.

Signed-off-by: Alexandru Ardelean <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index eeeed6b2e0cf..eb8cfd3af030 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);

+static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
+{
+ unsigned int i;
+
+ spin_lock_irq(&queue->list_lock);
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ if (!queue->fileio.blocks[i])
+ continue;
+ queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
+ }
+ spin_unlock_irq(&queue->list_lock);
+
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ if (!queue->fileio.blocks[i])
+ continue;
+ iio_buffer_block_put(queue->fileio.blocks[i]);
+ queue->fileio.blocks[i] = NULL;
+ }
+ queue->fileio.active_block = NULL;
+}
+
static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
struct iio_dma_buffer_block *block)
{
@@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
*/
void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
{
- unsigned int i;
-
mutex_lock(&queue->lock);

- spin_lock_irq(&queue->list_lock);
- for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
- if (!queue->fileio.blocks[i])
- continue;
- queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
- }
- spin_unlock_irq(&queue->list_lock);
-
- for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
- if (!queue->fileio.blocks[i])
- continue;
- iio_buffer_block_put(queue->fileio.blocks[i]);
- queue->fileio.blocks[i] = NULL;
- }
- queue->fileio.active_block = NULL;
+ iio_dma_buffer_fileio_free(queue);
queue->ops = NULL;

mutex_unlock(&queue->lock);
--
2.33.0


2021-11-15 14:26:51

by Paul Cercueil

Subject: [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution

Enhance the current fileio code by using DMABUF objects instead of
custom buffers.

This adds more code than it removes, but:
- a lot of the complexity can be dropped, e.g. custom kref and
iio_buffer_block_put_atomic() are not needed anymore;
- it will be much easier to introduce an API to export these DMABUF
objects to userspace in a following patch.

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 196 ++++++++++++-------
include/linux/iio/buffer-dma.h | 8 +-
2 files changed, 127 insertions(+), 77 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index eb8cfd3af030..adb20434f2d2 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -14,6 +14,7 @@
#include <linux/poll.h>
#include <linux/iio/buffer_impl.h>
#include <linux/iio/buffer-dma.h>
+#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

@@ -90,104 +91,150 @@
* callback is called from within the custom callback.
*/

-static void iio_buffer_block_release(struct kref *kref)
-{
- struct iio_dma_buffer_block *block = container_of(kref,
- struct iio_dma_buffer_block, kref);
-
- WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
-
- dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
- block->vaddr, block->phys_addr);
-
- iio_buffer_put(&block->queue->buffer);
- kfree(block);
-}
-
-static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
-{
- kref_get(&block->kref);
-}
-
-static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
-{
- kref_put(&block->kref, iio_buffer_block_release);
-}
-
-/*
- * dma_free_coherent can sleep, hence we need to take some special care to be
- * able to drop a reference from an atomic context.
- */
-static LIST_HEAD(iio_dma_buffer_dead_blocks);
-static DEFINE_SPINLOCK(iio_dma_buffer_dead_blocks_lock);
-
-static void iio_dma_buffer_cleanup_worker(struct work_struct *work)
-{
- struct iio_dma_buffer_block *block, *_block;
- LIST_HEAD(block_list);
-
- spin_lock_irq(&iio_dma_buffer_dead_blocks_lock);
- list_splice_tail_init(&iio_dma_buffer_dead_blocks, &block_list);
- spin_unlock_irq(&iio_dma_buffer_dead_blocks_lock);
-
- list_for_each_entry_safe(block, _block, &block_list, head)
- iio_buffer_block_release(&block->kref);
-}
-static DECLARE_WORK(iio_dma_buffer_cleanup_work, iio_dma_buffer_cleanup_worker);
-
-static void iio_buffer_block_release_atomic(struct kref *kref)
-{
+struct iio_buffer_dma_buf_attachment {
+ struct scatterlist sgl;
+ struct sg_table sg_table;
struct iio_dma_buffer_block *block;
- unsigned long flags;
-
- block = container_of(kref, struct iio_dma_buffer_block, kref);
-
- spin_lock_irqsave(&iio_dma_buffer_dead_blocks_lock, flags);
- list_add_tail(&block->head, &iio_dma_buffer_dead_blocks);
- spin_unlock_irqrestore(&iio_dma_buffer_dead_blocks_lock, flags);
-
- schedule_work(&iio_dma_buffer_cleanup_work);
-}
-
-/*
- * Version of iio_buffer_block_put() that can be called from atomic context
- */
-static void iio_buffer_block_put_atomic(struct iio_dma_buffer_block *block)
-{
- kref_put(&block->kref, iio_buffer_block_release_atomic);
-}
+};

static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
{
return container_of(buf, struct iio_dma_buffer_queue, buffer);
}

+static struct iio_buffer_dma_buf_attachment *
+to_iio_buffer_dma_buf_attachment(struct sg_table *table)
+{
+ return container_of(table, struct iio_buffer_dma_buf_attachment, sg_table);
+}
+
+static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
+{
+ get_dma_buf(block->dmabuf);
+}
+
+static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
+{
+ dma_buf_put(block->dmabuf);
+}
+
+static int iio_buffer_dma_buf_attach(struct dma_buf *dbuf,
+ struct dma_buf_attachment *at)
+{
+ at->priv = dbuf->priv;
+
+ return 0;
+}
+
+static struct sg_table *iio_buffer_dma_buf_map(struct dma_buf_attachment *at,
+ enum dma_data_direction dma_dir)
+{
+ struct iio_dma_buffer_block *block = at->priv;
+ struct iio_buffer_dma_buf_attachment *dba;
+ int ret;
+
+ dba = kzalloc(sizeof(*dba), GFP_KERNEL);
+ if (!dba)
+ return ERR_PTR(-ENOMEM);
+
+ sg_init_one(&dba->sgl, block->vaddr, PAGE_ALIGN(block->size));
+ dba->sg_table.sgl = &dba->sgl;
+ dba->sg_table.nents = 1;
+ dba->block = block;
+
+ ret = dma_map_sgtable(at->dev, &dba->sg_table, dma_dir, 0);
+ if (ret) {
+ kfree(dba);
+ return ERR_PTR(ret);
+ }
+
+ return &dba->sg_table;
+}
+
+static void iio_buffer_dma_buf_unmap(struct dma_buf_attachment *at,
+ struct sg_table *sg_table,
+ enum dma_data_direction dma_dir)
+{
+ struct iio_buffer_dma_buf_attachment *dba =
+ to_iio_buffer_dma_buf_attachment(sg_table);
+
+ dma_unmap_sgtable(at->dev, &dba->sg_table, dma_dir, 0);
+ kfree(dba);
+}
+
+static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
+{
+ struct iio_dma_buffer_block *block = dbuf->priv;
+ struct iio_dma_buffer_queue *queue = block->queue;
+
+ WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
+
+ mutex_lock(&queue->lock);
+
+ dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
+ block->vaddr, block->phys_addr);
+
+ kfree(block);
+
+ mutex_unlock(&queue->lock);
+ iio_buffer_put(&queue->buffer);
+}
+
+
+static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
+ .attach = iio_buffer_dma_buf_attach,
+ .map_dma_buf = iio_buffer_dma_buf_map,
+ .unmap_dma_buf = iio_buffer_dma_buf_unmap,
+ .release = iio_buffer_dma_buf_release,
+};
+
static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
struct iio_dma_buffer_queue *queue, size_t size)
{
struct iio_dma_buffer_block *block;
+ DEFINE_DMA_BUF_EXPORT_INFO(einfo);
+ struct dma_buf *dmabuf;
+ int err;

block = kzalloc(sizeof(*block), GFP_KERNEL);
if (!block)
- return NULL;
+ return ERR_PTR(-ENOMEM);

block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
&block->phys_addr, GFP_KERNEL);
if (!block->vaddr) {
- kfree(block);
- return NULL;
+ err = -ENOMEM;
+ goto err_free_block;
}

+ einfo.ops = &iio_dma_buffer_dmabuf_ops;
+ einfo.size = PAGE_ALIGN(size);
+ einfo.priv = block;
+ einfo.flags = O_RDWR;
+
+ dmabuf = dma_buf_export(&einfo);
+ if (IS_ERR(dmabuf)) {
+ err = PTR_ERR(dmabuf);
+ goto err_free_dma;
+ }
+
+ block->dmabuf = dmabuf;
block->size = size;
block->bytes_used = size;
block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
INIT_LIST_HEAD(&block->head);
- kref_init(&block->kref);

iio_buffer_get(&queue->buffer);

return block;
+
+err_free_dma:
+ dma_free_coherent(queue->dev, PAGE_ALIGN(size),
+ block->vaddr, block->phys_addr);
+err_free_block:
+ kfree(block);
+ return ERR_PTR(err);
}

static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
@@ -224,7 +271,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
_iio_dma_buffer_block_done(block);
spin_unlock_irqrestore(&queue->list_lock, flags);

- iio_buffer_block_put_atomic(block);
+ iio_buffer_block_put(block);
iio_dma_buffer_queue_wake(queue);
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
@@ -250,7 +297,8 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
list_del(&block->head);
block->bytes_used = 0;
_iio_dma_buffer_block_done(block);
- iio_buffer_block_put_atomic(block);
+
+ iio_buffer_block_put(block);
}
spin_unlock_irqrestore(&queue->list_lock, flags);

@@ -340,11 +388,13 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)

if (!block) {
block = iio_dma_buffer_alloc_block(queue, size);
- if (!block) {
- ret = -ENOMEM;
+ if (IS_ERR(block)) {
+ ret = PTR_ERR(block);
goto out_unlock;
}
queue->fileio.blocks[i] = block;
+
+ iio_buffer_block_get(block);
}

if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 09c07d5563c0..22effd6cfbb6 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -8,7 +8,6 @@
#define __INDUSTRIALIO_DMA_BUFFER_H__

#include <linux/list.h>
-#include <linux/kref.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/iio/buffer_impl.h>
@@ -16,6 +15,7 @@
struct iio_dma_buffer_queue;
struct iio_dma_buffer_ops;
struct device;
+struct dma_buf;

/**
* enum iio_block_state - State of a struct iio_dma_buffer_block
@@ -41,8 +41,8 @@ enum iio_block_state {
* @vaddr: Virutal address of the blocks memory
* @phys_addr: Physical address of the blocks memory
* @queue: Parent DMA buffer queue
- * @kref: kref used to manage the lifetime of block
* @state: Current state of the block
+ * @dmabuf: Underlying DMABUF object
*/
struct iio_dma_buffer_block {
/* May only be accessed by the owner of the block */
@@ -58,13 +58,13 @@ struct iio_dma_buffer_block {
size_t size;
struct iio_dma_buffer_queue *queue;

- /* Must not be accessed outside the core. */
- struct kref kref;
/*
* Must not be accessed outside the core. Access needs to hold
* queue->list_lock if the block is not owned by the core.
*/
enum iio_block_state state;
+
+ struct dma_buf *dmabuf;
};

/**
--
2.33.0


2021-11-15 14:28:10

by Paul Cercueil

Subject: [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API

Implement the two functions iio_dma_buffer_alloc_dmabuf() and
iio_dma_buffer_enqueue_dmabuf(), as well as all the necessary bits to
enable userspace access to the DMABUF objects.

These two functions are exported as GPL symbols so that IIO buffer
implementations can support the new DMABUF based userspace API.
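
A buffer implementation built on top of buffer-dma can then advertise
the new ioctls by wiring these helpers into its iio_buffer_access_funcs,
roughly as the dmaengine buffer does in patch [12/15] (a sketch, not the
exact hunk):

static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
	.read = iio_dma_buffer_read,
	.write = iio_dma_buffer_write,
	.set_bytes_per_datum = iio_dma_buffer_set_bytes_per_datum,
	.set_length = iio_dma_buffer_set_length,
	.request_update = iio_dma_buffer_request_update,
	.enable = iio_dma_buffer_enable,
	.disable = iio_dma_buffer_disable,
	.data_available = iio_dma_buffer_data_available,
	.space_available = iio_dma_buffer_space_available,
	.release = iio_dmaengine_buffer_release,

	.alloc_dmabuf = iio_dma_buffer_alloc_dmabuf,
	.enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,

	.modes = INDIO_BUFFER_HARDWARE,
};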

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 273 ++++++++++++++++++-
include/linux/iio/buffer-dma.h | 13 +
2 files changed, 279 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index adb20434f2d2..92356ee02f30 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -15,7 +15,9 @@
#include <linux/iio/buffer_impl.h>
#include <linux/iio/buffer-dma.h>
#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
#include <linux/dma-mapping.h>
+#include <linux/dma-resv.h>
#include <linux/sizes.h>

/*
@@ -97,6 +99,18 @@ struct iio_buffer_dma_buf_attachment {
struct iio_dma_buffer_block *block;
};

+struct iio_buffer_dma_fence {
+ struct dma_fence base;
+ struct iio_dma_buffer_block *block;
+ spinlock_t lock;
+};
+
+static struct iio_buffer_dma_fence *
+to_iio_buffer_dma_fence(struct dma_fence *fence)
+{
+ return container_of(fence, struct iio_buffer_dma_fence, base);
+}
+
static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
{
return container_of(buf, struct iio_dma_buffer_queue, buffer);
@@ -118,6 +132,48 @@ static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
dma_buf_put(block->dmabuf);
}

+static const char *
+iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
+{
+ struct iio_buffer_dma_fence *iio_fence = to_iio_buffer_dma_fence(fence);
+
+ return dev_name(iio_fence->block->queue->dev);
+}
+
+static void iio_buffer_dma_fence_release(struct dma_fence *fence)
+{
+ struct iio_buffer_dma_fence *iio_fence = to_iio_buffer_dma_fence(fence);
+
+ kfree(iio_fence);
+}
+
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
+ .get_driver_name = iio_buffer_dma_fence_get_driver_name,
+ .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
+ .release = iio_buffer_dma_fence_release,
+};
+
+static struct dma_fence *
+iio_dma_buffer_create_dma_fence(struct iio_dma_buffer_block *block)
+{
+ struct iio_buffer_dma_fence *fence;
+ u64 ctx;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return ERR_PTR(-ENOMEM);
+
+ fence->block = block;
+ spin_lock_init(&fence->lock);
+
+ ctx = dma_fence_context_alloc(1);
+
+ dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
+ &fence->lock, ctx, 0);
+
+ return &fence->base;
+}
+
static int iio_buffer_dma_buf_attach(struct dma_buf *dbuf,
struct dma_buf_attachment *at)
{
@@ -162,10 +218,26 @@ static void iio_buffer_dma_buf_unmap(struct dma_buf_attachment *at,
kfree(dba);
}

+static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
+ struct vm_area_struct *vma)
+{
+ struct iio_dma_buffer_block *block = dbuf->priv;
+ struct device *dev = block->queue->dev;
+
+ vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+ if (vma->vm_ops->open)
+ vma->vm_ops->open(vma);
+
+ return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
+ virt_to_page(block->vaddr));
+}
+
static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
{
struct iio_dma_buffer_block *block = dbuf->priv;
struct iio_dma_buffer_queue *queue = block->queue;
+ bool is_fileio = block->fileio;

WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);

@@ -176,20 +248,51 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)

kfree(block);

+ queue->num_blocks--;
+ if (is_fileio)
+ queue->num_fileio_blocks--;
mutex_unlock(&queue->lock);
iio_buffer_put(&queue->buffer);
}

+static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
+ enum dma_data_direction dma_dir)
+{
+ struct iio_dma_buffer_block *block = dbuf->priv;
+ struct device *dev = block->queue->dev;
+
+ /* We only need to invalidate the cache for input buffers */
+ if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+ dma_sync_single_for_cpu(dev, block->phys_addr, block->size, dma_dir);
+
+ return 0;
+}
+
+static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
+ enum dma_data_direction dma_dir)
+{
+ struct iio_dma_buffer_block *block = dbuf->priv;
+ struct device *dev = block->queue->dev;
+
+ /* We only need to sync the cache for output buffers */
+ if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
+ dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
+
+ return 0;
+}

static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
.attach = iio_buffer_dma_buf_attach,
.map_dma_buf = iio_buffer_dma_buf_map,
.unmap_dma_buf = iio_buffer_dma_buf_unmap,
+ .mmap = iio_buffer_dma_buf_mmap,
.release = iio_buffer_dma_buf_release,
+ .begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
+ .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
};

static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
- struct iio_dma_buffer_queue *queue, size_t size)
+ struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
{
struct iio_dma_buffer_block *block;
DEFINE_DMA_BUF_EXPORT_INFO(einfo);
@@ -223,10 +326,15 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
block->bytes_used = size;
block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
+ block->fileio = fileio;
INIT_LIST_HEAD(&block->head);

iio_buffer_get(&queue->buffer);

+ queue->num_blocks++;
+ if (fileio)
+ queue->num_fileio_blocks++;
+
return block;

err_free_dma:
@@ -265,14 +373,22 @@ static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
{
struct iio_dma_buffer_queue *queue = block->queue;
+ struct dma_resv *resv = block->dmabuf->resv;
+ struct dma_fence *fence;
unsigned long flags;

spin_lock_irqsave(&queue->list_lock, flags);
_iio_dma_buffer_block_done(block);
spin_unlock_irqrestore(&queue->list_lock, flags);

+ fence = dma_resv_excl_fence(resv);
+ if (fence)
+ dma_fence_signal(fence);
+ dma_resv_unlock(resv);
+
iio_buffer_block_put(block);
- iio_dma_buffer_queue_wake(queue);
+ if (queue->fileio.enabled)
+ iio_dma_buffer_queue_wake(queue);
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);

@@ -298,6 +414,8 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
block->bytes_used = 0;
_iio_dma_buffer_block_done(block);

+ if (dma_resv_is_locked(block->dmabuf->resv))
+ dma_resv_unlock(block->dmabuf->resv);
iio_buffer_block_put(block);
}
spin_unlock_irqrestore(&queue->list_lock, flags);
@@ -323,6 +441,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block)
}
}

+static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue)
+{
+ return queue->fileio.enabled ||
+ queue->num_blocks == queue->num_fileio_blocks;
+}
+
/**
* iio_dma_buffer_request_update() - DMA buffer request_update callback
* @buffer: The buffer which to request an update
@@ -349,6 +473,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)

mutex_lock(&queue->lock);

+ queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
+
+ /* If DMABUFs were created, disable fileio interface */
+ if (!queue->fileio.enabled)
+ goto out_unlock;
+
/* Allocations are page aligned */
if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size))
try_reuse = true;
@@ -387,7 +517,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
}

if (!block) {
- block = iio_dma_buffer_alloc_block(queue, size);
+ block = iio_dma_buffer_alloc_block(queue, size, true);
if (IS_ERR(block)) {
ret = PTR_ERR(block);
goto out_unlock;
@@ -444,6 +574,8 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,

block->state = IIO_BLOCK_STATE_ACTIVE;
iio_buffer_block_get(block);
+ dma_resv_lock(block->dmabuf->resv, NULL);
+
ret = queue->ops->submit(queue, block);
if (ret) {
/*
@@ -480,12 +612,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
mutex_lock(&queue->lock);
queue->active = true;
queue->fileio.next_dequeue = 0;
+ queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);

- for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
- block = queue->fileio.blocks[i];
+ dev_dbg(queue->dev, "Buffer enabled in %s mode\n",
+ queue->fileio.enabled ? "fileio" : "dmabuf");

- if (block->state == IIO_BLOCK_STATE_QUEUED)
- iio_dma_buffer_submit_block(queue, block);
+ if (queue->fileio.enabled) {
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ block = queue->fileio.blocks[i];
+
+ if (block->state == IIO_BLOCK_STATE_QUEUED)
+ iio_dma_buffer_submit_block(queue, block);
+ }
}
mutex_unlock(&queue->lock);

@@ -507,6 +645,7 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);

mutex_lock(&queue->lock);
+ queue->fileio.enabled = false;
queue->active = false;

if (queue->ops && queue->ops->abort)
@@ -565,6 +704,11 @@ static int iio_dma_buffer_io(struct iio_buffer *buffer,

mutex_lock(&queue->lock);

+ if (!queue->fileio.enabled) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
if (!queue->fileio.active_block) {
block = iio_dma_buffer_dequeue(queue);
if (block == NULL) {
@@ -681,6 +825,121 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);

+int iio_dma_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf_alloc_req *req)
+{
+ struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+ struct iio_dma_buffer_block *block;
+ int ret = 0;
+
+ mutex_lock(&queue->lock);
+
+ /*
+ * If the buffer is enabled and in fileio mode new blocks can't be
+ * allocated.
+ */
+ if (queue->fileio.enabled) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ if (!req->size || req->size > SIZE_MAX) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /* Free memory that might be in use for fileio mode */
+ iio_dma_buffer_fileio_free(queue);
+
+ block = iio_dma_buffer_alloc_block(queue, req->size, false);
+ if (IS_ERR(block)) {
+ ret = PTR_ERR(block);
+ goto out_unlock;
+ }
+
+ ret = dma_buf_fd(block->dmabuf, O_CLOEXEC);
+ if (ret < 0) {
+ dma_buf_put(block->dmabuf);
+ goto out_unlock;
+ }
+
+out_unlock:
+ mutex_unlock(&queue->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_alloc_dmabuf);
+
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf *iio_dmabuf)
+{
+ struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+ struct iio_dma_buffer_block *dma_block;
+ struct dma_fence *fence;
+ struct dma_buf *dmabuf;
+ int ret = 0;
+
+ mutex_lock(&queue->lock);
+
+ /* If in fileio mode buffers can't be enqueued. */
+ if (queue->fileio.enabled) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ dmabuf = dma_buf_get(iio_dmabuf->fd);
+ if (IS_ERR(dmabuf)) {
+ ret = PTR_ERR(dmabuf);
+ goto out_unlock;
+ }
+
+ if (dmabuf->ops != &iio_dma_buffer_dmabuf_ops) {
+ dev_err(queue->dev, "importing DMABUFs from other drivers is not yet supported.\n");
+ ret = -EINVAL;
+ goto out_dma_buf_put;
+ }
+
+ dma_block = dmabuf->priv;
+
+ if (iio_dmabuf->bytes_used > dma_block->size) {
+ ret = -EINVAL;
+ goto out_dma_buf_put;
+ }
+
+ dma_block->bytes_used = iio_dmabuf->bytes_used ?: dma_block->size;
+
+ switch (dma_block->state) {
+ case IIO_BLOCK_STATE_QUEUED:
+ /* Nothing to do */
+ goto out_unlock;
+ case IIO_BLOCK_STATE_DONE:
+ break;
+ default:
+ ret = -EBUSY;
+ goto out_dma_buf_put;
+ }
+
+ fence = iio_dma_buffer_create_dma_fence(dma_block);
+ if (IS_ERR(fence)) {
+ ret = PTR_ERR(fence);
+ goto out_dma_buf_put;
+ }
+
+ dma_resv_lock(dmabuf->resv, NULL);
+ dma_resv_add_excl_fence(dmabuf->resv, fence);
+ dma_resv_unlock(dmabuf->resv);
+
+ iio_dma_buffer_enqueue(queue, dma_block);
+
+out_dma_buf_put:
+ dma_buf_put(dmabuf);
+out_unlock:
+ mutex_unlock(&queue->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf);
+
/**
* iio_dma_buffer_set_bytes_per_datum() - DMA buffer set_bytes_per_datum callback
* @buffer: Buffer to set the bytes-per-datum for
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 22effd6cfbb6..85e55fe35282 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -42,6 +42,7 @@ enum iio_block_state {
* @phys_addr: Physical address of the blocks memory
* @queue: Parent DMA buffer queue
* @state: Current state of the block
+ * @fileio: True if this buffer is used for fileio mode
* @dmabuf: Underlying DMABUF object
*/
struct iio_dma_buffer_block {
@@ -64,6 +65,7 @@ struct iio_dma_buffer_block {
*/
enum iio_block_state state;

+ bool fileio;
struct dma_buf *dmabuf;
};

@@ -74,6 +76,7 @@ struct iio_dma_buffer_block {
* @pos: Read offset in the active block
* @block_size: Size of each block
* @next_dequeue: index of next block that will be dequeued
+ * @enabled: Whether the buffer is operating in fileio mode
*/
struct iio_dma_buffer_queue_fileio {
struct iio_dma_buffer_block *blocks[2];
@@ -82,6 +85,7 @@ struct iio_dma_buffer_queue_fileio {
size_t block_size;

unsigned int next_dequeue;
+ bool enabled;
};

/**
@@ -96,6 +100,8 @@ struct iio_dma_buffer_queue_fileio {
* list and typically also a list of active blocks in the part that handles
* the DMA controller
* @active: Whether the buffer is currently active
+ * @num_blocks: Total number of blocks in the queue
+ * @num_fileio_blocks: Number of blocks used for fileio interface
* @fileio: FileIO state
*/
struct iio_dma_buffer_queue {
@@ -107,6 +113,8 @@ struct iio_dma_buffer_queue {
spinlock_t list_lock;

bool active;
+ unsigned int num_blocks;
+ unsigned int num_fileio_blocks;

struct iio_dma_buffer_queue_fileio fileio;
};
@@ -149,4 +157,9 @@ static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
return iio_dma_buffer_data_available(buffer);
}

+int iio_dma_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf_alloc_req *req);
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+ struct iio_dmabuf *dmabuf);
+
#endif
--
2.33.0


2021-11-15 14:28:18

by Paul Cercueil

Subject: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

We can be certain that the input buffers will only be accessed by
userspace for reading, and output buffers will mostly be accessed by
userspace for writing.

Therefore, it makes more sense to use only fully cached input buffers,
and to use the write-combine cache coherency setting for output buffers.

This boosts performance, as the data written to the output buffers does
not have to be sync'd for coherency. It will halve performance if the
userspace application tries to read from the output buffer, but this
should never happen.

Since we don't need to sync the cache when disabling CPU access either
for input buffers or output buffers, the .end_cpu_access() callback can
be dropped completely.
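
To illustrate the expected access pattern from the userspace side (a rough
sketch only; the DMABUF fd and block size below are placeholders for whatever
the new userspace API hands out, this is not part of the patch):

    #include <string.h>
    #include <sys/mman.h>

    /* Hypothetical helper: 'fd' is assumed to be a DMABUF fd exported by
     * the IIO buffer, 'size' the block size. */
    static int fill_output_block(int fd, size_t size,
                                 const void *samples, size_t nbytes)
    {
        void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        if (mem == MAP_FAILED)
            return -1;

        /* Sequential writes combine nicely on a write-combine mapping. */
        memcpy(mem, samples, nbytes);

        /* Reading the mapping back here (e.g. to verify the data) would be
         * uncached and therefore slow, which is why it "should never
         * happen". */
        return munmap(mem, size);
    }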

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 82 +++++++++++++-------
1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index 92356ee02f30..fb39054d8c15 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
if (vma->vm_ops->open)
vma->vm_ops->open(vma);

- return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
- virt_to_page(block->vaddr));
+ if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+ /*
+ * With an input buffer, userspace will only read the data and
+ * never write. We can mmap the buffer fully cached.
+ */
+ return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
+ virt_to_page(block->vaddr));
+ } else {
+ /*
+ * With an output buffer, userspace will only write the data
+ * and should rarely (if never) read from it. It is better to
+ * use write-combine in this case.
+ */
+ return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
+ vma->vm_end - vma->vm_start);
+ }
+}
+
+static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block *block)
+{
+ struct device *dev = block->queue->dev;
+ size_t size = PAGE_ALIGN(block->size);
+
+ if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+ dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
+ else
+ dma_free_wc(dev, size, block->vaddr, block->phys_addr);
}

static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
@@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)

mutex_lock(&queue->lock);

- dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
- block->vaddr, block->phys_addr);
-
+ iio_dma_buffer_free_dmamem(block);
kfree(block);

queue->num_blocks--;
@@ -268,19 +291,6 @@ static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
return 0;
}

-static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
- enum dma_data_direction dma_dir)
-{
- struct iio_dma_buffer_block *block = dbuf->priv;
- struct device *dev = block->queue->dev;
-
- /* We only need to sync the cache for output buffers */
- if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
- dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
-
- return 0;
-}
-
static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
.attach = iio_buffer_dma_buf_attach,
.map_dma_buf = iio_buffer_dma_buf_map,
@@ -288,9 +298,28 @@ static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
.mmap = iio_buffer_dma_buf_mmap,
.release = iio_buffer_dma_buf_release,
.begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
- .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
};

+static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
+{
+ struct device *dev = block->queue->dev;
+ size_t size = PAGE_ALIGN(block->size);
+
+ if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+ block->vaddr = dma_alloc_coherent(dev, size,
+ &block->phys_addr,
+ GFP_KERNEL);
+ } else {
+ block->vaddr = dma_alloc_wc(dev, size,
+ &block->phys_addr,
+ GFP_KERNEL);
+ }
+ if (!block->vaddr)
+ return -ENOMEM;
+
+ return 0;
+}
+
static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
{
@@ -303,12 +332,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
if (!block)
return ERR_PTR(-ENOMEM);

- block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
- &block->phys_addr, GFP_KERNEL);
- if (!block->vaddr) {
- err = -ENOMEM;
+ block->size = size;
+ block->queue = queue;
+
+ err = iio_dma_buffer_alloc_dmamem(block);
+ if (err)
goto err_free_block;
- }

einfo.ops = &iio_dma_buffer_dmabuf_ops;
einfo.size = PAGE_ALIGN(size);
@@ -322,10 +351,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
}

block->dmabuf = dmabuf;
- block->size = size;
block->bytes_used = size;
block->state = IIO_BLOCK_STATE_DONE;
- block->queue = queue;
block->fileio = fileio;
INIT_LIST_HEAD(&block->head);

@@ -338,8 +365,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
return block;

err_free_dma:
- dma_free_coherent(queue->dev, PAGE_ALIGN(size),
- block->vaddr, block->phys_addr);
+ iio_dma_buffer_free_dmamem(block);
err_free_block:
kfree(block);
return ERR_PTR(err);
--
2.33.0


2021-11-15 14:33:55

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API

Use the functions provided by the buffer-dma core to implement the
DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dmaengine.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 5cde8fd81c7f..57a8b2e4ba3c 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -133,6 +133,9 @@ static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
.space_available = iio_dma_buffer_space_available,
.release = iio_dmaengine_buffer_release,

+ .alloc_dmabuf = iio_dma_buffer_alloc_dmabuf,
+ .enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,
+
.modes = INDIO_BUFFER_HARDWARE,
.flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK,
};
--
2.33.0


2021-11-15 14:38:52

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> Hi Jonathan,
>
> This patchset introduces a new userspace interface based on DMABUF
> objects, to complement the existing fileio based API.
>
> The advantage of this DMABUF based interface vs. the fileio
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly userful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
>
> The first few patches [01/15] to [03/15] are not really related, but
> allow to reduce the size of the patches that introduce the new API.
>
> Patch [04/15] to [06/15] enables write() support to the buffer-dma
> implementation of the buffer API, to continue the work done by
> Mihail Chindris.
>
> Patches [07/15] to [12/15] introduce the new DMABUF based API.
>
> Patches [13/15] and [14/15] add support for cyclic buffers, only through
> the new API. A cyclic buffer will be repeated on the output until the
> buffer is disabled.
>
> Patch [15/15] adds documentation about the new API.
>
> For now, the API allows you to alloc DMABUF objects and mmap() them to
> read or write the samples. It does not yet allow to import DMABUFs
> parented to other subsystems, but that should eventually be possible
> once it's wired.
>
> This patchset is inspired by the "mmap interface" that was previously
> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> great if I could get a review from you guys. Alexandru's commit was
> signed with his @analog.com address but he doesn't work at ADI anymore,
> so I believe I'll need him to sign with a new email.

Why dma-buf? dma-buf looks like something super generic and useful, until
you realize that there's a metric ton of gpu/accelerator baggage piled in.
So unless buffer sharing with a gpu/video/accel/whatever device is the
goal here, and it's just for a convenient way to get at buffer handles,
this doesn't sound like a good idea.

Also if the idea is to do this with gpus/accelerators then I'd really like to
see the full thing, since most likely at that point you also want
dma_fence. And once we talk dma_fence things get truly horrible from a
locking pov :-( Or well, just highly constrained and I get to review what
iio is doing with these buffers to make sure it all fits.

Cheers, Daniel

>
> Cheers,
> -Paul
>
> Alexandru Ardelean (1):
> iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>
> Paul Cercueil (14):
> iio: buffer-dma: Get rid of incoming/outgoing queues
> iio: buffer-dma: Remove unused iio_buffer_block struct
> iio: buffer-dma: Use round_down() instead of rounddown()
> iio: buffer-dma: Enable buffer write support
> iio: buffer-dmaengine: Support specifying buffer direction
> iio: buffer-dmaengine: Enable write support
> iio: core: Add new DMABUF interface infrastructure
> iio: buffer-dma: Use DMABUFs instead of custom solution
> iio: buffer-dma: Implement new DMABUF based userspace API
> iio: buffer-dma: Boost performance using write-combine cache setting
> iio: buffer-dmaengine: Support new DMABUF based userspace API
> iio: core: Add support for cyclic buffers
> iio: buffer-dmaengine: Add support for cyclic buffers
> Documentation: iio: Document high-speed DMABUF based API
>
> Documentation/driver-api/dma-buf.rst | 2 +
> Documentation/iio/dmabuf_api.rst | 94 +++
> Documentation/iio/index.rst | 2 +
> drivers/iio/adc/adi-axi-adc.c | 3 +-
> drivers/iio/buffer/industrialio-buffer-dma.c | 670 ++++++++++++++----
> .../buffer/industrialio-buffer-dmaengine.c | 42 +-
> drivers/iio/industrialio-buffer.c | 49 ++
> include/linux/iio/buffer-dma.h | 43 +-
> include/linux/iio/buffer-dmaengine.h | 5 +-
> include/linux/iio/buffer_impl.h | 8 +
> include/uapi/linux/iio/buffer.h | 30 +
> 11 files changed, 783 insertions(+), 165 deletions(-)
> create mode 100644 Documentation/iio/dmabuf_api.rst
>
> --
> 2.33.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-11-15 14:58:30

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

Hi Daniel,

Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter
<[email protected]> a écrit :
> On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>> Hi Jonathan,
>>
>> This patchset introduces a new userspace interface based on DMABUF
>> objects, to complement the existing fileio based API.
>>
>> The advantage of this DMABUF based interface vs. the fileio
>> interface, is that it avoids an extra copy of the data between the
>> kernel and userspace. This is particularly userful for high-speed
>> devices which produce several megabytes or even gigabytes of data
>> per
>> second.
>>
>> The first few patches [01/15] to [03/15] are not really related, but
>> allow to reduce the size of the patches that introduce the new API.
>>
>> Patch [04/15] to [06/15] enables write() support to the buffer-dma
>> implementation of the buffer API, to continue the work done by
>> Mihail Chindris.
>>
>> Patches [07/15] to [12/15] introduce the new DMABUF based API.
>>
>> Patches [13/15] and [14/15] add support for cyclic buffers, only
>> through
>> the new API. A cyclic buffer will be repeated on the output until
>> the
>> buffer is disabled.
>>
>> Patch [15/15] adds documentation about the new API.
>>
>> For now, the API allows you to alloc DMABUF objects and mmap() them
>> to
>> read or write the samples. It does not yet allow to import DMABUFs
>> parented to other subsystems, but that should eventually be possible
>> once it's wired.
>>
>> This patchset is inspired by the "mmap interface" that was
>> previously
>> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would
>> be
>> great if I could get a review from you guys. Alexandru's commit was
>> signed with his @analog.com address but he doesn't work at ADI
>> anymore,
>> so I believe I'll need him to sign with a new email.
>
> Why dma-buf? dma-buf looks like something super generic and useful,
> until
> you realize that there's a metric ton of gpu/accelerator bagage piled
> in.
> So unless buffer sharing with a gpu/video/accel/whatever device is the
> goal here, and it's just for a convenient way to get at buffer
> handles,
> this doesn't sound like a good idea.

Good question. The first reason is that a somewhat similar API was
intended before[1], but was refused upstream as it was kind of re-inventing
the wheel.

The second reason is that we want to be able to share buffers too, not
with gpu/video but with the network (zctap) and in the future with USB
(functionFS) too.

[1]:
https://lore.kernel.org/linux-iio/[email protected]/T/

> Also if the idea is to this with gpus/accelerators then I'd really
> like to
> see the full thing, since most likely at that point you also want
> dma_fence. And once we talk dma_fence things get truly horrible from a
> locking pov :-( Or well, just highly constrained and I get to review
> what
> iio is doing with these buffers to make sure it all fits.

There is some dma_fence action in patch #10, which is enough for the
userspace apps to use the API.
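
For what it's worth, the fence is only there so that an application can wait
for the hardware to be done with a block before touching it again. The
expected flow is just something like this (sketch only; 'dmabuf_fd' stands
for the fd obtained from the new alloc ioctl):

    #include <poll.h>

    /* Wait until the fences attached to the DMABUF have signalled, i.e.
     * until the DMA transfer for this block has completed. */
    static int wait_for_block(int dmabuf_fd)
    {
        struct pollfd pfd = {
            .fd     = dmabuf_fd,
            .events = POLLOUT,
        };

        return poll(&pfd, 1, -1);
    }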

What "horribleness" are we talking about here? It doesn't look that
scary to me, but I certainly don't have the complete picture.

Cheers,
-Paul

> Cheers, Daniel
>
>>
>> Cheers,
>> -Paul
>>
>> Alexandru Ardelean (1):
>> iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>>
>> Paul Cercueil (14):
>> iio: buffer-dma: Get rid of incoming/outgoing queues
>> iio: buffer-dma: Remove unused iio_buffer_block struct
>> iio: buffer-dma: Use round_down() instead of rounddown()
>> iio: buffer-dma: Enable buffer write support
>> iio: buffer-dmaengine: Support specifying buffer direction
>> iio: buffer-dmaengine: Enable write support
>> iio: core: Add new DMABUF interface infrastructure
>> iio: buffer-dma: Use DMABUFs instead of custom solution
>> iio: buffer-dma: Implement new DMABUF based userspace API
>> iio: buffer-dma: Boost performance using write-combine cache
>> setting
>> iio: buffer-dmaengine: Support new DMABUF based userspace API
>> iio: core: Add support for cyclic buffers
>> iio: buffer-dmaengine: Add support for cyclic buffers
>> Documentation: iio: Document high-speed DMABUF based API
>>
>> Documentation/driver-api/dma-buf.rst | 2 +
>> Documentation/iio/dmabuf_api.rst | 94 +++
>> Documentation/iio/index.rst | 2 +
>> drivers/iio/adc/adi-axi-adc.c | 3 +-
>> drivers/iio/buffer/industrialio-buffer-dma.c | 670
>> ++++++++++++++----
>> .../buffer/industrialio-buffer-dmaengine.c | 42 +-
>> drivers/iio/industrialio-buffer.c | 49 ++
>> include/linux/iio/buffer-dma.h | 43 +-
>> include/linux/iio/buffer-dmaengine.h | 5 +-
>> include/linux/iio/buffer_impl.h | 8 +
>> include/uapi/linux/iio/buffer.h | 30 +
>> 11 files changed, 783 insertions(+), 165 deletions(-)
>> create mode 100644 Documentation/iio/dmabuf_api.rst
>>
>> --
>> 2.33.0
>>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



2021-11-16 08:23:15

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <[email protected]> wrote:
>
> This structure was never used anywhere, so it can safely be dropped.
>
> It will later be re-introduced as a different structure in a
> different header.

Reviewed-by: Alexandru Ardelean <[email protected]>

>
> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> include/linux/iio/buffer-dma.h | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index d4ed5ff39d44..a65a005c4a19 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -17,11 +17,6 @@ struct iio_dma_buffer_queue;
> struct iio_dma_buffer_ops;
> struct device;
>
> -struct iio_buffer_block {
> - u32 size;
> - u32 bytes_used;
> -};
> -
> /**
> * enum iio_block_state - State of a struct iio_dma_buffer_block
> * @IIO_BLOCK_STATE_DEQUEUED: Block is not queued
> --
> 2.33.0
>

2021-11-16 08:23:37

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <[email protected]> wrote:
>
> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
>

Reviewed-by: Alexandru Ardelean <[email protected]>

> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
> include/linux/iio/buffer-dma.h | 7 +-
> 2 files changed, 37 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index d348af8b9705..abac88f20104 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>
> static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> {
> - struct iio_dma_buffer_queue *queue = block->queue;
> -
> - /*
> - * The buffer has already been freed by the application, just drop the
> - * reference.
> - */
> - if (block->state != IIO_BLOCK_STATE_DEAD) {
> + if (block->state != IIO_BLOCK_STATE_DEAD)
> block->state = IIO_BLOCK_STATE_DONE;
> - list_add_tail(&block->head, &queue->outgoing);
> - }
> }
>
> /**
> @@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> * dead. This means we can reset the lists without having to fear
> * corrution.
> */
> - INIT_LIST_HEAD(&queue->outgoing);
> spin_unlock_irq(&queue->list_lock);
>
> - INIT_LIST_HEAD(&queue->incoming);
> -
> for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> if (queue->fileio.blocks[i]) {
> block = queue->fileio.blocks[i];
> @@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> }
>
> block->state = IIO_BLOCK_STATE_QUEUED;
> - list_add_tail(&block->head, &queue->incoming);
> }
>
> out_unlock:
> @@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
> struct iio_dev *indio_dev)
> {
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> - struct iio_dma_buffer_block *block, *_block;
> + struct iio_dma_buffer_block *block;
> + unsigned int i;
>
> mutex_lock(&queue->lock);
> queue->active = true;
> - list_for_each_entry_safe(block, _block, &queue->incoming, head) {
> - list_del(&block->head);
> - iio_dma_buffer_submit_block(queue, block);
> + queue->fileio.next_dequeue = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + block = queue->fileio.blocks[i];
> +
> + if (block->state == IIO_BLOCK_STATE_QUEUED)
> + iio_dma_buffer_submit_block(queue, block);
> }
> mutex_unlock(&queue->lock);
>
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
> struct iio_dma_buffer_block *block)
> {
> - if (block->state == IIO_BLOCK_STATE_DEAD) {
> + if (block->state == IIO_BLOCK_STATE_DEAD)
> iio_buffer_block_put(block);
> - } else if (queue->active) {
> + else if (queue->active)
> iio_dma_buffer_submit_block(queue, block);
> - } else {
> + else
> block->state = IIO_BLOCK_STATE_QUEUED;
> - list_add_tail(&block->head, &queue->incoming);
> - }
> }
>
> static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
> struct iio_dma_buffer_queue *queue)
> {
> struct iio_dma_buffer_block *block;
> + unsigned int idx;
>
> spin_lock_irq(&queue->list_lock);
> - block = list_first_entry_or_null(&queue->outgoing, struct
> - iio_dma_buffer_block, head);
> - if (block != NULL) {
> - list_del(&block->head);
> +
> + idx = queue->fileio.next_dequeue;
> + block = queue->fileio.blocks[idx];
> +
> + if (block->state == IIO_BLOCK_STATE_DONE) {
> block->state = IIO_BLOCK_STATE_DEQUEUED;
> + idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
> + queue->fileio.next_dequeue = idx;
> + } else {
> + block = NULL;
> }
> +
> spin_unlock_irq(&queue->list_lock);
>
> return block;
> @@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
> struct iio_dma_buffer_block *block;
> size_t data_available = 0;
> + unsigned int i;
>
> /*
> * For counting the available bytes we'll use the size of the block not
> @@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> data_available += queue->fileio.active_block->size;
>
> spin_lock_irq(&queue->list_lock);
> - list_for_each_entry(block, &queue->outgoing, head)
> - data_available += block->size;
> +
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + block = queue->fileio.blocks[i];
> +
> + if (block != queue->fileio.active_block
> + && block->state == IIO_BLOCK_STATE_DONE)
> + data_available += block->size;
> + }
> +
> spin_unlock_irq(&queue->list_lock);
> mutex_unlock(&queue->lock);
>
> @@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
> queue->dev = dev;
> queue->ops = ops;
>
> - INIT_LIST_HEAD(&queue->incoming);
> - INIT_LIST_HEAD(&queue->outgoing);
> -
> mutex_init(&queue->lock);
> spin_lock_init(&queue->list_lock);
>
> @@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
> continue;
> queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> }
> - INIT_LIST_HEAD(&queue->outgoing);
> spin_unlock_irq(&queue->list_lock);
>
> - INIT_LIST_HEAD(&queue->incoming);
> -
> for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> if (!queue->fileio.blocks[i])
> continue;
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index ff15c61bf319..d4ed5ff39d44 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
> * @active_block: Block being used in read()
> * @pos: Read offset in the active block
> * @block_size: Size of each block
> + * @next_dequeue: index of next block that will be dequeued
> */
> struct iio_dma_buffer_queue_fileio {
> struct iio_dma_buffer_block *blocks[2];
> struct iio_dma_buffer_block *active_block;
> size_t pos;
> size_t block_size;
> +
> + unsigned int next_dequeue;
> };
>
> /**
> @@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
> * atomic context as well as blocks on those lists. This is the outgoing queue
> * list and typically also a list of active blocks in the part that handles
> * the DMA controller
> - * @incoming: List of buffers on the incoming queue
> - * @outgoing: List of buffers on the outgoing queue
> * @active: Whether the buffer is currently active
> * @fileio: FileIO state
> */
> @@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {
>
> struct mutex lock;
> spinlock_t list_lock;
> - struct list_head incoming;
> - struct list_head outgoing;
>
> bool active;
>
> --
> 2.33.0
>

2021-11-16 08:26:44

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <[email protected]> wrote:
>
> We know that the buffer's alignment will always be a power of two;
> therefore, we can use the faster round_down() macro.
>
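
(For reference, roughly what the two helpers expand to, paraphrased from the
kernel headers; the exact definitions differ slightly:)

    /* rounddown() works for any divisor, but costs a modulo/division: */
    #define rounddown(x, y)   ((x) - ((x) % (y)))

    /* round_down() assumes y is a power of two, so it reduces to a mask: */
    #define round_down(x, y)  ((x) & ~((__typeof__(x))((y) - 1)))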

Reviewed-by: Alexandru Ardelean <[email protected]>

> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 1ac94c4e9792..f8ce26a24c57 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> dma_cookie_t cookie;
>
> block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> - block->bytes_used = rounddown(block->bytes_used,
> + block->bytes_used = round_down(block->bytes_used,
> dmaengine_buffer->align);
>
> desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> --
> 2.33.0
>

2021-11-16 08:53:08

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <[email protected]> wrote:
>
> Adding write support to the buffer-dma code is easy - the write()
> function basically needs to do the exact same thing as the read()
> function: dequeue a block, read or write the data, enqueue the block
> when entirely processed.
>
> Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write()
> now both call a function iio_buffer_dma_io(), which will perform this
> task.
>
> The .space_available() callback can return the exact same value as the
> .data_available() callback for input buffers, since in both cases we
> count the exact same thing (the number of bytes in each available
> block).
>

A few comments inline from me.
Otherwise fine from me.


Reviewed-by: Alexandru Ardelean <[email protected]>



> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
> include/linux/iio/buffer-dma.h | 7 ++
> 2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index abac88f20104..eeeed6b2e0cf 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> }
>
> block->size = size;
> - block->state = IIO_BLOCK_STATE_DEQUEUED;
> + block->bytes_used = size;
> + block->state = IIO_BLOCK_STATE_DONE;

This initial state change looks like it could use a comment in the git log.
Maybe it's fine, but I don't remember the details 100%.

> block->queue = queue;
> INIT_LIST_HEAD(&block->head);
> kref_init(&block->kref);
> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> block->state = IIO_BLOCK_STATE_DONE;
> }
>
> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
> +{
> + __poll_t flags;
> +
> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + flags = EPOLLIN | EPOLLRDNORM;
> + else
> + flags = EPOLLOUT | EPOLLWRNORM;
> +
> + wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> +}
> +
> /**
> * iio_dma_buffer_block_done() - Indicate that a block has been completed
> * @block: The completed block
> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> iio_buffer_block_put_atomic(block);
> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> + iio_dma_buffer_queue_wake(queue);
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>
> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
> }
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> + iio_dma_buffer_queue_wake(queue);
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>
> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> queue->fileio.blocks[i] = block;
> }
>
> - block->state = IIO_BLOCK_STATE_QUEUED;
> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + block->state = IIO_BLOCK_STATE_QUEUED;
> }
>
> out_unlock:
> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
> return block;
> }
>
> -/**
> - * iio_dma_buffer_read() - DMA buffer read callback
> - * @buffer: Buffer to read form
> - * @n: Number of bytes to read
> - * @user_buffer: Userspace buffer to copy the data to
> - *
> - * Should be used as the read callback for iio_buffer_access_ops
> - * struct for DMA buffers.
> - */
> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> - char __user *user_buffer)
> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> + size_t n, char __user *user_buffer, bool is_write)
> {
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> struct iio_dma_buffer_block *block;
> + void *addr;
> int ret;
>
> if (n < buffer->bytes_per_datum)
> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> n = rounddown(n, buffer->bytes_per_datum);
> if (n > block->bytes_used - queue->fileio.pos)
> n = block->bytes_used - queue->fileio.pos;
> + addr = block->vaddr + queue->fileio.pos;
>
> - if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
> + if (is_write)
> + ret = !!copy_from_user(addr, user_buffer, n);
> + else
> + ret = !!copy_to_user(user_buffer, addr, n);
> + if (ret) {
> ret = -EFAULT;
> goto out_unlock;
> }
> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
> if (queue->fileio.pos == block->bytes_used) {
> queue->fileio.active_block = NULL;
> + block->bytes_used = block->size;
> iio_dma_buffer_enqueue(queue, block);
> }
>
> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
> return ret;
> }
> +
> +/**
> + * iio_dma_buffer_read() - DMA buffer read callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data to
> + *
> + * Should be used as the read callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> + char __user *user_buffer)
> +{
> + return iio_dma_buffer_io(buffer, n, user_buffer, false);
> +}
> EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>
> +/**
> + * iio_dma_buffer_write() - DMA buffer write callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data from
> + *
> + * Should be used as the write callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> + const char __user *user_buffer)
> +{
> + return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> +
> /**
> * iio_dma_buffer_data_available() - DMA buffer data_available callback
> * @buf: Buffer to check for data availability
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index a65a005c4a19..09c07d5563c0 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
> struct iio_dev *indio_dev);
> int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> char __user *user_buffer);
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> + const char __user *user_buffer);
> size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
> int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
> int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
> void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
> void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>
> +static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
> +{
> + return iio_dma_buffer_data_available(buffer);
> +}

Maybe re-using iio_dma_buffer_data_available() (and updating the
return type) would have been a little neater than wrapping this.
Also fine like this though.
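
What I mean is essentially this (untested sketch, assuming the two
prototypes already match):

    static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
        .read            = iio_dma_buffer_read,
        .write           = iio_dma_buffer_write,
        .data_available  = iio_dma_buffer_data_available,
        /* Both callbacks count the same thing, the bytes held in each
         * available block, as the commit message explains. */
        .space_available = iio_dma_buffer_data_available,
        /* remaining callbacks unchanged from the patch */
    };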

> +
> #endif
> --
> 2.33.0
>

2021-11-16 08:54:14

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <[email protected]> wrote:
>
> Update the devm_iio_dmaengine_buffer_setup() function to support
> specifying the buffer direction.
>
> Update the iio_dmaengine_buffer_submit() function to handle input
> buffers as well as output buffers.
>


Reviewed-by: Alexandru Ardelean <[email protected]>


> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/adc/adi-axi-adc.c | 3 ++-
> .../buffer/industrialio-buffer-dmaengine.c | 24 +++++++++++++++----
> include/linux/iio/buffer-dmaengine.h | 5 +++-
> 3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
> index a73e3c2d212f..0a6f2c32b1b9 100644
> --- a/drivers/iio/adc/adi-axi-adc.c
> +++ b/drivers/iio/adc/adi-axi-adc.c
> @@ -113,7 +113,8 @@ static int adi_axi_adc_config_dma_buffer(struct device *dev,
> dma_name = "rx";
>
> return devm_iio_dmaengine_buffer_setup(indio_dev->dev.parent,
> - indio_dev, dma_name);
> + indio_dev, dma_name,
> + IIO_BUFFER_DIRECTION_IN);
> }
>
> static int adi_axi_adc_read_raw(struct iio_dev *indio_dev,
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index f8ce26a24c57..ac26b04aa4a9 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -64,14 +64,25 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> struct dmaengine_buffer *dmaengine_buffer =
> iio_buffer_to_dmaengine_buffer(&queue->buffer);
> struct dma_async_tx_descriptor *desc;
> + enum dma_transfer_direction dma_dir;
> + size_t max_size;
> dma_cookie_t cookie;
>
> - block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> - block->bytes_used = round_down(block->bytes_used,
> - dmaengine_buffer->align);
> + max_size = min(block->size, dmaengine_buffer->max_size);
> + max_size = round_down(max_size, dmaengine_buffer->align);
> +
> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> + block->bytes_used = max_size;
> + dma_dir = DMA_DEV_TO_MEM;
> + } else {
> + dma_dir = DMA_MEM_TO_DEV;
> + }
> +
> + if (!block->bytes_used || block->bytes_used > max_size)
> + return -EINVAL;
>
> desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> + block->phys_addr, block->bytes_used, dma_dir,
> DMA_PREP_INTERRUPT);
> if (!desc)
> return -ENOMEM;
> @@ -275,7 +286,8 @@ static struct iio_buffer *devm_iio_dmaengine_buffer_alloc(struct device *dev,
> */
> int devm_iio_dmaengine_buffer_setup(struct device *dev,
> struct iio_dev *indio_dev,
> - const char *channel)
> + const char *channel,
> + enum iio_buffer_direction dir)
> {
> struct iio_buffer *buffer;
>
> @@ -286,6 +298,8 @@ int devm_iio_dmaengine_buffer_setup(struct device *dev,
>
> indio_dev->modes |= INDIO_BUFFER_HARDWARE;
>
> + buffer->direction = dir;
> +
> return iio_device_attach_buffer(indio_dev, buffer);
> }
> EXPORT_SYMBOL_GPL(devm_iio_dmaengine_buffer_setup);
> diff --git a/include/linux/iio/buffer-dmaengine.h b/include/linux/iio/buffer-dmaengine.h
> index 5c355be89814..538d0479cdd6 100644
> --- a/include/linux/iio/buffer-dmaengine.h
> +++ b/include/linux/iio/buffer-dmaengine.h
> @@ -7,11 +7,14 @@
> #ifndef __IIO_DMAENGINE_H__
> #define __IIO_DMAENGINE_H__
>
> +#include <linux/iio/buffer.h>
> +
> struct iio_dev;
> struct device;
>
> int devm_iio_dmaengine_buffer_setup(struct device *dev,
> struct iio_dev *indio_dev,
> - const char *channel);
> + const char *channel,
> + enum iio_buffer_direction dir);
>
> #endif
> --
> 2.33.0
>

2021-11-16 08:56:13

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 06/15] iio: buffer-dmaengine: Enable write support

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <[email protected]> wrote:
>
> Use the iio_dma_buffer_write() and iio_dma_buffer_space_available()
> functions provided by the buffer-dma core, to enable write support in
> the buffer-dmaengine code.
>

This is a bit related to the comment about
iio_dma_buffer_space_available() in patch 4

But otherwise:

Reviewed-by: Alexandru Ardelean <[email protected]>


> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index ac26b04aa4a9..5cde8fd81c7f 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -123,12 +123,14 @@ static void iio_dmaengine_buffer_release(struct iio_buffer *buf)
>
> static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
> .read = iio_dma_buffer_read,
> + .write = iio_dma_buffer_write,
> .set_bytes_per_datum = iio_dma_buffer_set_bytes_per_datum,
> .set_length = iio_dma_buffer_set_length,
> .request_update = iio_dma_buffer_request_update,
> .enable = iio_dma_buffer_enable,
> .disable = iio_dma_buffer_disable,
> .data_available = iio_dma_buffer_data_available,
> + .space_available = iio_dma_buffer_space_available,
> .release = iio_dmaengine_buffer_release,
>
> .modes = INDIO_BUFFER_HARDWARE,
> --
> 2.33.0
>

2021-11-16 10:59:44

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <[email protected]> wrote:
>
> From: Alexandru Ardelean <[email protected]>
>
> A part of the logic in the iio_dma_buffer_exit() is required for the change
> to add mmap support to IIO buffers.
> This change splits the logic into a separate function, which will be
> re-used later.
>

Not sure what the protocol is here, since my old @analog.com email
isn't working anymore.

But:

Signed-off-by: Alexandru Ardelean <[email protected]>

Thanks :)
Alex

> Signed-off-by: Alexandru Ardelean <[email protected]>
> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
> 1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index eeeed6b2e0cf..eb8cfd3af030 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
>
> +static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
> +{
> + unsigned int i;
> +
> + spin_lock_irq(&queue->list_lock);
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + if (!queue->fileio.blocks[i])
> + continue;
> + queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> + }
> + spin_unlock_irq(&queue->list_lock);
> +
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + if (!queue->fileio.blocks[i])
> + continue;
> + iio_buffer_block_put(queue->fileio.blocks[i]);
> + queue->fileio.blocks[i] = NULL;
> + }
> + queue->fileio.active_block = NULL;
> +}
> +
> static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> struct iio_dma_buffer_block *block)
> {
> @@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
> */
> void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
> {
> - unsigned int i;
> -
> mutex_lock(&queue->lock);
>
> - spin_lock_irq(&queue->list_lock);
> - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> - if (!queue->fileio.blocks[i])
> - continue;
> - queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> - }
> - spin_unlock_irq(&queue->list_lock);
> -
> - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> - if (!queue->fileio.blocks[i])
> - continue;
> - iio_buffer_block_put(queue->fileio.blocks[i]);
> - queue->fileio.blocks[i] = NULL;
> - }
> - queue->fileio.active_block = NULL;
> + iio_dma_buffer_fileio_free(queue);
> queue->ops = NULL;
>
> mutex_unlock(&queue->lock);
> --
> 2.33.0
>

2021-11-16 16:02:38

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> Hi Daniel,
>
> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter <[email protected]> a
> écrit :
> > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> > > Hi Jonathan,
> > >
> > > This patchset introduces a new userspace interface based on DMABUF
> > > objects, to complement the existing fileio based API.
> > >
> > > The advantage of this DMABUF based interface vs. the fileio
> > > interface, is that it avoids an extra copy of the data between the
> > > kernel and userspace. This is particularly userful for high-speed
> > > devices which produce several megabytes or even gigabytes of data
> > > per
> > > second.
> > >
> > > The first few patches [01/15] to [03/15] are not really related, but
> > > allow to reduce the size of the patches that introduce the new API.
> > >
> > > Patch [04/15] to [06/15] enables write() support to the buffer-dma
> > > implementation of the buffer API, to continue the work done by
> > > Mihail Chindris.
> > >
> > > Patches [07/15] to [12/15] introduce the new DMABUF based API.
> > >
> > > Patches [13/15] and [14/15] add support for cyclic buffers, only
> > > through
> > > the new API. A cyclic buffer will be repeated on the output until
> > > the
> > > buffer is disabled.
> > >
> > > Patch [15/15] adds documentation about the new API.
> > >
> > > For now, the API allows you to alloc DMABUF objects and mmap() them
> > > to
> > > read or write the samples. It does not yet allow to import DMABUFs
> > > parented to other subsystems, but that should eventually be possible
> > > once it's wired.
> > >
> > > This patchset is inspired by the "mmap interface" that was
> > > previously
> > > submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would
> > > be
> > > great if I could get a review from you guys. Alexandru's commit was
> > > signed with his @analog.com address but he doesn't work at ADI
> > > anymore,
> > > so I believe I'll need him to sign with a new email.
> >
> > Why dma-buf? dma-buf looks like something super generic and useful,
> > until
> > you realize that there's a metric ton of gpu/accelerator bagage piled
> > in.
> > So unless buffer sharing with a gpu/video/accel/whatever device is the
> > goal here, and it's just for a convenient way to get at buffer handles,
> > this doesn't sound like a good idea.
>
> Good question. The first reason is that a somewhat similar API was intented
> before[1], but refused upstream as it was kind of re-inventing the wheel.
>
> The second reason, is that we want to be able to share buffers too, not with
> gpu/video but with the network (zctap) and in the future with USB
> (functionFS) too.
>
> [1]: https://lore.kernel.org/linux-iio/[email protected]/T/

Hm, is that code already merged upstream?

I know that dma-buf looks really generic, but like I said, if there's no
need to ever interface with any of the gpu buffer sharing, it might be
better to use something else (like get_user_pages maybe, would that work?).

> > Also if the idea is to this with gpus/accelerators then I'd really like
> > to
> > see the full thing, since most likely at that point you also want
> > dma_fence. And once we talk dma_fence things get truly horrible from a
> > locking pov :-( Or well, just highly constrained and I get to review
> > what
> > iio is doing with these buffers to make sure it all fits.
>
> There is some dma_fence action in patch #10, which is enough for the
> userspace apps to use the API.
>
> What "horribleness" are we talking about here? It doesn't look that scary to
> me, but I certainly don't have the complete picture.

You need to annotate all the code involved in signalling that dma_fence
using dma_fence_begin/end_signalling, and then enable full lockdep and
everything.

You can safely assume you'll find bugs, because we even have bugs about
this in gpu drivers (where that annotation isn't fully rolled out yet).

The tldr is that you can't allocate memory in there. And a pile of other
restrictions, but not being able to allocate memory (well GFP_ATOMIC is
ok, but that can fail) is a very serious restriction.
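
Roughly, the annotation pattern looks like this (sketch only; 'fence' stands
for whatever struct dma_fence the IIO block ends up carrying in patch #10):

    bool cookie;

    /* Everything between begin/end is a fence-signalling critical section:
     * with lockdep enabled, memory allocations or locks taken in here that
     * could (transitively) end up waiting on the fence get flagged. */
    cookie = dma_fence_begin_signalling();

    /* ... complete the block, then: */
    dma_fence_signal(fence);

    dma_fence_end_signalling(cookie);
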
-Daniel


>
> Cheers,
> -Paul
>
> > Cheers, Daniel
> >
> > >
> > > Cheers,
> > > -Paul
> > >
> > > Alexandru Ardelean (1):
> > > iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > >
> > > Paul Cercueil (14):
> > > iio: buffer-dma: Get rid of incoming/outgoing queues
> > > iio: buffer-dma: Remove unused iio_buffer_block struct
> > > iio: buffer-dma: Use round_down() instead of rounddown()
> > > iio: buffer-dma: Enable buffer write support
> > > iio: buffer-dmaengine: Support specifying buffer direction
> > > iio: buffer-dmaengine: Enable write support
> > > iio: core: Add new DMABUF interface infrastructure
> > > iio: buffer-dma: Use DMABUFs instead of custom solution
> > > iio: buffer-dma: Implement new DMABUF based userspace API
> > > iio: buffer-dma: Boost performance using write-combine cache
> > > setting
> > > iio: buffer-dmaengine: Support new DMABUF based userspace API
> > > iio: core: Add support for cyclic buffers
> > > iio: buffer-dmaengine: Add support for cyclic buffers
> > > Documentation: iio: Document high-speed DMABUF based API
> > >
> > > Documentation/driver-api/dma-buf.rst | 2 +
> > > Documentation/iio/dmabuf_api.rst | 94 +++
> > > Documentation/iio/index.rst | 2 +
> > > drivers/iio/adc/adi-axi-adc.c | 3 +-
> > > drivers/iio/buffer/industrialio-buffer-dma.c | 670
> > > ++++++++++++++----
> > > .../buffer/industrialio-buffer-dmaengine.c | 42 +-
> > > drivers/iio/industrialio-buffer.c | 49 ++
> > > include/linux/iio/buffer-dma.h | 43 +-
> > > include/linux/iio/buffer-dmaengine.h | 5 +-
> > > include/linux/iio/buffer_impl.h | 8 +
> > > include/uapi/linux/iio/buffer.h | 30 +
> > > 11 files changed, 783 insertions(+), 165 deletions(-)
> > > create mode 100644 Documentation/iio/dmabuf_api.rst
> > >
> > > --
> > > 2.33.0
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-11-16 16:32:03

by Laurent Pinchart

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

On Tue, Nov 16, 2021 at 05:02:25PM +0100, Daniel Vetter wrote:
> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> > Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter a écrit :
> > > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> > > > Hi Jonathan,
> > > >
> > > > This patchset introduces a new userspace interface based on DMABUF
> > > > objects, to complement the existing fileio based API.
> > > >
> > > > The advantage of this DMABUF based interface vs. the fileio
> > > > interface, is that it avoids an extra copy of the data between the
> > > > kernel and userspace. This is particularly userful for high-speed
> > > > devices which produce several megabytes or even gigabytes of data per
> > > > second.
> > > >
> > > > The first few patches [01/15] to [03/15] are not really related, but
> > > > allow to reduce the size of the patches that introduce the new API.
> > > >
> > > > Patch [04/15] to [06/15] enables write() support to the buffer-dma
> > > > implementation of the buffer API, to continue the work done by
> > > > Mihail Chindris.
> > > >
> > > > Patches [07/15] to [12/15] introduce the new DMABUF based API.
> > > >
> > > > Patches [13/15] and [14/15] add support for cyclic buffers, only through
> > > > the new API. A cyclic buffer will be repeated on the output until the
> > > > buffer is disabled.
> > > >
> > > > Patch [15/15] adds documentation about the new API.
> > > >
> > > > For now, the API allows you to alloc DMABUF objects and mmap() them
> > > > to
> > > > read or write the samples. It does not yet allow to import DMABUFs
> > > > parented to other subsystems, but that should eventually be possible
> > > > once it's wired.
> > > >
> > > > This patchset is inspired by the "mmap interface" that was previously
> > > > submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> > > > great if I could get a review from you guys. Alexandru's commit was
> > > > signed with his @analog.com address but he doesn't work at ADI anymore,
> > > > so I believe I'll need him to sign with a new email.
> > >
> > > Why dma-buf? dma-buf looks like something super generic and useful, until
> > > you realize that there's a metric ton of gpu/accelerator bagage piled in.
> > > So unless buffer sharing with a gpu/video/accel/whatever device is the

And cameras (maybe they're included in "whatever" ?).

> > > goal here, and it's just for a convenient way to get at buffer handles,
> > > this doesn't sound like a good idea.
> >
> > Good question. The first reason is that a somewhat similar API was intented
> > before[1], but refused upstream as it was kind of re-inventing the wheel.
> >
> > The second reason, is that we want to be able to share buffers too, not with
> > gpu/video but with the network (zctap) and in the future with USB
> > (functionFS) too.
> >
> > [1]: https://lore.kernel.org/linux-iio/[email protected]/T/
>
> Hm is that code merged already in upstream already?
>
> I know that dma-buf looks really generic, but like I said if there's no
> need ever to interface with any of the gpu buffer sharing it might be
> better to use something else (like get_user_pages maybe, would that work?).

Not GUP please. That brings another set of issues, especially when
dealing with DMA; we've suffered enough from the USERPTR support in V4L2
to avoid repeating this. Let's make dma-buf better instead.

> > > Also if the idea is to this with gpus/accelerators then I'd really like to
> > > see the full thing, since most likely at that point you also want
> > > dma_fence. And once we talk dma_fence things get truly horrible from a
> > > locking pov :-( Or well, just highly constrained and I get to review what
> > > iio is doing with these buffers to make sure it all fits.
> >
> > There is some dma_fence action in patch #10, which is enough for the
> > userspace apps to use the API.
> >
> > What "horribleness" are we talking about here? It doesn't look that scary to
> > me, but I certainly don't have the complete picture.
>
> You need to annotate all the code involved in signalling that dma_fence
> using dma_fence_begin/end_signalling, and then enable full lockdep and
> everything.
>
> You can safely assume you'll find bugs, because we even have bugs about
> this in gpu drivers (where that annotation isn't fully rolled out yet).
>
> The tldr is that you can allocate memory in there. And a pile of other
> restrictions, but not being able to allocate memory (well GFP_ATOMIC is
> ok, but that can fail) is a very serious restriction.
> -Daniel
>
> > > > Alexandru Ardelean (1):
> > > > iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > > >
> > > > Paul Cercueil (14):
> > > > iio: buffer-dma: Get rid of incoming/outgoing queues
> > > > iio: buffer-dma: Remove unused iio_buffer_block struct
> > > > iio: buffer-dma: Use round_down() instead of rounddown()
> > > > iio: buffer-dma: Enable buffer write support
> > > > iio: buffer-dmaengine: Support specifying buffer direction
> > > > iio: buffer-dmaengine: Enable write support
> > > > iio: core: Add new DMABUF interface infrastructure
> > > > iio: buffer-dma: Use DMABUFs instead of custom solution
> > > > iio: buffer-dma: Implement new DMABUF based userspace API
> > > > iio: buffer-dma: Boost performance using write-combine cache setting
> > > > iio: buffer-dmaengine: Support new DMABUF based userspace API
> > > > iio: core: Add support for cyclic buffers
> > > > iio: buffer-dmaengine: Add support for cyclic buffers
> > > > Documentation: iio: Document high-speed DMABUF based API
> > > >
> > > > Documentation/driver-api/dma-buf.rst | 2 +
> > > > Documentation/iio/dmabuf_api.rst | 94 +++
> > > > Documentation/iio/index.rst | 2 +
> > > > drivers/iio/adc/adi-axi-adc.c | 3 +-
> > > > drivers/iio/buffer/industrialio-buffer-dma.c | 670 ++++++++++++++----
> > > > .../buffer/industrialio-buffer-dmaengine.c | 42 +-
> > > > drivers/iio/industrialio-buffer.c | 49 ++
> > > > include/linux/iio/buffer-dma.h | 43 +-
> > > > include/linux/iio/buffer-dmaengine.h | 5 +-
> > > > include/linux/iio/buffer_impl.h | 8 +
> > > > include/uapi/linux/iio/buffer.h | 30 +
> > > > 11 files changed, 783 insertions(+), 165 deletions(-)
> > > > create mode 100644 Documentation/iio/dmabuf_api.rst

--
Regards,

Laurent Pinchart

2021-11-17 08:48:58

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

Am 16.11.21 um 17:31 schrieb Laurent Pinchart:
> On Tue, Nov 16, 2021 at 05:02:25PM +0100, Daniel Vetter wrote:
>> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
>>> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter a écrit :
>>>> On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>>>>> Hi Jonathan,
>>>>>
>>>>> This patchset introduces a new userspace interface based on DMABUF
>>>>> objects, to complement the existing fileio based API.
>>>>>
>>>>> The advantage of this DMABUF based interface vs. the fileio
>>>>> interface, is that it avoids an extra copy of the data between the
>>>>> kernel and userspace. This is particularly userful for high-speed
>>>>> devices which produce several megabytes or even gigabytes of data per
>>>>> second.
>>>>>
>>>>> The first few patches [01/15] to [03/15] are not really related, but
>>>>> allow to reduce the size of the patches that introduce the new API.
>>>>>
>>>>> Patch [04/15] to [06/15] enables write() support to the buffer-dma
>>>>> implementation of the buffer API, to continue the work done by
>>>>> Mihail Chindris.
>>>>>
>>>>> Patches [07/15] to [12/15] introduce the new DMABUF based API.
>>>>>
>>>>> Patches [13/15] and [14/15] add support for cyclic buffers, only through
>>>>> the new API. A cyclic buffer will be repeated on the output until the
>>>>> buffer is disabled.
>>>>>
>>>>> Patch [15/15] adds documentation about the new API.
>>>>>
>>>>> For now, the API allows you to alloc DMABUF objects and mmap() them
>>>>> to
>>>>> read or write the samples. It does not yet allow to import DMABUFs
>>>>> parented to other subsystems, but that should eventually be possible
>>>>> once it's wired.
>>>>>
>>>>> This patchset is inspired by the "mmap interface" that was previously
>>>>> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
>>>>> great if I could get a review from you guys. Alexandru's commit was
>>>>> signed with his @analog.com address but he doesn't work at ADI anymore,
>>>>> so I believe I'll need him to sign with a new email.
>>>> Why dma-buf? dma-buf looks like something super generic and useful, until
>>>> you realize that there's a metric ton of gpu/accelerator bagage piled in.
>>>> So unless buffer sharing with a gpu/video/accel/whatever device is the
> And cameras (maybe they're included in "whatever" ?).
>
>>>> goal here, and it's just for a convenient way to get at buffer handles,
>>>> this doesn't sound like a good idea.
>>> Good question. The first reason is that a somewhat similar API was intented
>>> before[1], but refused upstream as it was kind of re-inventing the wheel.
>>>
>>> The second reason, is that we want to be able to share buffers too, not with
>>> gpu/video but with the network (zctap) and in the future with USB
>>> (functionFS) too.
>>>
>>> [1]: https://lore.kernel.org/linux-iio/[email protected]/T/
>> Hm is that code merged already in upstream already?
>>
>> I know that dma-buf looks really generic, but like I said if there's no
>> need ever to interface with any of the gpu buffer sharing it might be
>> better to use something else (like get_user_pages maybe, would that work?).
> Not GUP please. That brings another set of issues, especially when
> dealing with DMA, we've suffered enough from the USERPTR support in V4L2
> to avoid repeating this. Let's make dma-buf better instead.

Yeah, when comparing GUP and DMA-buf, the latter is clearly the lesser evil.

DMA-buf indeed has some design issues we need to work on, especially
around the async operations and synchronization. But I still think those
are solvable.

GUP on the other hand has some hard fundamental problems which you can
only solve completely if the hardware is capable of fast and reliable
recoverable page faults.

Regards,
Christian.

2021-11-17 12:50:37

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

Hi Daniel,

Le mar., nov. 16 2021 at 17:02:25 +0100, Daniel Vetter
<[email protected]> a écrit :
> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
>> Hi Daniel,
>>
>> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter
>> <[email protected]> a écrit :
>> > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>> > > Hi Jonathan,
>> > >
>> > > This patchset introduces a new userspace interface based on
>> DMABUF
>> > > objects, to complement the existing fileio based API.
>> > >
>> > > The advantage of this DMABUF based interface vs. the fileio
>> > > interface, is that it avoids an extra copy of the data between
>> the
>> > > kernel and userspace. This is particularly useful for
>> high-speed
>> > > devices which produce several megabytes or even gigabytes of
>> data
>> > > per
>> > > second.
>> > >
>> > > The first few patches [01/15] to [03/15] are not really
>> related, but
>> > > allow to reduce the size of the patches that introduce the new
>> API.
>> > >
>> > > Patch [04/15] to [06/15] enables write() support to the
>> buffer-dma
>> > > implementation of the buffer API, to continue the work done by
>> > > Mihail Chindris.
>> > >
>> > > Patches [07/15] to [12/15] introduce the new DMABUF based API.
>> > >
>> > > Patches [13/15] and [14/15] add support for cyclic buffers,
>> only
>> > > through
>> > > the new API. A cyclic buffer will be repeated on the output
>> until
>> > > the
>> > > buffer is disabled.
>> > >
>> > > Patch [15/15] adds documentation about the new API.
>> > >
>> > > For now, the API allows you to alloc DMABUF objects and mmap()
>> them
>> > > to
>> > > read or write the samples. It does not yet allow to import
>> DMABUFs
>> > > parented to other subsystems, but that should eventually be
>> possible
>> > > once it's wired.
>> > >
>> > > This patchset is inspired by the "mmap interface" that was
>> > > previously
>> > > submitted by Alexandru Ardelean and Lars-Peter Clausen, so it
>> would
>> > > be
>> > > great if I could get a review from you guys. Alexandru's
>> commit was
>> > > signed with his @analog.com address but he doesn't work at ADI
>> > > anymore,
>> > > so I believe I'll need him to sign with a new email.
>> >
>> > Why dma-buf? dma-buf looks like something super generic and
>> useful,
>> > until
>> > you realize that there's a metric ton of gpu/accelerator baggage
>> > piled in.
>> > So unless buffer sharing with a gpu/video/accel/whatever device
>> is the
>> > goal here, and it's just for a convenient way to get at buffer
>> handles,
>> > this doesn't sound like a good idea.
>>
>> Good question. The first reason is that a somewhat similar API was
>> intended
>> before[1], but refused upstream as it was kind of re-inventing the
>> wheel.
>>
>> The second reason, is that we want to be able to share buffers too,
>> not with
>> gpu/video but with the network (zctap) and in the future with USB
>> (functionFS) too.
>>
>> [1]:
>> https://lore.kernel.org/linux-iio/[email protected]/T/
>
> Hm is that code already merged upstream?

No, it was never merged.

> I know that dma-buf looks really generic, but like I said if there's
> no
> need ever to interface with any of the gpu buffer sharing it might be
> better to use something else (like get_user_pages maybe, would that
> work?).

If it was such a bad idea, why didn't you say it in the first place
when you replied to my request for feedback? [1]

I don't think we have any other solution. We can design a custom API to
pass buffers between IIO and user space, but that won't allow us to
share these buffers with other subsystems. If dma-buf is not a generic
solution, then we need a generic solution.

[1]:
https://x-lore.kernel.org/io-uring/[email protected]/T/

>> > Also if the idea is to do this with gpus/accelerators then I'd
>> > really like to see the full thing, since most likely at that point
>> > you also want dma_fence. And once we talk dma_fence things get
>> > truly horrible from a locking pov :-( Or well, just highly
>> > constrained and I get to review what iio is doing with these
>> > buffers to make sure it all fits.
>>
>> There is some dma_fence action in patch #10, which is enough for the
>> userspace apps to use the API.
>>
>> What "horribleness" are we talking about here? It doesn't look that
>> scary to
>> me, but I certainly don't have the complete picture.
>
> You need to annotate all the code involved in signalling that
> dma_fence
> using dma_fence_begin/end_signalling, and then enable full lockdep and
> everything.

Doesn't dma_fence_signal() do it for me? Looking at the code, it does
call dma_fence_begin/end_signalling.
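
(For reference, the kind of annotation Daniel describes looks roughly like
the sketch below in a driver's completion path. This is only an
illustration, not code from this patchset; "fence" stands for whatever
struct dma_fence the driver is about to signal. The point is that the whole
completion path, not just the dma_fence_signal() call itself, sits inside
the annotated section.)

    bool cookie;

    cookie = dma_fence_begin_signalling();
    /*
     * Everything between begin/end is treated by lockdep as part of the
     * fence signalling critical section: no GFP_KERNEL allocations, and
     * no locks that may depend on such allocations.
     */
    dma_fence_signal(fence);
    dma_fence_end_signalling(cookie);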

Cheers,
-Paul

> You can safely assume you'll find bugs, because we even have bugs
> about
> this in gpu drivers (where that annotation isn't fully rolled out
> yet).
>
> The tldr is that you can't allocate memory in there. And a pile of other
> restrictions, but not being able to allocate memory (well GFP_ATOMIC
> is
> ok, but that can fail) is a very serious restriction.
> -Daniel
>
>
>>
>> Cheers,
>> -Paul
>>
>> > Cheers, Daniel
>> >
>> > >
>> > > Cheers,
>> > > -Paul
>> > >
>> > > Alexandru Ardelean (1):
>> > > iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>> > >
>> > > Paul Cercueil (14):
>> > > iio: buffer-dma: Get rid of incoming/outgoing queues
>> > > iio: buffer-dma: Remove unused iio_buffer_block struct
>> > > iio: buffer-dma: Use round_down() instead of rounddown()
>> > > iio: buffer-dma: Enable buffer write support
>> > > iio: buffer-dmaengine: Support specifying buffer direction
>> > > iio: buffer-dmaengine: Enable write support
>> > > iio: core: Add new DMABUF interface infrastructure
>> > > iio: buffer-dma: Use DMABUFs instead of custom solution
>> > > iio: buffer-dma: Implement new DMABUF based userspace API
>> > > iio: buffer-dma: Boost performance using write-combine cache
>> > > setting
>> > > iio: buffer-dmaengine: Support new DMABUF based userspace API
>> > > iio: core: Add support for cyclic buffers
>> > > iio: buffer-dmaengine: Add support for cyclic buffers
>> > > Documentation: iio: Document high-speed DMABUF based API
>> > >
>> > > Documentation/driver-api/dma-buf.rst | 2 +
>> > > Documentation/iio/dmabuf_api.rst | 94 +++
>> > > Documentation/iio/index.rst | 2 +
>> > > drivers/iio/adc/adi-axi-adc.c | 3 +-
>> > > drivers/iio/buffer/industrialio-buffer-dma.c | 670
>> > > ++++++++++++++----
>> > > .../buffer/industrialio-buffer-dmaengine.c | 42 +-
>> > > drivers/iio/industrialio-buffer.c | 49 ++
>> > > include/linux/iio/buffer-dma.h | 43 +-
>> > > include/linux/iio/buffer-dmaengine.h | 5 +-
>> > > include/linux/iio/buffer_impl.h | 8 +
>> > > include/uapi/linux/iio/buffer.h | 30 +
>> > > 11 files changed, 783 insertions(+), 165 deletions(-)
>> > > create mode 100644 Documentation/iio/dmabuf_api.rst
>> > >
>> > > --
>> > > 2.33.0
>> > >
>> >
>> > --
>> > Daniel Vetter
>> > Software Engineer, Intel Corporation
>> > http://blog.ffwll.ch
>>
>>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



2021-11-17 13:43:13

by Hennerich, Michael

[permalink] [raw]
Subject: RE: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API



> -----Original Message-----
> From: Paul Cercueil <[email protected]>
> Sent: Mittwoch, 17. November 2021 13:50
> To: Daniel Vetter <[email protected]>
> Cc: Jonathan Cameron <[email protected]>; Hennerich, Michael
> <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]; Christian König
> <[email protected]>; [email protected]; Alexandru
> Ardelean <[email protected]>; [email protected]
> Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based
> API
>
> [External]
>
> Hi Daniel,
>
> Le mar., nov. 16 2021 at 17:02:25 +0100, Daniel Vetter <[email protected]> a
> écrit :
> > On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> >> Hi Daniel,
> >>
> >> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter
> >> <[email protected]> a écrit :
> >> > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> >> > > Hi Jonathan,
> >> > >
> >> > > This patchset introduces a new userspace interface based on
> >> > > DMABUF objects, to complement the existing fileio based API.
> >> > >
> >> > > The advantage of this DMABUF based interface vs. the fileio
> >> > > interface, is that it avoids an extra copy of the data between
> >> > > the kernel and userspace. This is particularly useful for
> >> > > high-speed devices which produce several megabytes or even
> >> > > gigabytes of data per second.
> >> > >
> >> > > The first few patches [01/15] to [03/15] are not really related,
> >> > > but allow to reduce the size of the patches that introduce the
> >> > > new API.
> >> > >
> >> > > Patch [04/15] to [06/15] enables write() support to the
> >> > > buffer-dma implementation of the buffer API, to continue the
> >> > > work done by Mihail Chindris.
> >> > >
> >> > > Patches [07/15] to [12/15] introduce the new DMABUF based API.
> >> > >
> >> > > Patches [13/15] and [14/15] add support for cyclic buffers, only
> >> > > through the new API. A cyclic buffer will be repeated on the
> >> > > output until the buffer is disabled.
> >> > >
> >> > > Patch [15/15] adds documentation about the new API.
> >> > >
> >> > > For now, the API allows you to alloc DMABUF objects and mmap()
> >> > > them to read or write the samples. It does not yet allow to
> >> > > import DMABUFs parented to other subsystems, but that should
> >> > > eventually be possible once it's wired.
> >> > >
> >> > > This patchset is inspired by the "mmap interface" that was
> >> > > previously submitted by Alexandru Ardelean and Lars-Peter
> >> > > Clausen, so it would be great if I could get a review from you
> >> > > guys. Alexandru's commit was signed with his @analog.com address
> >> > > but he doesn't work at ADI anymore, so I believe I'll need him
> >> > > to sign with a new email.
> >> >
> >> > Why dma-buf? dma-buf looks like something super generic and
> >> > useful, until you realize that there's a metric ton of
> >> > gpu/accelerator baggage piled in.
> >> > So unless buffer sharing with a gpu/video/accel/whatever device is
> >> > the goal here, and it's just for a convenient way to get at buffer
> >> > handles, this doesn't sound like a good idea.
> >>
> >> Good question. The first reason is that a somewhat similar API was
> >> intended before[1], but refused upstream as it was kind of
> >> re-inventing the wheel.
> >>
> >> The second reason, is that we want to be able to share buffers too,
> >> not with gpu/video but with the network (zctap) and in the future
> >> with USB
> >> (functionFS) too.
> >>
> >> [1]:
> >> https://lore.kernel.org/linux-iio/[email protected]/T/
> >
> > Hm is that code already merged upstream?
>
> No, it was never merged.
>
> > I know that dma-buf looks really generic, but like I said if there's
> > no need ever to interface with any of the gpu buffer sharing it might
> > be better to use something else (like get_user_pages maybe, would that
> > work?).
>
> If it was such a bad idea, why didn't you say it in the first place when you
> replied to my request for feedback? [1]
>
> I don't think we have any other solution. We can design a custom API to pass
> buffers between IIO and user space, but that won't allow us to share these
> buffers with other subsystems. If dma-buf is not a generic solution, then we
> need a generic solution.

I don't think we need another generic solution - dma-buf is the solution:

From https://www.kernel.org/doc/html/v5.15/driver-api/dma-buf.html:
" The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA)
access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access."

That sounds pretty much like what we need.

It lives under linux/drivers/dma-buf.
If it were meant to be video-only, shouldn't that be documented somewhere, and perhaps the code should live somewhere else?

Just my 2 cents.

Greetings,
Michael

>
> [1]:
> https://x-lore.kernel.org/io-uring/[email protected]/T/
>
> >> > Also if the idea is to do this with gpus/accelerators then I'd
> >> > really like to see the full thing, since most likely at that point
> >> > you also want dma_fence. And once we talk dma_fence things get
> >> > truly horrible from a locking pov :-( Or well, just highly
> >> > constrained and I get to review what iio is doing with these
> >> > buffers to make sure it all fits.
> >>
> >> There is some dma_fence action in patch #10, which is enough for the
> >> userspace apps to use the API.
> >>
> >> What "horribleness" are we talking about here? It doesn't look that
> >> scary to me, but I certainly don't have the complete picture.
> >
> > You need to annotate all the code involved in signalling that
> > dma_fence using dma_fence_begin/end_signalling, and then enable full
> > lockdep and everything.
>
> Doesn't dma_fence_signal() do it for me? Looking at the code, it does call
> dma_fence_begin/end_signalling.
>
> Cheers,
> -Paul
>
> > You can safely assume you'll find bugs, because we even have bugs
> > about this in gpu drivers (where that annotation isn't fully rolled
> > out yet).
> >
> > The tldr is that you can't allocate memory in there. And a pile of other
> > restrictions, but not being able to allocate memory (well GFP_ATOMIC
> > is ok, but that can fail) is a very serious restriction.
> > -Daniel
> >
> >
> >>
> >> Cheers,
> >> -Paul
> >>
> >> > Cheers, Daniel
> >> >
> >> > >
> >> > > Cheers,
> >> > > -Paul
> >> > >
> >> > > Alexandru Ardelean (1):
> >> > > iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> >> > >
> >> > > Paul Cercueil (14):
> >> > > iio: buffer-dma: Get rid of incoming/outgoing queues
> >> > > iio: buffer-dma: Remove unused iio_buffer_block struct
> >> > > iio: buffer-dma: Use round_down() instead of rounddown()
> >> > > iio: buffer-dma: Enable buffer write support
> >> > > iio: buffer-dmaengine: Support specifying buffer direction
> >> > > iio: buffer-dmaengine: Enable write support
> >> > > iio: core: Add new DMABUF interface infrastructure
> >> > > iio: buffer-dma: Use DMABUFs instead of custom solution
> >> > > iio: buffer-dma: Implement new DMABUF based userspace API
> >> > > iio: buffer-dma: Boost performance using write-combine cache
> >> > > setting
> >> > > iio: buffer-dmaengine: Support new DMABUF based userspace API
> >> > > iio: core: Add support for cyclic buffers
> >> > > iio: buffer-dmaengine: Add support for cyclic buffers
> >> > > Documentation: iio: Document high-speed DMABUF based API
> >> > >
> >> > > Documentation/driver-api/dma-buf.rst | 2 +
> >> > > Documentation/iio/dmabuf_api.rst | 94 +++
> >> > > Documentation/iio/index.rst | 2 +
> >> > > drivers/iio/adc/adi-axi-adc.c | 3 +-
> >> > > drivers/iio/buffer/industrialio-buffer-dma.c | 670
> >> > > ++++++++++++++----
> >> > > .../buffer/industrialio-buffer-dmaengine.c | 42 +-
> >> > > drivers/iio/industrialio-buffer.c | 49 ++
> >> > > include/linux/iio/buffer-dma.h | 43 +-
> >> > > include/linux/iio/buffer-dmaengine.h | 5 +-
> >> > > include/linux/iio/buffer_impl.h | 8 +
> >> > > include/uapi/linux/iio/buffer.h | 30 +
> >> > > 11 files changed, 783 insertions(+), 165 deletions(-)
> >> > > create mode 100644 Documentation/iio/dmabuf_api.rst
> >> > >
> >> > > --
> >> > > 2.33.0
> >> > >
> >> >
> >> > --
> >> > Daniel Vetter
> >> > Software Engineer, Intel Corporation
> >> > http://blog.ffwll.ch
> >>
> >>
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>


2021-11-18 11:46:43

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

Hi,

Le lun., nov. 15 2021 at 14:19:21 +0000, Paul Cercueil
<[email protected]> a écrit :
> We can be certain that the input buffers will only be accessed by
> userspace for reading, and output buffers will mostly be accessed by
> userspace for writing.
>
> Therefore, it makes more sense to use only fully cached input buffers,
> and to use the write-combine cache coherency setting for output
> buffers.
>
> This boosts performance, as the data written to the output buffers
> does
> not have to be sync'd for coherency. It will halve performance if the
> userspace application tries to read from the output buffer, but this
> should never happen.
>
> Since we don't need to sync the cache when disabling CPU access either
> for input buffers or output buffers, the .end_cpu_access() callback
> can
> be dropped completely.
>
> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 82
> +++++++++++++-------
> 1 file changed, 54 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c
> b/drivers/iio/buffer/industrialio-buffer-dma.c
> index 92356ee02f30..fb39054d8c15 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct
> dma_buf *dbuf,
> if (vma->vm_ops->open)
> vma->vm_ops->open(vma);
>
> - return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> - virt_to_page(block->vaddr));
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> + /*
> + * With an input buffer, userspace will only read the data and
> + * never write. We can mmap the buffer fully cached.
> + */
> + return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> + virt_to_page(block->vaddr));
> + } else {
> + /*
> + * With an output buffer, userspace will only write the data
> + * and should rarely (if ever) read from it. It is better to
> + * use write-combine in this case.
> + */
> + return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> + vma->vm_end - vma->vm_start);
> + }
> +}
> +
> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block
> *block)
> +{
> + struct device *dev = block->queue->dev;
> + size_t size = PAGE_ALIGN(block->size);
> +
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> + else
> + dma_free_wc(dev, size, block->vaddr, block->phys_addr);
> }
>
> static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct
> dma_buf *dbuf)
>
> mutex_lock(&queue->lock);
>
> - dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> - block->vaddr, block->phys_addr);
> -
> + iio_dma_buffer_free_dmamem(block);
> kfree(block);
>
> queue->num_blocks--;
> @@ -268,19 +291,6 @@ static int
> iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
> return 0;
> }
>
> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> - enum dma_data_direction dma_dir)
> -{
> - struct iio_dma_buffer_block *block = dbuf->priv;
> - struct device *dev = block->queue->dev;
> -
> - /* We only need to sync the cache for output buffers */
> - if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> - dma_sync_single_for_device(dev, block->phys_addr, block->size,
> dma_dir);
> -
> - return 0;
> -}
> -
> static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> .attach = iio_buffer_dma_buf_attach,
> .map_dma_buf = iio_buffer_dma_buf_map,
> @@ -288,9 +298,28 @@ static const struct dma_buf_ops
> iio_dma_buffer_dmabuf_ops = {
> .mmap = iio_buffer_dma_buf_mmap,
> .release = iio_buffer_dma_buf_release,
> .begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
> - .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
> };
>
> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block
> *block)
> +{
> + struct device *dev = block->queue->dev;
> + size_t size = PAGE_ALIGN(block->size);
> +
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> + block->vaddr = dma_alloc_coherent(dev, size,
> + &block->phys_addr,
> + GFP_KERNEL);

I'm so used to dma_alloc_noncoherent() that I didn't even notice that
it was dma_alloc_coherent() here. The code I added was meant to work with
non-coherent memory - hence the dma_sync_* operations and the use of
dma_mmap_pages().

I'll fix that in V2.
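
(For illustration, the non-coherent variant could look something like the
sketch below, keeping the write-combine path for output buffers and using
dma_alloc_noncoherent() for input buffers. Just a sketch of the idea, with
the dma_sync_single_for_cpu()/_for_device() calls still handled elsewhere;
not the actual v2 code.)

    static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
    {
    	struct device *dev = block->queue->dev;
    	size_t size = PAGE_ALIGN(block->size);

    	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
    		/* Cached, non-coherent memory; requires explicit syncs
    		 * around CPU and DMA accesses. */
    		block->vaddr = dma_alloc_noncoherent(dev, size,
    						     &block->phys_addr,
    						     DMA_FROM_DEVICE,
    						     GFP_KERNEL);
    	} else {
    		block->vaddr = dma_alloc_wc(dev, size, &block->phys_addr,
    					    GFP_KERNEL);
    	}
    	if (!block->vaddr)
    		return -ENOMEM;

    	return 0;
    }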

Cheers,
-Paul

> + } else {
> + block->vaddr = dma_alloc_wc(dev, size,
> + &block->phys_addr,
> + GFP_KERNEL);
> + }
> + if (!block->vaddr)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> {
> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block
> *iio_dma_buffer_alloc_block(
> if (!block)
> return ERR_PTR(-ENOMEM);
>
> - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> - &block->phys_addr, GFP_KERNEL);
> - if (!block->vaddr) {
> - err = -ENOMEM;
> + block->size = size;
> + block->queue = queue;
> +
> + err = iio_dma_buffer_alloc_dmamem(block);
> + if (err)
> goto err_free_block;
> - }
>
> einfo.ops = &iio_dma_buffer_dmabuf_ops;
> einfo.size = PAGE_ALIGN(size);
> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block
> *iio_dma_buffer_alloc_block(
> }
>
> block->dmabuf = dmabuf;
> - block->size = size;
> block->bytes_used = size;
> block->state = IIO_BLOCK_STATE_DONE;
> - block->queue = queue;
> block->fileio = fileio;
> INIT_LIST_HEAD(&block->head);
>
> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block
> *iio_dma_buffer_alloc_block(
> return block;
>
> err_free_dma:
> - dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> - block->vaddr, block->phys_addr);
> + iio_dma_buffer_free_dmamem(block);
> err_free_block:
> kfree(block);
> return ERR_PTR(err);
> --
> 2.33.0
>



2021-11-21 13:44:21

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function

On Tue, 16 Nov 2021 12:59:30 +0200
Alexandru Ardelean <[email protected]> wrote:

> On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <[email protected]> wrote:
> >
> > From: Alexandru Ardelean <[email protected]>
> >
> > A part of the logic in the iio_dma_buffer_exit() is required for the change
> > to add mmap support to IIO buffers.
> > This change splits the logic into a separate function, which will be
> > re-used later.
> >
>
> Not sure what the protocol is here, since my old @analog.com email
> isn't working anymore.
>
> But:
>
> Signed-off-by: Alexandru Ardelean <[email protected]>

If you want to both document your email address change and make various scripts
find you more easily, send a patch for .mailmap

In the meantime we can probably leave your original sign off in place but add
a Cc: with an up to date email address. That will reflect that the sign off
for this was (I assume) given back when you were at Analog.
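
(For reference, a .mailmap entry is a single line mapping the current
name/address to the address found in old commits and sign-offs; the
addresses below are placeholders, not Alexandru's real ones:)

    Alexandru Ardelean <current@example.com> <old.address@analog.com>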

>
> Thanks :)
> Alex
>
> > Signed-off-by: Alexandru Ardelean <[email protected]>
> > Signed-off-by: Paul Cercueil <[email protected]>
> > ---
> > drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
> > 1 file changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> > index eeeed6b2e0cf..eb8cfd3af030 100644
> > --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> > +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> > @@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> > }
> > EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
> >
> > +static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
> > +{
> > + unsigned int i;
> > +
> > + spin_lock_irq(&queue->list_lock);
> > + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > + if (!queue->fileio.blocks[i])
> > + continue;
> > + queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> > + }
> > + spin_unlock_irq(&queue->list_lock);
> > +
> > + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > + if (!queue->fileio.blocks[i])
> > + continue;
> > + iio_buffer_block_put(queue->fileio.blocks[i]);
> > + queue->fileio.blocks[i] = NULL;
> > + }
> > + queue->fileio.active_block = NULL;
> > +}
> > +
> > static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> > struct iio_dma_buffer_block *block)
> > {
> > @@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
> > */
> > void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
> > {
> > - unsigned int i;
> > -
> > mutex_lock(&queue->lock);
> >
> > - spin_lock_irq(&queue->list_lock);
> > - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > - if (!queue->fileio.blocks[i])
> > - continue;
> > - queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> > - }
> > - spin_unlock_irq(&queue->list_lock);
> > -
> > - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > - if (!queue->fileio.blocks[i])
> > - continue;
> > - iio_buffer_block_put(queue->fileio.blocks[i]);
> > - queue->fileio.blocks[i] = NULL;
> > - }
> > - queue->fileio.active_block = NULL;
> > + iio_dma_buffer_fileio_free(queue);
> > queue->ops = NULL;
> >
> > mutex_unlock(&queue->lock);
> > --
> > 2.33.0
> >


2021-11-21 13:53:07

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API

On Mon, 15 Nov 2021 14:19:10 +0000
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> This patchset introduces a new userspace interface based on DMABUF
> objects, to complement the existing fileio based API.
>
> The advantage of this DMABUF based interface vs. the fileio
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly useful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.

I'm going to defer to others for the discussion of whether dmabuf is
the right way to do this. I look forward to some conclusions.
Much like yourselves, I want to see 'a' way of doing this but I'm
not knowledgeable enough about the quirks and corner cases of the
different options to offer a strong opinion.

Either way, it's great to get that discussion going and it would never
have happened without having actual code so thanks for doing this!

>
> The first few patches [01/15] to [03/15] are not really related, but
> allow to reduce the size of the patches that introduce the new API.
>
> Patch [04/15] to [06/15] enables write() support to the buffer-dma
> implementation of the buffer API, to continue the work done by
> Mihail Chindris.
>
> Patches [07/15] to [12/15] introduce the new DMABUF based API.
>
> Patches [13/15] and [14/15] add support for cyclic buffers, only through
> the new API. A cyclic buffer will be repeated on the output until the
> buffer is disabled.
>
> Patch [15/15] adds documentation about the new API.
>
> For now, the API allows you to alloc DMABUF objects and mmap() them to
> read or write the samples. It does not yet allow to import DMABUFs
> parented to other subsystems, but that should eventually be possible
> once it's wired.
>
> This patchset is inspired by the "mmap interface" that was previously
> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> great if I could get a review from you guys. Alexandru's commit was
> signed with his @analog.com address but he doesn't work at ADI anymore,
> so I believe I'll need him to sign with a new email.

Given it reflects a sign off given back then I think it's fine to
leave them as analog.com but maybe he can give an Acked-by: on a different
email address to reflect that he is happy for these to go forwards.

It's far from unusual to have out of date email addresses in sign offs
given that code often comes out of trees many years after someone
originally wrote it for a vendor tree or similar. Whilst not a lawyer
I don't think that's a problem.

Jonathan

>
> Cheers,
> -Paul
>
> Alexandru Ardelean (1):
> iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>
> Paul Cercueil (14):
> iio: buffer-dma: Get rid of incoming/outgoing queues
> iio: buffer-dma: Remove unused iio_buffer_block struct
> iio: buffer-dma: Use round_down() instead of rounddown()
> iio: buffer-dma: Enable buffer write support
> iio: buffer-dmaengine: Support specifying buffer direction
> iio: buffer-dmaengine: Enable write support
> iio: core: Add new DMABUF interface infrastructure
> iio: buffer-dma: Use DMABUFs instead of custom solution
> iio: buffer-dma: Implement new DMABUF based userspace API
> iio: buffer-dma: Boost performance using write-combine cache setting
> iio: buffer-dmaengine: Support new DMABUF based userspace API
> iio: core: Add support for cyclic buffers
> iio: buffer-dmaengine: Add support for cyclic buffers
> Documentation: iio: Document high-speed DMABUF based API
>
> Documentation/driver-api/dma-buf.rst | 2 +
> Documentation/iio/dmabuf_api.rst | 94 +++
> Documentation/iio/index.rst | 2 +
> drivers/iio/adc/adi-axi-adc.c | 3 +-
> drivers/iio/buffer/industrialio-buffer-dma.c | 670 ++++++++++++++----
> .../buffer/industrialio-buffer-dmaengine.c | 42 +-
> drivers/iio/industrialio-buffer.c | 49 ++
> include/linux/iio/buffer-dma.h | 43 +-
> include/linux/iio/buffer-dmaengine.h | 5 +-
> include/linux/iio/buffer_impl.h | 8 +
> include/uapi/linux/iio/buffer.h | 30 +
> 11 files changed, 783 insertions(+), 165 deletions(-)
> create mode 100644 Documentation/iio/dmabuf_api.rst
>


2021-11-21 14:01:09

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On Mon, 15 Nov 2021 14:19:11 +0000
Paul Cercueil <[email protected]> wrote:

> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
>
> Signed-off-by: Paul Cercueil <[email protected]>
This looks to be a nice little cleanup on its own.

Let me know if you'd like me to pick this up before the main discussion
comes to any conclusion.

Jonathan

> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
> include/linux/iio/buffer-dma.h | 7 +-
> 2 files changed, 37 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index d348af8b9705..abac88f20104 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>
> static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> {
> - struct iio_dma_buffer_queue *queue = block->queue;
> -
> - /*
> - * The buffer has already been freed by the application, just drop the
> - * reference.
> - */
> - if (block->state != IIO_BLOCK_STATE_DEAD) {
> + if (block->state != IIO_BLOCK_STATE_DEAD)
> block->state = IIO_BLOCK_STATE_DONE;
> - list_add_tail(&block->head, &queue->outgoing);
> - }
> }
>
> /**
> @@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> * dead. This means we can reset the lists without having to fear
> > * corruption.
> */
> - INIT_LIST_HEAD(&queue->outgoing);
> spin_unlock_irq(&queue->list_lock);
>
> - INIT_LIST_HEAD(&queue->incoming);
> -
> for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> if (queue->fileio.blocks[i]) {
> block = queue->fileio.blocks[i];
> @@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> }
>
> block->state = IIO_BLOCK_STATE_QUEUED;
> - list_add_tail(&block->head, &queue->incoming);
> }
>
> out_unlock:
> @@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
> struct iio_dev *indio_dev)
> {
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> - struct iio_dma_buffer_block *block, *_block;
> + struct iio_dma_buffer_block *block;
> + unsigned int i;
>
> mutex_lock(&queue->lock);
> queue->active = true;
> - list_for_each_entry_safe(block, _block, &queue->incoming, head) {
> - list_del(&block->head);
> - iio_dma_buffer_submit_block(queue, block);
> + queue->fileio.next_dequeue = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + block = queue->fileio.blocks[i];
> +
> + if (block->state == IIO_BLOCK_STATE_QUEUED)
> + iio_dma_buffer_submit_block(queue, block);
> }
> mutex_unlock(&queue->lock);
>
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
> struct iio_dma_buffer_block *block)
> {
> - if (block->state == IIO_BLOCK_STATE_DEAD) {
> + if (block->state == IIO_BLOCK_STATE_DEAD)
> iio_buffer_block_put(block);
> - } else if (queue->active) {
> + else if (queue->active)
> iio_dma_buffer_submit_block(queue, block);
> - } else {
> + else
> block->state = IIO_BLOCK_STATE_QUEUED;
> - list_add_tail(&block->head, &queue->incoming);
> - }
> }
>
> static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
> struct iio_dma_buffer_queue *queue)
> {
> struct iio_dma_buffer_block *block;
> + unsigned int idx;
>
> spin_lock_irq(&queue->list_lock);
> - block = list_first_entry_or_null(&queue->outgoing, struct
> - iio_dma_buffer_block, head);
> - if (block != NULL) {
> - list_del(&block->head);
> +
> + idx = queue->fileio.next_dequeue;
> + block = queue->fileio.blocks[idx];
> +
> + if (block->state == IIO_BLOCK_STATE_DONE) {
> block->state = IIO_BLOCK_STATE_DEQUEUED;
> + idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
> + queue->fileio.next_dequeue = idx;
> + } else {
> + block = NULL;
> }
> +
> spin_unlock_irq(&queue->list_lock);
>
> return block;
> @@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
> struct iio_dma_buffer_block *block;
> size_t data_available = 0;
> + unsigned int i;
>
> /*
> * For counting the available bytes we'll use the size of the block not
> @@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> data_available += queue->fileio.active_block->size;
>
> spin_lock_irq(&queue->list_lock);
> - list_for_each_entry(block, &queue->outgoing, head)
> - data_available += block->size;
> +
> + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> + block = queue->fileio.blocks[i];
> +
> + if (block != queue->fileio.active_block
> + && block->state == IIO_BLOCK_STATE_DONE)
> + data_available += block->size;
> + }
> +
> spin_unlock_irq(&queue->list_lock);
> mutex_unlock(&queue->lock);
>
> @@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
> queue->dev = dev;
> queue->ops = ops;
>
> - INIT_LIST_HEAD(&queue->incoming);
> - INIT_LIST_HEAD(&queue->outgoing);
> -
> mutex_init(&queue->lock);
> spin_lock_init(&queue->list_lock);
>
> @@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
> continue;
> queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> }
> - INIT_LIST_HEAD(&queue->outgoing);
> spin_unlock_irq(&queue->list_lock);
>
> - INIT_LIST_HEAD(&queue->incoming);
> -
> for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> if (!queue->fileio.blocks[i])
> continue;
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index ff15c61bf319..d4ed5ff39d44 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
> * @active_block: Block being used in read()
> * @pos: Read offset in the active block
> * @block_size: Size of each block
> + * @next_dequeue: index of next block that will be dequeued
> */
> struct iio_dma_buffer_queue_fileio {
> struct iio_dma_buffer_block *blocks[2];
> struct iio_dma_buffer_block *active_block;
> size_t pos;
> size_t block_size;
> +
> + unsigned int next_dequeue;
> };
>
> /**
> @@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
> * atomic context as well as blocks on those lists. This is the outgoing queue
> * list and typically also a list of active blocks in the part that handles
> * the DMA controller
> - * @incoming: List of buffers on the incoming queue
> - * @outgoing: List of buffers on the outgoing queue
> * @active: Whether the buffer is currently active
> * @fileio: FileIO state
> */
> @@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {
>
> struct mutex lock;
> spinlock_t list_lock;
> - struct list_head incoming;
> - struct list_head outgoing;
>
> bool active;
>


2021-11-21 14:03:35

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()

On Mon, 15 Nov 2021 14:19:13 +0000
Paul Cercueil <[email protected]> wrote:

> We know that the buffer's alignment will always be a power of two;
> therefore, we can use the faster round_down() macro.
>
> Signed-off-by: Paul Cercueil <[email protected]>
*groan*. I don't want to know where the naming of these two came from but that
is spectacular...

Anyhow, happy to pick up 1-3 now if you like as all are good cleanup of
existing code.

Jonathan
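
(For context, a simplified sketch of the difference, paraphrasing the usual
kernel definitions rather than quoting include/linux/math.h verbatim:)

    /* rounddown() works for any divisor, at the cost of a modulo: */
    #define rounddown(x, y)   ((x) - ((x) % (y)))

    /* round_down() assumes y is a power of two and reduces to a mask: */
    #define round_down(x, y)  ((x) & ~((__typeof__(x))((y) - 1)))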

> ---
> drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 1ac94c4e9792..f8ce26a24c57 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> dma_cookie_t cookie;
>
> block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> - block->bytes_used = rounddown(block->bytes_used,
> + block->bytes_used = round_down(block->bytes_used,
> dmaengine_buffer->align);
>
> desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,


2021-11-21 14:16:00

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support

On Mon, 15 Nov 2021 14:19:14 +0000
Paul Cercueil <[email protected]> wrote:

> Adding write support to the buffer-dma code is easy - the write()
> function basically needs to do the exact same thing as the read()
> function: dequeue a block, read or write the data, enqueue the block
> when entirely processed.
>
> Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write()
> now both call a function iio_buffer_dma_io(), which will perform this
> task.
>
> The .space_available() callback can return the exact same value as the
> .data_available() callback for input buffers, since in both cases we
> count the exact same thing (the number of bytes in each available
> block).
>
> Signed-off-by: Paul Cercueil <[email protected]>
Hi Paul,

There are a few changes in here, such as the bytes_used value being set, that
I'm not following the reasoning behind. More info on those?
It would also be good to say something about them in the patch description.

Thanks,

Jonathan


> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
> include/linux/iio/buffer-dma.h | 7 ++
> 2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index abac88f20104..eeeed6b2e0cf 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> }
>
> block->size = size;
> - block->state = IIO_BLOCK_STATE_DEQUEUED;
> + block->bytes_used = size;
> + block->state = IIO_BLOCK_STATE_DONE;

I don't know why these are here - some more info?

> block->queue = queue;
> INIT_LIST_HEAD(&block->head);
> kref_init(&block->kref);
> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> block->state = IIO_BLOCK_STATE_DONE;
> }
>
> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
> +{
> + __poll_t flags;
> +
> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + flags = EPOLLIN | EPOLLRDNORM;
> + else
> + flags = EPOLLOUT | EPOLLWRNORM;
> +
> + wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> +}
> +
> /**
> * iio_dma_buffer_block_done() - Indicate that a block has been completed
> * @block: The completed block
> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> iio_buffer_block_put_atomic(block);
> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> + iio_dma_buffer_queue_wake(queue);
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>
> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
> }
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> + iio_dma_buffer_queue_wake(queue);
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>
> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> queue->fileio.blocks[i] = block;
> }
>
> - block->state = IIO_BLOCK_STATE_QUEUED;
> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + block->state = IIO_BLOCK_STATE_QUEUED;

Possibly worth a comment on the state being set here. I figured it out, but might
save some brain cells in future if it's stated in the code.

> }
>
> out_unlock:
> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
> return block;
> }
>
> -/**
> - * iio_dma_buffer_read() - DMA buffer read callback
> - * @buffer: Buffer to read form
> - * @n: Number of bytes to read
> - * @user_buffer: Userspace buffer to copy the data to
> - *
> - * Should be used as the read callback for iio_buffer_access_ops
> - * struct for DMA buffers.
> - */
> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> - char __user *user_buffer)
> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> + size_t n, char __user *user_buffer, bool is_write)
> {
> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> struct iio_dma_buffer_block *block;
> + void *addr;
> int ret;
>
> if (n < buffer->bytes_per_datum)
> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> n = rounddown(n, buffer->bytes_per_datum);
> if (n > block->bytes_used - queue->fileio.pos)
> n = block->bytes_used - queue->fileio.pos;
> + addr = block->vaddr + queue->fileio.pos;
>
> - if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
> + if (is_write)
> + ret = !!copy_from_user(addr, user_buffer, n);
> + else
> + ret = !!copy_to_user(user_buffer, addr, n);

What is the !! gaining us here? We only care about == 0 vs != 0 so
forcing it to be 0 or 1 isn't useful.

> + if (ret) {
> ret = -EFAULT;
> goto out_unlock;
> }
> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
> if (queue->fileio.pos == block->bytes_used) {
> queue->fileio.active_block = NULL;
> + block->bytes_used = block->size;

This seems to be a functional change that isn't called out in the patch description.

> iio_dma_buffer_enqueue(queue, block);
> }
>
> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
> return ret;
> }
> +
> +/**
> + * iio_dma_buffer_read() - DMA buffer read callback
> + * @buffer: Buffer to read from
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data to
> + *
> + * Should be used as the read callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> + char __user *user_buffer)
> +{
> + return iio_dma_buffer_io(buffer, n, user_buffer, false);
> +}
> EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>
> +/**
> + * iio_dma_buffer_write() - DMA buffer write callback
> + * @buffer: Buffer to write to
> + * @n: Number of bytes to write
> + * @user_buffer: Userspace buffer to copy the data from
> + *
> + * Should be used as the write callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> + const char __user *user_buffer)
> +{
> + return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> +
> /**
> * iio_dma_buffer_data_available() - DMA buffer data_available callback
> * @buf: Buffer to check for data availability
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index a65a005c4a19..09c07d5563c0 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
> struct iio_dev *indio_dev);
> int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> char __user *user_buffer);
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> + const char __user *user_buffer);
> size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
> int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
> int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
> void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
> void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>
> +static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
> +{
> + return iio_dma_buffer_data_available(buffer);
> +}
> +
> #endif


2021-11-21 14:26:16

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure

On Mon, 15 Nov 2021 14:19:17 +0000
Paul Cercueil <[email protected]> wrote:

> Add the necessary infrastructure to the IIO core to support a new DMABUF
> based interface.
>
> The advantage of this new DMABUF based interface vs. the read()
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly useful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
>
> The data in this new DMABUF interface is managed at the granularity of
> DMABUF objects. Reducing the granularity from byte level to block level
> is done to reduce the userspace-kernelspace synchronization overhead
> since performing syscalls for each byte at a few Mbps is just not
> feasible.
>
> This of course leads to a slightly increased latency. For this reason an
> application can choose the size of the DMABUFs as well as how many it
> allocates. E.g. two DMABUFs would be a traditional double buffering
> scheme. But using a higher number might be necessary to avoid
> underflow/overflow situations in the presence of scheduling latencies.
>
> As part of the interface, 2 new IOCTLs have been added:
>
> IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *):
> Each call will allocate a new DMABUF object. The return value (if not
> a negative errno value as error) will be the file descriptor of the new
> DMABUF.
>
> IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
> Place the DMABUF object into the queue pending hardware processing.
>
> These two IOCTLs have to be performed on the IIO buffer's file
> descriptor (either opened from the corresponding /dev/iio:deviceX, or
> obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl).

Unrelated to this patch except tangentially. Maybe we should enable
new buffer features only on the IIO_BUFFER_GET_FD_IOCTL() route as
> we probably want to deprecate the old interfaces due to them only
supporting a single buffer / datastream per device.

Possibly something for another day...

Nothing to add on actual code...

Jonathan

>
> To access the data stored in a block by userspace the block must be
> mapped to the process's memory. This is done by calling mmap() on the
> DMABUF's file descriptor.
>
> Before accessing the data through the map, you must use the
> DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
> DMA_BUF_SYNC_START flag, to make sure that the data is available.
> This call may block until the hardware is done with this block. Once
> you are done reading or writing the data, you must use this ioctl again
> with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
> kernel's queue.
>
> If you need to know when the hardware is done with a DMABUF, you can
> poll its file descriptor for the EPOLLOUT event.
>
> Finally, to destroy a DMABUF object, simply call close() on its file
> descriptor.
>
> A typical workflow for the new interface is:
>
> for block in blocks:
> DMABUF_ALLOC block
> mmap block
>
> enable buffer
>
> while !done
> for block in blocks:
> DMABUF_ENQUEUE block
>
> DMABUF_SYNC_START block
> process data
> DMABUF_SYNC_END block
>
> disable buffer
>
> for block in blocks:
> close block
>
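(Spelled out as userspace C, the workflow above could look like the sketch
below for a single input block. It uses only the ioctls and structures
introduced in this patch plus the standard DMA_BUF_IOCTL_SYNC uapi; the
device path, block size and missing error handling are illustrative only.)

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/dma-buf.h>
    #include <linux/iio/buffer.h>

    static int capture_one_block(void)
    {
    	/* The buffer fd: /dev/iio:deviceX or via IIO_BUFFER_GET_FD_IOCTL. */
    	int buf_fd = open("/dev/iio:device0", O_RDWR);
    	struct iio_dmabuf_alloc_req req = { .size = 0x10000 };
    	int dmabuf_fd = ioctl(buf_fd, IIO_BUFFER_DMABUF_ALLOC_IOCTL, &req);
    	void *data = mmap(NULL, req.size, PROT_READ, MAP_SHARED, dmabuf_fd, 0);

    	/* Enable the buffer as usual, then queue the block for capture. */
    	struct iio_dmabuf block = { .fd = dmabuf_fd, .bytes_used = req.size };
    	ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &block);

    	/* SYNC_START may block until the hardware is done with the block
    	 * (alternatively, poll the DMABUF fd for EPOLLOUT first). */
    	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ };
    	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

    	/* ... process the samples in data[0 .. req.size) ... */

    	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
    	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

    	munmap(data, req.size);
    	close(dmabuf_fd);
    	close(buf_fd);
    	return 0;
    }
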
> Signed-off-by: Paul Cercueil <[email protected]>
> ---
> drivers/iio/industrialio-buffer.c | 44 +++++++++++++++++++++++++++++++
> include/linux/iio/buffer_impl.h | 8 ++++++
> include/uapi/linux/iio/buffer.h | 29 ++++++++++++++++++++
> 3 files changed, 81 insertions(+)
>
> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> index e180728914c0..30910e6c2346 100644
> --- a/drivers/iio/industrialio-buffer.c
> +++ b/drivers/iio/industrialio-buffer.c
> @@ -17,6 +17,7 @@
> #include <linux/fs.h>
> #include <linux/cdev.h>
> #include <linux/slab.h>
> +#include <linux/mm.h>
> #include <linux/poll.h>
> #include <linux/sched/signal.h>
>
> @@ -1585,12 +1586,55 @@ static long iio_device_buffer_getfd(struct iio_dev *indio_dev, unsigned long arg
> return ret;
> }
>
> +static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
> + struct iio_dmabuf __user *user_buf)
> +{
> + struct iio_dmabuf dmabuf;
> +
> + if (!buffer->access->enqueue_dmabuf)
> + return -EPERM;
> +
> + if (copy_from_user(&dmabuf, user_buf, sizeof(dmabuf)))
> + return -EFAULT;
> +
> + if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> + return -EINVAL;
> +
> + return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
> +}
> +
> +static int iio_buffer_alloc_dmabuf(struct iio_buffer *buffer,
> + struct iio_dmabuf_alloc_req __user *user_req)
> +{
> + struct iio_dmabuf_alloc_req req;
> +
> + if (!buffer->access->alloc_dmabuf)
> + return -EPERM;
> +
> + if (copy_from_user(&req, user_req, sizeof(req)))
> + return -EFAULT;
> +
> + if (req.resv)
> + return -EINVAL;
> +
> + return buffer->access->alloc_dmabuf(buffer, &req);
> +}
> +
> static long iio_device_buffer_ioctl(struct iio_dev *indio_dev, struct file *filp,
> unsigned int cmd, unsigned long arg)
> {
> + struct iio_dev_buffer_pair *ib = filp->private_data;
> + struct iio_buffer *buffer = ib->buffer;
> + void __user *_arg = (void __user *)arg;
> +
> switch (cmd) {
> case IIO_BUFFER_GET_FD_IOCTL:
> return iio_device_buffer_getfd(indio_dev, arg);
> + case IIO_BUFFER_DMABUF_ALLOC_IOCTL:
> + return iio_buffer_alloc_dmabuf(buffer, _arg);
> + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> + /* TODO: support non-blocking enqueue operation */
> + return iio_buffer_enqueue_dmabuf(buffer, _arg);
> default:
> return IIO_IOCTL_UNHANDLED;
> }
> diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
> index e2ca8ea23e19..728541bc2c63 100644
> --- a/include/linux/iio/buffer_impl.h
> +++ b/include/linux/iio/buffer_impl.h
> @@ -39,6 +39,9 @@ struct iio_buffer;
> * device stops sampling. Calls are balanced with @enable.
> * @release: called when the last reference to the buffer is dropped,
> * should free all resources allocated by the buffer.
> + * @alloc_dmabuf: called from userspace via ioctl to allocate one DMABUF.
> + * @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF
> + * object to this buffer. Requires a valid DMABUF fd.
> * @modes: Supported operating modes by this buffer type
> * @flags: A bitmask combination of INDIO_BUFFER_FLAG_*
> *
> @@ -68,6 +71,11 @@ struct iio_buffer_access_funcs {
>
> void (*release)(struct iio_buffer *buffer);
>
> + int (*alloc_dmabuf)(struct iio_buffer *buffer,
> + struct iio_dmabuf_alloc_req *req);
> + int (*enqueue_dmabuf)(struct iio_buffer *buffer,
> + struct iio_dmabuf *block);
> +
> unsigned int modes;
> unsigned int flags;
> };
> diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
> index 13939032b3f6..e4621b926262 100644
> --- a/include/uapi/linux/iio/buffer.h
> +++ b/include/uapi/linux/iio/buffer.h
> @@ -5,6 +5,35 @@
> #ifndef _UAPI_IIO_BUFFER_H_
> #define _UAPI_IIO_BUFFER_H_
>
> +#include <linux/types.h>
> +
> +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000000
> +
> +/**
> + * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
> + * @size: the size of a single DMABUF
> + * @resv: reserved
> + */
> +struct iio_dmabuf_alloc_req {
> + __u64 size;
> + __u64 resv;
> +};
> +
> +/**
> + * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
> + * @fd: file descriptor of the DMABUF object
> + * @flags: one or more IIO_BUFFER_DMABUF_* flags
> + * @bytes_used: number of bytes used in this DMABUF for the data transfer.
> + * If zero, the full buffer is used.
> + */
> +struct iio_dmabuf {
> + __u32 fd;
> + __u32 flags;
> + __u64 bytes_used;
> +};
> +
> #define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int)
> +#define IIO_BUFFER_DMABUF_ALLOC_IOCTL _IOW('i', 0x92, struct iio_dmabuf_alloc_req)
> +#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x93, struct iio_dmabuf)
>
> #endif /* _UAPI_IIO_BUFFER_H_ */


2021-11-21 14:55:52

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

On Mon, 15 Nov 2021 14:19:21 +0000
Paul Cercueil <[email protected]> wrote:

> We can be certain that the input buffers will only be accessed by
> userspace for reading, and output buffers will mostly be accessed by
> userspace for writing.

Mostly? Perhaps a little more info on why that's not 'only'.

>
> Therefore, it makes more sense to use only fully cached input buffers,
> and to use the write-combine cache coherency setting for output buffers.
>
> This boosts performance, as the data written to the output buffers does
> not have to be sync'd for coherency. It will halve performance if the
> userspace application tries to read from the output buffer, but this
> should never happen.
>
> Since we don't need to sync the cache when disabling CPU access either
> for input buffers or output buffers, the .end_cpu_access() callback can
> be dropped completely.

We have an odd mix of coherent and non-coherent DMA in here as you noted,
but are you sure this is safe on all platforms?

>
> Signed-off-by: Paul Cercueil <[email protected]>

Any numbers to support this patch? The mapping types are performance
optimisations so nice to know how much of a difference they make.


> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 82 +++++++++++++-------
> 1 file changed, 54 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index 92356ee02f30..fb39054d8c15 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
> if (vma->vm_ops->open)
> vma->vm_ops->open(vma);
>
> - return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> - virt_to_page(block->vaddr));
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> + /*
> + * With an input buffer, userspace will only read the data and
> + * never write. We can mmap the buffer fully cached.
> + */
> + return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> + virt_to_page(block->vaddr));
> + } else {
> + /*
> + * With an output buffer, userspace will only write the data
> + * and should rarely (if never) read from it. It is better to
> + * use write-combine in this case.
> + */
> + return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> + vma->vm_end - vma->vm_start);
> + }
> +}
> +
> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block *block)
> +{
> + struct device *dev = block->queue->dev;
> + size_t size = PAGE_ALIGN(block->size);
> +
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> + dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> + else
> + dma_free_wc(dev, size, block->vaddr, block->phys_addr);
> }
>
> static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
>
> mutex_lock(&queue->lock);
>
> - dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> - block->vaddr, block->phys_addr);
> -
> + iio_dma_buffer_free_dmamem(block);
> kfree(block);
>
> queue->num_blocks--;
> @@ -268,19 +291,6 @@ static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
> return 0;
> }
>
> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> - enum dma_data_direction dma_dir)
> -{
> - struct iio_dma_buffer_block *block = dbuf->priv;
> - struct device *dev = block->queue->dev;
> -
> - /* We only need to sync the cache for output buffers */
> - if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> - dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
> -
> - return 0;
> -}
> -
> static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> .attach = iio_buffer_dma_buf_attach,
> .map_dma_buf = iio_buffer_dma_buf_map,
> @@ -288,9 +298,28 @@ static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> .mmap = iio_buffer_dma_buf_mmap,
> .release = iio_buffer_dma_buf_release,
> .begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
> - .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
> };
>
> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
> +{
> + struct device *dev = block->queue->dev;
> + size_t size = PAGE_ALIGN(block->size);
> +
> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> + block->vaddr = dma_alloc_coherent(dev, size,
> + &block->phys_addr,
> + GFP_KERNEL);
> + } else {
> + block->vaddr = dma_alloc_wc(dev, size,
> + &block->phys_addr,
> + GFP_KERNEL);
> + }
> + if (!block->vaddr)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> {
> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> if (!block)
> return ERR_PTR(-ENOMEM);
>
> - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> - &block->phys_addr, GFP_KERNEL);
> - if (!block->vaddr) {
> - err = -ENOMEM;
> + block->size = size;
> + block->queue = queue;
> +
> + err = iio_dma_buffer_alloc_dmamem(block);
> + if (err)
> goto err_free_block;
> - }
>
> einfo.ops = &iio_dma_buffer_dmabuf_ops;
> einfo.size = PAGE_ALIGN(size);
> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> }
>
> block->dmabuf = dmabuf;
> - block->size = size;
> block->bytes_used = size;
> block->state = IIO_BLOCK_STATE_DONE;
> - block->queue = queue;
> block->fileio = fileio;
> INIT_LIST_HEAD(&block->head);
>
> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> return block;
>
> err_free_dma:
> - dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> - block->vaddr, block->phys_addr);
> + iio_dma_buffer_free_dmamem(block);
> err_free_block:
> kfree(block);
> return ERR_PTR(err);


2021-11-21 16:23:46

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On 11/15/21 3:19 PM, Paul Cercueil wrote:
> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
>
> Signed-off-by: Paul Cercueil <[email protected]>
The outgoing queue is going to be replaced by fences, but I think we
need to keep the incoming queue.
> [...]
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
> struct iio_dma_buffer_block *block)
> {
> - if (block->state == IIO_BLOCK_STATE_DEAD) {
> + if (block->state == IIO_BLOCK_STATE_DEAD)
> iio_buffer_block_put(block);
> - } else if (queue->active) {
> + else if (queue->active)
> iio_dma_buffer_submit_block(queue, block);
> - } else {
> + else
> block->state = IIO_BLOCK_STATE_QUEUED;
> - list_add_tail(&block->head, &queue->incoming);
If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is
not active, it will be marked as queued, but we don't actually keep a
reference to it anywhere. It will never be submitted to the DMA, and it
will never be signaled as completed.


2021-11-21 17:19:50

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support

Hi Jonathan,

Le dim., nov. 21 2021 at 14:20:49 +0000, Jonathan Cameron
<[email protected]> a écrit :
> On Mon, 15 Nov 2021 14:19:14 +0000
> Paul Cercueil <[email protected]> wrote:
>
>> Adding write support to the buffer-dma code is easy - the write()
>> function basically needs to do the exact same thing as the read()
>> function: dequeue a block, read or write the data, enqueue the block
>> when entirely processed.
>>
>> Therefore, the iio_buffer_dma_read() and the new
>> iio_buffer_dma_write()
>> now both call a function iio_buffer_dma_io(), which will perform
>> this
>> task.
>>
>> The .space_available() callback can return the exact same value as
>> the
>> .data_available() callback for input buffers, since in both cases we
>> count the exact same thing (the number of bytes in each available
>> block).
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
> Hi Paul,
>
> There are a few changes in here, such as the bytes_used value being
> set that
> I'm not following the reasoning behind. More info on those?
> Also good to provide something about those in this patch description.
>
> Thanks,
>
> Jonathan
>
>
>> ---
>> drivers/iio/buffer/industrialio-buffer-dma.c | 75
>> +++++++++++++++-----
>> include/linux/iio/buffer-dma.h | 7 ++
>> 2 files changed, 66 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c
>> b/drivers/iio/buffer/industrialio-buffer-dma.c
>> index abac88f20104..eeeed6b2e0cf 100644
>> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
>> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
>> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block
>> *iio_dma_buffer_alloc_block(
>> }
>>
>> block->size = size;
>> - block->state = IIO_BLOCK_STATE_DEQUEUED;
>> + block->bytes_used = size;
>> + block->state = IIO_BLOCK_STATE_DONE;
>
> I don't know why these are here - some more info?

When using an input buffer the block->bytes_used is unconditionally
reset in iio_dmaengine_buffer_submit_block(), so this was fine until
now.

When using an output buffer the block->bytes_used can actually (with
the new API) be specified by the user, so we don't want
iio_dmaengine_buffer_submit_block() to unconditionally override it.
Which means that in the case where we have an output buffer in fileio
mode, we do need block->bytes_used to be initialized to the buffer's
size since it won't be set anywhere else.
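
As a rough sketch of that idea (illustrative only, not the actual patch
code), the submit path would then only fall back to the full block size
when bytes_used was never set:

	/* Respect a bytes_used already set through the new enqueue path;
	 * fall back to the whole block otherwise (fileio output case). */
	if (!block->bytes_used || block->bytes_used > block->size)
		block->bytes_used = block->size;

	/* ... then configure the DMA transfer for block->bytes_used bytes ... */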

About the change in block->state: in patch [01/15] we removed the
incoming/outgoing queues, and while the "enqueued" state is still
useful to know which buffers have to be submitted when the buffer is
enabled, the "dequeued" state is not useful anymore since there is no
more distinction vs. the "done" state.

I believe this change should be moved to patch [01/15] then, and I
should go further and remove the IIO_BLOCK_STATE_DEQUEUED completely.

>> block->queue = queue;
>> INIT_LIST_HEAD(&block->head);
>> kref_init(&block->kref);
>> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct
>> iio_dma_buffer_block *block)
>> block->state = IIO_BLOCK_STATE_DONE;
>> }
>>
>> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue
>> *queue)
>> +{
>> + __poll_t flags;
>> +
>> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>> + flags = EPOLLIN | EPOLLRDNORM;
>> + else
>> + flags = EPOLLOUT | EPOLLWRNORM;
>> +
>> + wake_up_interruptible_poll(&queue->buffer.pollq, flags);
>> +}
>> +
>> /**
>> * iio_dma_buffer_block_done() - Indicate that a block has been
>> completed
>> * @block: The completed block
>> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct
>> iio_dma_buffer_block *block)
>> spin_unlock_irqrestore(&queue->list_lock, flags);
>>
>> iio_buffer_block_put_atomic(block);
>> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN |
>> EPOLLRDNORM);
>> + iio_dma_buffer_queue_wake(queue);
>> }
>> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>>
>> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct
>> iio_dma_buffer_queue *queue,
>> }
>> spin_unlock_irqrestore(&queue->list_lock, flags);
>>
>> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN |
>> EPOLLRDNORM);
>> + iio_dma_buffer_queue_wake(queue);
>> }
>> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>>
>> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct
>> iio_buffer *buffer)
>> queue->fileio.blocks[i] = block;
>> }
>>
>> - block->state = IIO_BLOCK_STATE_QUEUED;
>> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>> + block->state = IIO_BLOCK_STATE_QUEUED;
>
> Possibly worth a comment on the state being set here. I figured it
> out, but might
> save some brain cells in future if it's stated in the code.

Ok.

>> }
>>
>> out_unlock:
>> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block
>> *iio_dma_buffer_dequeue(
>> return block;
>> }
>>
>> -/**
>> - * iio_dma_buffer_read() - DMA buffer read callback
>> - * @buffer: Buffer to read form
>> - * @n: Number of bytes to read
>> - * @user_buffer: Userspace buffer to copy the data to
>> - *
>> - * Should be used as the read callback for iio_buffer_access_ops
>> - * struct for DMA buffers.
>> - */
>> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>> - char __user *user_buffer)
>> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
>> + size_t n, char __user *user_buffer, bool is_write)
>> {
>> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
>> struct iio_dma_buffer_block *block;
>> + void *addr;
>> int ret;
>>
>> if (n < buffer->bytes_per_datum)
>> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer
>> *buffer, size_t n,
>> n = rounddown(n, buffer->bytes_per_datum);
>> if (n > block->bytes_used - queue->fileio.pos)
>> n = block->bytes_used - queue->fileio.pos;
>> + addr = block->vaddr + queue->fileio.pos;
>>
>> - if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos,
>> n)) {
>> + if (is_write)
>> + ret = !!copy_from_user(addr, user_buffer, n);
>> + else
>> + ret = !!copy_to_user(user_buffer, addr, n);
>
> What is the !! gaining us here? We only care about == 0 vs != 0 so
> forcing it to be 0 or 1 isn't useful.

Right.

>> + if (ret) {
>> ret = -EFAULT;
>> goto out_unlock;
>> }
>> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer
>> *buffer, size_t n,
>>
>> if (queue->fileio.pos == block->bytes_used) {
>> queue->fileio.active_block = NULL;
>> + block->bytes_used = block->size;
>
> This seems to be a functional change that isn't called out in the
> patch description.

See the explanation above. Although I most likely don't need to set it
at two different spots... I'll check that in detail next week.

Cheers,
-Paul

>> iio_dma_buffer_enqueue(queue, block);
>> }
>>
>> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer
>> *buffer, size_t n,
>>
>> return ret;
>> }
>> +
>> +/**
>> + * iio_dma_buffer_read() - DMA buffer read callback
>> + * @buffer: Buffer to read form
>> + * @n: Number of bytes to read
>> + * @user_buffer: Userspace buffer to copy the data to
>> + *
>> + * Should be used as the read callback for iio_buffer_access_ops
>> + * struct for DMA buffers.
>> + */
>> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>> + char __user *user_buffer)
>> +{
>> + return iio_dma_buffer_io(buffer, n, user_buffer, false);
>> +}
>> EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>>
>> +/**
>> + * iio_dma_buffer_write() - DMA buffer write callback
>> + * @buffer: Buffer to read form
>> + * @n: Number of bytes to read
>> + * @user_buffer: Userspace buffer to copy the data from
>> + *
>> + * Should be used as the write callback for iio_buffer_access_ops
>> + * struct for DMA buffers.
>> + */
>> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
>> + const char __user *user_buffer)
>> +{
>> + return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer,
>> true);
>> +}
>> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
>> +
>> /**
>> * iio_dma_buffer_data_available() - DMA buffer data_available
>> callback
>> * @buf: Buffer to check for data availability
>> diff --git a/include/linux/iio/buffer-dma.h
>> b/include/linux/iio/buffer-dma.h
>> index a65a005c4a19..09c07d5563c0 100644
>> --- a/include/linux/iio/buffer-dma.h
>> +++ b/include/linux/iio/buffer-dma.h
>> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer
>> *buffer,
>> struct iio_dev *indio_dev);
>> int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>> char __user *user_buffer);
>> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
>> + const char __user *user_buffer);
>> size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
>> int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer,
>> size_t bpd);
>> int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned
>> int length);
>> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct
>> iio_dma_buffer_queue *queue,
>> void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
>> void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>>
>> +static inline size_t iio_dma_buffer_space_available(struct
>> iio_buffer *buffer)
>> +{
>> + return iio_dma_buffer_data_available(buffer);
>> +}
>> +
>> #endif
>
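
As a userspace-side illustration of the new write() path and the
EPOLLOUT wakeup it relies on, here is a minimal sketch; the device path
and sample layout are assumptions, and the buffer is assumed to have
been enabled through sysfs beforehand:

#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

#define N_SAMPLES 4096

int push_samples(const int16_t *samples)
{
	struct pollfd pfd;
	ssize_t ret;
	int fd;

	/* Assumed device node of an output (DAC) device */
	fd = open("/dev/iio:device1", O_WRONLY);
	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLOUT;	/* matches the EPOLLOUT | EPOLLWRNORM wakeup */

	/* Block until the DMA has a free block that can be filled */
	if (poll(&pfd, 1, -1) < 0 || !(pfd.revents & POLLOUT)) {
		close(fd);
		return -1;
	}

	ret = write(fd, samples, N_SAMPLES * sizeof(*samples));
	close(fd);

	return ret < 0 ? -1 : 0;
}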



2021-11-21 17:43:40

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

Hi Jonathan,

Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron
<[email protected]> a écrit :
> On Mon, 15 Nov 2021 14:19:21 +0000
> Paul Cercueil <[email protected]> wrote:
>
>> We can be certain that the input buffers will only be accessed by
>> userspace for reading, and output buffers will mostly be accessed by
>> userspace for writing.
>
> Mostly? Perhaps a little more info on why that's not 'only'.

Just like with a framebuffer, it really depends on what the application
does. In most cases it will just read an input buffer sequentially,
or write an output buffer sequentially. But then you get the exotic
application that will try to do something like alpha blending, which
means read+write. Hence "mostly".

>>
>> Therefore, it makes more sense to use only fully cached input
>> buffers,
>> and to use the write-combine cache coherency setting for output
>> buffers.
>>
>> This boosts performance, as the data written to the output buffers
>> does
>> not have to be sync'd for coherency. It will halve performance if
>> the
>> userspace application tries to read from the output buffer, but this
>> should never happen.
>>
>> Since we don't need to sync the cache when disabling CPU access
>> either
>> for input buffers or output buffers, the .end_cpu_access() callback
>> can
>> be dropped completely.
>
> We have an odd mix of coherent and non coherent DMA in here as you
> noted,
> but are you sure this is safe on all platforms?

The mix isn't safe, but using only coherent or only non-coherent should
be safe, yes.

>
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
>
> Any numbers to support this patch? The mapping types are performance
> optimisations so nice to know how much of a difference they make.

Output buffers are definitely faster in write-combine mode. On a
ZedBoard with an AD9361 transceiver set to 66 MSPS, and buffer/size set
to 8192, I would get about 185 MiB/s before, 197 MiB/s after.

Input buffers... early results are mixed. On ARM32 it does look like it
is slightly faster to read from *uncached* memory than reading from
cached memory. The cache sync does take a long time.

Other architectures might have a different result, for instance on MIPS
invalidating the cache is a very fast operation, so using cached
buffers would be a huge win in performance.

Setups where the DMA operations are coherent also wouldn't require any
cache sync and this patch would give a huge win in performance.

I'll run some more tests next week to have some fresh numbers.

Cheers,
-Paul

>> ---
>> drivers/iio/buffer/industrialio-buffer-dma.c | 82
>> +++++++++++++-------
>> 1 file changed, 54 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c
>> b/drivers/iio/buffer/industrialio-buffer-dma.c
>> index 92356ee02f30..fb39054d8c15 100644
>> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
>> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
>> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct
>> dma_buf *dbuf,
>> if (vma->vm_ops->open)
>> vma->vm_ops->open(vma);
>>
>> - return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
>> - virt_to_page(block->vaddr));
>> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
>> + /*
>> + * With an input buffer, userspace will only read the data and
>> + * never write. We can mmap the buffer fully cached.
>> + */
>> + return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
>> + virt_to_page(block->vaddr));
>> + } else {
>> + /*
>> + * With an output buffer, userspace will only write the data
>> + * and should rarely (if never) read from it. It is better to
>> + * use write-combine in this case.
>> + */
>> + return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
>> + vma->vm_end - vma->vm_start);
>> + }
>> +}
>> +
>> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block
>> *block)
>> +{
>> + struct device *dev = block->queue->dev;
>> + size_t size = PAGE_ALIGN(block->size);
>> +
>> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>> + dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
>> + else
>> + dma_free_wc(dev, size, block->vaddr, block->phys_addr);
>> }
>>
>> static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
>> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct
>> dma_buf *dbuf)
>>
>> mutex_lock(&queue->lock);
>>
>> - dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
>> - block->vaddr, block->phys_addr);
>> -
>> + iio_dma_buffer_free_dmamem(block);
>> kfree(block);
>>
>> queue->num_blocks--;
>> @@ -268,19 +291,6 @@ static int
>> iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
>> return 0;
>> }
>>
>> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
>> - enum dma_data_direction dma_dir)
>> -{
>> - struct iio_dma_buffer_block *block = dbuf->priv;
>> - struct device *dev = block->queue->dev;
>> -
>> - /* We only need to sync the cache for output buffers */
>> - if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
>> - dma_sync_single_for_device(dev, block->phys_addr, block->size,
>> dma_dir);
>> -
>> - return 0;
>> -}
>> -
>> static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
>> .attach = iio_buffer_dma_buf_attach,
>> .map_dma_buf = iio_buffer_dma_buf_map,
>> @@ -288,9 +298,28 @@ static const struct dma_buf_ops
>> iio_dma_buffer_dmabuf_ops = {
>> .mmap = iio_buffer_dma_buf_mmap,
>> .release = iio_buffer_dma_buf_release,
>> .begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
>> - .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
>> };
>>
>> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block
>> *block)
>> +{
>> + struct device *dev = block->queue->dev;
>> + size_t size = PAGE_ALIGN(block->size);
>> +
>> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
>> + block->vaddr = dma_alloc_coherent(dev, size,
>> + &block->phys_addr,
>> + GFP_KERNEL);
>> + } else {
>> + block->vaddr = dma_alloc_wc(dev, size,
>> + &block->phys_addr,
>> + GFP_KERNEL);
>> + }
>> + if (!block->vaddr)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>> +
>> static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>> struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
>> {
>> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block
>> *iio_dma_buffer_alloc_block(
>> if (!block)
>> return ERR_PTR(-ENOMEM);
>>
>> - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
>> - &block->phys_addr, GFP_KERNEL);
>> - if (!block->vaddr) {
>> - err = -ENOMEM;
>> + block->size = size;
>> + block->queue = queue;
>> +
>> + err = iio_dma_buffer_alloc_dmamem(block);
>> + if (err)
>> goto err_free_block;
>> - }
>>
>> einfo.ops = &iio_dma_buffer_dmabuf_ops;
>> einfo.size = PAGE_ALIGN(size);
>> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block
>> *iio_dma_buffer_alloc_block(
>> }
>>
>> block->dmabuf = dmabuf;
>> - block->size = size;
>> block->bytes_used = size;
>> block->state = IIO_BLOCK_STATE_DONE;
>> - block->queue = queue;
>> block->fileio = fileio;
>> INIT_LIST_HEAD(&block->head);
>>
>> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block
>> *iio_dma_buffer_alloc_block(
>> return block;
>>
>> err_free_dma:
>> - dma_free_coherent(queue->dev, PAGE_ALIGN(size),
>> - block->vaddr, block->phys_addr);
>> + iio_dma_buffer_free_dmamem(block);
>> err_free_block:
>> kfree(block);
>> return ERR_PTR(err);
>



2021-11-21 17:53:04

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

Hi Lars,

Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
<[email protected]> a écrit :
> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>> The buffer-dma code was using two queues, incoming and outgoing, to
>> manage the state of the blocks in use.
>>
>> While this totally works, it adds some complexity to the code,
>> especially since the code only manages 2 blocks. It is much easier to
>> just check each block's state manually, and keep a counter for the
>> next
>> block to dequeue.
>>
>> Since the new DMABUF based API wouldn't use these incoming and
>> outgoing
>> queues anyway, getting rid of them now makes the upcoming changes
>> simpler.
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
> The outgoing queue is going to be replaced by fences, but I think we
> need to keep the incoming queue.

Blocks are always accessed in sequential order, so we now have a
"queue->next_dequeue" that cycles between the buffers allocated for
fileio.

>> [...]
>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>> *queue,
>> struct iio_dma_buffer_block *block)
>> {
>> - if (block->state == IIO_BLOCK_STATE_DEAD) {
>> + if (block->state == IIO_BLOCK_STATE_DEAD)
>> iio_buffer_block_put(block);
>> - } else if (queue->active) {
>> + else if (queue->active)
>> iio_dma_buffer_submit_block(queue, block);
>> - } else {
>> + else
>> block->state = IIO_BLOCK_STATE_QUEUED;
>> - list_add_tail(&block->head, &queue->incoming);
> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is
> not active, it will be marked as queued, but we don't actually keep a
> reference to it anywhere. It will never be submitted to the DMA, and
> it will never be signaled as completed.

We do keep a reference to the buffers, in the queue->fileio.blocks
array. When the buffer is enabled, all the blocks in that array that
are in the "queued" state will be submitted to the DMA.

Cheers,
-Paul
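
For reference, a minimal sketch of the cycling "next_dequeue" idea
described above; the field names are taken from this discussion and the
exact code in the patch may differ:

static struct iio_dma_buffer_block *
iio_dma_buffer_dequeue(struct iio_dma_buffer_queue *queue)
{
	struct iio_dma_buffer_block *block;
	unsigned int idx;

	spin_lock_irq(&queue->list_lock);

	idx = queue->fileio.next_dequeue;
	block = queue->fileio.blocks[idx];

	if (block->state == IIO_BLOCK_STATE_DONE) {
		/* Advance the counter for the next dequeue */
		idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
		queue->fileio.next_dequeue = idx;
	} else {
		block = NULL;	/* nothing ready yet */
	}

	spin_unlock_irq(&queue->list_lock);

	return block;
}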



2021-11-21 18:49:14

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On 11/21/21 6:52 PM, Paul Cercueil wrote:
> Hi Lars,
>
> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
> <[email protected]> a écrit :
>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>> manage the state of the blocks in use.
>>>
>>> While this totally works, it adds some complexity to the code,
>>> especially since the code only manages 2 blocks. It is much easier to
>>> just check each block's state manually, and keep a counter for the next
>>> block to dequeue.
>>>
>>> Since the new DMABUF based API wouldn't use these incoming and outgoing
>>> queues anyway, getting rid of them now makes the upcoming changes
>>> simpler.
>>>
>>> Signed-off-by: Paul Cercueil <[email protected]>
>> The outgoing queue is going to be replaced by fences, but I think we
>> need to keep the incoming queue.
>
> Blocks are always accessed in sequential order, so we now have a
> "queue->next_dequeue" that cycles between the buffers allocated for
> fileio.
>
>>> [...]
>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>>> *queue,
>>>       struct iio_dma_buffer_block *block)
>>>   {
>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>           iio_buffer_block_put(block);
>>> -    } else if (queue->active) {
>>> +    else if (queue->active)
>>>           iio_dma_buffer_submit_block(queue, block);
>>> -    } else {
>>> +    else
>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>> -        list_add_tail(&block->head, &queue->incoming);
>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is
>> not active, it will be marked as queued, but we don't actually keep a
>> reference to it anywhere. It will never be submitted to the DMA, and
>> it will never be signaled as completed.
>
> We do keep a reference to the buffers, in the queue->fileio.blocks
> array. When the buffer is enabled, all the blocks in that array that
> are in the "queued" state will be submitted to the DMA.
>
But not when used in combination with the DMA buf changes later in this
series.


2021-11-21 20:08:52

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues



Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen
<[email protected]> a écrit :
> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>> Hi Lars,
>>
>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
>> <[email protected]> a écrit :
>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>> manage the state of the blocks in use.
>>>>
>>>> While this totally works, it adds some complexity to the code,
>>>> especially since the code only manages 2 blocks. It is much easier
>>>> to
>>>> just check each block's state manually, and keep a counter for the
>>>> next
>>>> block to dequeue.
>>>>
>>>> Since the new DMABUF based API wouldn't use these incoming and
>>>> outgoing
>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>> simpler.
>>>>
>>>> Signed-off-by: Paul Cercueil <[email protected]>
>>> The outgoing queue is going to be replaced by fences, but I think
>>> we need to keep the incoming queue.
>>
>> Blocks are always accessed in sequential order, so we now have a
>> "queue->next_dequeue" that cycles between the buffers allocated for
>> fileio.
>>
>>>> [...]
>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>>>> *queue,
>>>> struct iio_dma_buffer_block *block)
>>>> {
>>>> - if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>> + if (block->state == IIO_BLOCK_STATE_DEAD)
>>>> iio_buffer_block_put(block);
>>>> - } else if (queue->active) {
>>>> + else if (queue->active)
>>>> iio_dma_buffer_submit_block(queue, block);
>>>> - } else {
>>>> + else
>>>> block->state = IIO_BLOCK_STATE_QUEUED;
>>>> - list_add_tail(&block->head, &queue->incoming);
>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer
>>> is not active, it will be marked as queued, but we don't actually
>>> keep a reference to it anywhere. It will never be submitted to
>>> the DMA, and it will never be signaled as completed.
>>
>> We do keep a reference to the buffers, in the queue->fileio.blocks
>> array. When the buffer is enabled, all the blocks in that array
>> that are in the "queued" state will be submitted to the DMA.
>>
> But not when used in combination with the DMA buf changes later in
> this series.
>

That's still the case after the DMABUF changes of the series. Or can
you point me exactly what you think is broken?

-Paul



2021-11-22 10:00:34

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()

Hi Jonathan,

Le dim., nov. 21 2021 at 14:08:23 +0000, Jonathan Cameron
<[email protected]> a écrit :
> On Mon, 15 Nov 2021 14:19:13 +0000
> Paul Cercueil <[email protected]> wrote:
>
>> We know that the buffer's alignment will always be a power of two;
>> therefore, we can use the faster round_down() macro.
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
> *groan*. I don't want to know where the naming of these two came
> from but that
> is spectacular...
>
> Anyhow, happy to pick up 1-3 now if you like as all are good cleanup
> of
> existing code.

I think you can pick 2-3 now; I will do some changes to patch [01/15]
in the V2.

Cheers,
-Paul



2021-11-22 15:09:01

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On 11/21/21 9:08 PM, Paul Cercueil wrote:
>
>
> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen
> <[email protected]> a écrit :
>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>> Hi Lars,
>>>
>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
>>> <[email protected]> a écrit :
>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>>> manage the state of the blocks in use.
>>>>>
>>>>> While this totally works, it adds some complexity to the code,
>>>>> especially since the code only manages 2 blocks. It is much easier to
>>>>> just check each block's state manually, and keep a counter for the
>>>>> next
>>>>> block to dequeue.
>>>>>
>>>>> Since the new DMABUF based API wouldn't use these incoming and
>>>>> outgoing
>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>> simpler.
>>>>>
>>>>> Signed-off-by: Paul Cercueil <[email protected]>
>>>> The outgoing queue is going to be replaced by fences, but I think
>>>> we need to keep the incoming queue.
>>>
>>> Blocks are always accessed in sequential order, so we now have a
>>> "queue->next_dequeue" that cycles between the buffers allocated for
>>> fileio.
>>>
>>>>> [...]
>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>>>>> *queue,
>>>>>       struct iio_dma_buffer_block *block)
>>>>>   {
>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>           iio_buffer_block_put(block);
>>>>> -    } else if (queue->active) {
>>>>> +    else if (queue->active)
>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>> -    } else {
>>>>> +    else
>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer
>>>> is not active, it will be marked as queued, but we don't actually
>>>> keep a reference to it anywhere. It will never be submitted to
>>>> the DMA, and it will never be signaled as completed.
>>>
>>> We do keep a reference to the buffers, in the queue->fileio.blocks
>>> array. When the buffer is enabled, all the blocks in that array
>>> that are in the "queued" state will be submitted to the DMA.
>>>
>> But not when used in combination with the DMA buf changes later in
>> this series.
>>
>
> That's still the case after the DMABUF changes of the series. Or can
> you point me exactly what you think is broken?
>
When you allocate a DMABUF with the allocate IOCTL and then submit it
with the enqueue IOCTL before the buffer is enabled it will end up
marked as queued, but not actually be queued anywhere.



2021-11-22 15:16:24

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

Hi Lars,

Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen
<[email protected]> a écrit :
> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>>
>>
>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen
>> <[email protected]> a écrit :
>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>> Hi Lars,
>>>>
>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
>>>> <[email protected]> a écrit :
>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>> The buffer-dma code was using two queues, incoming and outgoing,
>>>>>> to
>>>>>> manage the state of the blocks in use.
>>>>>>
>>>>>> While this totally works, it adds some complexity to the code,
>>>>>> especially since the code only manages 2 blocks. It is much
>>>>>> easier to
>>>>>> just check each block's state manually, and keep a counter for
>>>>>> the next
>>>>>> block to dequeue.
>>>>>>
>>>>>> Since the new DMABUF based API wouldn't use these incoming and
>>>>>> outgoing
>>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>>> simpler.
>>>>>>
>>>>>> Signed-off-by: Paul Cercueil <[email protected]>
>>>>> The outgoing queue is going to be replaced by fences, but I think
>>>>> we need to keep the incoming queue.
>>>>
>>>> Blocks are always accessed in sequential order, so we now have a
>>>> "queue->next_dequeue" that cycles between the buffers
>>>> allocated for fileio.
>>>>
>>>>>> [...]
>>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>> static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>>>>>> *queue,
>>>>>> struct iio_dma_buffer_block *block)
>>>>>> {
>>>>>> - if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>> + if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>> iio_buffer_block_put(block);
>>>>>> - } else if (queue->active) {
>>>>>> + else if (queue->active)
>>>>>> iio_dma_buffer_submit_block(queue, block);
>>>>>> - } else {
>>>>>> + else
>>>>>> block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>> - list_add_tail(&block->head, &queue->incoming);
>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the
>>>>> buffer is not active, it will be marked as queued, but we
>>>>> don't actually keep a reference to it anywhere. It will
>>>>> never be submitted to the DMA, and it will never be
>>>>> signaled as completed.
>>>>
>>>> We do keep a reference to the buffers, in the queue->fileio.blocks
>>>> array. When the buffer is enabled, all the blocks in that
>>>> array that are in the "queued" state will be submitted to the
>>>> DMA.
>>>>
>>> But not when used in combination with the DMA buf changes later in
>>> this series.
>>>
>>
>> That's still the case after the DMABUF changes of the series. Or can
>> you point me exactly what you think is broken?
>>
> When you allocate a DMABUF with the allocate IOCTL and then submit it
> with the enqueue IOCTL before the buffer is enabled it will end up
> marked as queued, but not actually be queued anywhere.
>

Ok, it works for me because I never enqueue blocks before enabling the
buffer. I can add a requirement that blocks must be enqueued only after
the buffer is enabled.

Cheers,
-Paul



2021-11-22 15:18:06

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues

On 11/22/21 4:16 PM, Paul Cercueil wrote:
> Hi Lars,
>
> Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen
> <[email protected]> a écrit :
>> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>>>
>>>
>>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen
>>> <[email protected]> a écrit :
>>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>>> Hi Lars,
>>>>>
>>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
>>>>> <[email protected]> a écrit :
>>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>>>>> manage the state of the blocks in use.
>>>>>>>
>>>>>>> While this totally works, it adds some complexity to the code,
>>>>>>> especially since the code only manages 2 blocks. It is much
>>>>>>> easier to
>>>>>>> just check each block's state manually, and keep a counter for
>>>>>>> the next
>>>>>>> block to dequeue.
>>>>>>>
>>>>>>> Since the new DMABUF based API wouldn't use these incoming and
>>>>>>> outgoing
>>>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>>>> simpler.
>>>>>>>
>>>>>>> Signed-off-by: Paul Cercueil <[email protected]>
>>>>>> The outgoing queue is going to be replaced by fences, but I think
>>>>>> we need to keep the incoming queue.
>>>>>
>>>>> Blocks are always accessed in sequential order, so we now have a
>>>>> "queue->next_dequeue" that cycles between the buffers
>>>>> allocated for fileio.
>>>>>
>>>>>>> [...]
>>>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue
>>>>>>> *queue,
>>>>>>>       struct iio_dma_buffer_block *block)
>>>>>>>   {
>>>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>>>           iio_buffer_block_put(block);
>>>>>>> -    } else if (queue->active) {
>>>>>>> +    else if (queue->active)
>>>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>>>> -    } else {
>>>>>>> +    else
>>>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the
>>>>>> buffer is not active, it will be marked as queued, but we
>>>>>> don't actually keep a reference to it anywhere. It will
>>>>>> never be submitted to the DMA, and it will never be
>>>>>> signaled as completed.
>>>>>
>>>>> We do keep a reference to the buffers, in the queue->fileio.blocks
>>>>> array. When the buffer is enabled, all the blocks in that
>>>>> array that are in the "queued" state will be submitted to the
>>>>> DMA.
>>>>>
>>>> But not when used in combination with the DMA buf changes later in
>>>> this series.
>>>>
>>>
>>> That's still the case after the DMABUF changes of the series. Or can
>>> you point me exactly what you think is broken?
>>>
>> When you allocate a DMABUF with the allocate IOCTL and then submit it
>> with the enqueue IOCTL before the buffer is enabled it will end up
>> marked as queued, but not actually be queued anywhere.
>>
>
> Ok, it works for me because I never enqueue blocks before enabling the
> buffer. I can add a requirement that blocks must be enqueued only
> after the buffer is enabled.

I don't think that is a good idea. This way you are going to potentially
drop data at the beginning of your stream when the DMA isn't ready yet.


2021-11-22 15:27:31

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues



Le lun., nov. 22 2021 at 16:17:59 +0100, Lars-Peter Clausen
<[email protected]> a écrit :
> On 11/22/21 4:16 PM, Paul Cercueil wrote:
>> Hi Lars,
>>
>> Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen
>> <[email protected]> a écrit :
>>> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>>>>
>>>>
>>>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen
>>>> <[email protected]> a écrit :
>>>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>>>> Hi Lars,
>>>>>>
>>>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen
>>>>>> <[email protected]> a écrit :
>>>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>>>> The buffer-dma code was using two queues, incoming and
>>>>>>>> outgoing, to
>>>>>>>> manage the state of the blocks in use.
>>>>>>>>
>>>>>>>> While this totally works, it adds some complexity to the code,
>>>>>>>> especially since the code only manages 2 blocks. It is much
>>>>>>>> easier to
>>>>>>>> just check each block's state manually, and keep a counter for
>>>>>>>> the next
>>>>>>>> block to dequeue.
>>>>>>>>
>>>>>>>> Since the new DMABUF based API wouldn't use these incoming and
>>>>>>>> outgoing
>>>>>>>> queues anyway, getting rid of them now makes the upcoming
>>>>>>>> changes
>>>>>>>> simpler.
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Cercueil <[email protected]>
>>>>>>> The outgoing queue is going to be replaced by fences, but I
>>>>>>> think we need to keep the incoming queue.
>>>>>>
>>>>>> Blocks are always accessed in sequential order, so we now have a
>>>>>> "queue->next_dequeue" that cycles between the buffers
>>>>>> allocated for fileio.
>>>>>>
>>>>>>>> [...]
>>>>>>>> @@ -442,28 +435,33 @@
>>>>>>>> EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>>>> static void iio_dma_buffer_enqueue(struct
>>>>>>>> iio_dma_buffer_queue *queue,
>>>>>>>> struct iio_dma_buffer_block *block)
>>>>>>>> {
>>>>>>>> - if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>>>> + if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>>>> iio_buffer_block_put(block);
>>>>>>>> - } else if (queue->active) {
>>>>>>>> + else if (queue->active)
>>>>>>>> iio_dma_buffer_submit_block(queue, block);
>>>>>>>> - } else {
>>>>>>>> + else
>>>>>>>> block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>>>> - list_add_tail(&block->head, &queue->incoming);
>>>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the
>>>>>>> buffer is not active, it will be marked as queued,
>>>>>>> but we don't actually keep a reference to it
>>>>>>> anywhere. It will never be submitted to the DMA, and
>>>>>>> it will never be signaled as completed.
>>>>>>
>>>>>> We do keep a reference to the buffers, in the
>>>>>> queue->fileio.blocks array. When the buffer is enabled,
>>>>>> all the blocks in that array that are in the "queued"
>>>>>> state will be submitted to the DMA.
>>>>>>
>>>>> But not when used in combination with the DMA buf changes later
>>>>> in this series.
>>>>>
>>>>
>>>> That's still the case after the DMABUF changes of the series. Or
>>>> can you point me exactly what you think is broken?
>>>>
>>> When you allocate a DMABUF with the allocate IOCTL and then submit
>>> it with the enqueue IOCTL before the buffer is enabled it will
>>> end up marked as queued, but not actually be queued anywhere.
>>>
>>
>> Ok, it works for me because I never enqueue blocks before enabling
>> the buffer. I can add a requirement that blocks must be enqueued
>> only after the buffer is enabled.
>
> I don't think that is a good idea. This way you are going to
> potentially drop data at the beginning of your stream when the DMA
> isn't ready yet.
>

You wouldn't drop data, but it could cause an underrun, yes. Is it such
a big deal, knowing that the buffer was just enabled? I don't think you
can disable then enable the buffer without causing a discontinuity in
the stream.

-Paul



2021-11-25 17:32:13

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

Hi Jonathan,

Le dim., nov. 21 2021 at 17:43:20 +0000, Paul Cercueil
<[email protected]> a écrit :
> Hi Jonathan,
>
> Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron
> <[email protected]> a écrit :
>> On Mon, 15 Nov 2021 14:19:21 +0000
>> Paul Cercueil <[email protected]> wrote:
>>
>>> We can be certain that the input buffers will only be accessed by
>>> userspace for reading, and output buffers will mostly be accessed
>>> by
>>> userspace for writing.
>>
>> Mostly? Perhaps a little more info on why that's not 'only'.
>
> Just like with a framebuffer, it really depends on what the
> application does. In most cases it will just read an input buffer
> sequentially, or write an output buffer sequentially. But then you
> get the exotic application that will try to do something like alpha
> blending, which means read+write. Hence "mostly".
>
>>>
>>> Therefore, it makes more sense to use only fully cached input
>>> buffers,
>>> and to use the write-combine cache coherency setting for output
>>> buffers.
>>>
>>> This boosts performance, as the data written to the output buffers
>>> does
>>> not have to be sync'd for coherency. It will halve performance if
>>> the
>>> userspace application tries to read from the output buffer, but
>>> this
>>> should never happen.
>>>
>>> Since we don't need to sync the cache when disabling CPU access
>>> either
>>> for input buffers or output buffers, the .end_cpu_access()
>>> callback can
>>> be dropped completely.
>>
>> We have an odd mix of coherent and non coherent DMA in here as you
>> noted,
>> but are you sure this is safe on all platforms?
>
> The mix isn't safe, but using only coherent or only non-coherent
> should be safe, yes.
>
>>
>>>
>>> Signed-off-by: Paul Cercueil <[email protected]>
>>
>> Any numbers to support this patch? The mapping types are performance
>> optimisations so nice to know how much of a difference they make.
>
> Output buffers are definitely faster in write-combine mode. On a
> ZedBoard with an AD9361 transceiver set to 66 MSPS, and buffer/size
> set to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
>
> Input buffers... early results are mixed. On ARM32 it does look like
> it is slightly faster to read from *uncached* memory than reading
> from cached memory. The cache sync does take a long time.
>
> Other architectures might have a different result, for instance on
> MIPS invalidating the cache is a very fast operation, so using cached
> buffers would be a huge win in performance.
>
> Setups where the DMA operations are coherent also wouldn't require
> any cache sync and this patch would give a huge win in performance.
>
> I'll run some more tests next week to have some fresh numbers.

I think I mixed things up before, because I get different results now.

Here are some fresh benchmarks, triple-checked, using libiio's
iio_readdev and iio_writedev tools, with 64K-sample buffers at 61.44
MSPS (max. theoretical throughput: 234 MiB/s):

iio_readdev -b 65536 cf-ad9361-lpc voltage0 voltage1 | pv > /dev/null
pv /dev/zero | iio_writedev -b 65536 cf-ad9361-dds-core-lpc voltage0 voltage1

Coherent mapping:
- fileio:
read: 125 MiB/s
write: 141 MiB/s
- dmabuf:
read: 171 MiB/s
write: 210 MiB/s

Coherent reads + Write-combine writes:
- fileio:
read: 125 MiB/s
write: 141 MiB/s
- dmabuf:
read: 171 MiB/s
write: 210 MiB/s

Non-coherent mapping:
- fileio:
read: 119 MiB/s
write: 124 MiB/s
- dmabuf:
read: 159 MiB/s
write: 124 MiB/s

Non-coherent reads + write-combine writes:
- fileio:
read: 119 MiB/s
write: 140 MiB/s
- dmabuf:
read: 159 MiB/s
write: 210 MiB/s

Non-coherent mapping with no cache sync:
- fileio:
read: 156 MiB/s
write: 123 MiB/s
- dmabuf:
read: 234 MiB/s (capped by sample rate)
write: 182 MiB/s

Non-coherent reads with no cache sync + write-combine writes:
- fileio:
read: 156 MiB/s
write: 140 MiB/s
- dmabuf:
read: 234 MiB/s (capped by sample rate)
write: 210 MiB/s


A few things we can deduce from this:

* Write-combine is not available on Zynq/ARM? If it were working, it
should give better performance than the coherent mapping, but it
doesn't seem to do anything at all. At least it doesn't harm
performance.

* Non-coherent + cache invalidation is definitely a good deal slower
than using coherent mapping, at least on ARM32. However, when the cache
sync is disabled (e.g. if the DMA operations are coherent) the reads
are much faster.

* The new dma-buf based API is a great deal faster than the fileio API.

So in the future we could use coherent reads + write-combine writes,
unless we know the DMA operations are coherent, in which case we would use
non-coherent reads + write-combine writes.

Regarding this patch, unfortunately I cannot prove that write-combine
is faster, so I'll just drop this patch for now.

Cheers,
-Paul



2021-11-27 15:12:46

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()

On Mon, 22 Nov 2021 10:00:19 +0000
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> Le dim., nov. 21 2021 at 14:08:23 +0000, Jonathan Cameron
> <[email protected]> a écrit :
> > On Mon, 15 Nov 2021 14:19:13 +0000
> > Paul Cercueil <[email protected]> wrote:
> >
> >> We know that the buffer's alignment will always be a power of two;
> >> therefore, we can use the faster round_down() macro.
> >>
> >> Signed-off-by: Paul Cercueil <[email protected]>
> > *groan*. I don't want to know where the naming of these two came
> > from but that
> > is spectacular...
> >
> > Anyhow, happy to pick up 1-3 now if you like as all are good cleanup
> > of
> > existing code.
>
> I think you can pick 2-3 now; I will do some changes to patch [01/15]
> in the V2.

Applied, 2+3 to the togreg branch of iio.git and pushed out as testing for all
the normal reasons.

Thanks,

Jonathan

>
> Cheers,
> -Paul
>
>


2021-11-27 15:15:01

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support

On Sun, 21 Nov 2021 17:19:32 +0000
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> Le dim., nov. 21 2021 at 14:20:49 +0000, Jonathan Cameron
> <[email protected]> a écrit :
> > On Mon, 15 Nov 2021 14:19:14 +0000
> > Paul Cercueil <[email protected]> wrote:
> >
> >> Adding write support to the buffer-dma code is easy - the write()
> >> function basically needs to do the exact same thing as the read()
> >> function: dequeue a block, read or write the data, enqueue the block
> >> when entirely processed.
> >>
> >> Therefore, the iio_buffer_dma_read() and the new
> >> iio_buffer_dma_write()
> >> now both call a function iio_buffer_dma_io(), which will perform
> >> this
> >> task.
> >>
> >> The .space_available() callback can return the exact same value as
> >> the
> >> .data_available() callback for input buffers, since in both cases we
> >> count the exact same thing (the number of bytes in each available
> >> block).
> >>
> >> Signed-off-by: Paul Cercueil <[email protected]>
> > Hi Paul,
> >
> > There are a few changes in here, such as the bytes_used value being
> > set that
> > I'm not following the reasoning behind. More info on those?
> > Also good to provide something about those in this patch description.
> >
> > Thanks,
> >
> > Jonathan
> >
> >
> >> ---
> >> drivers/iio/buffer/industrialio-buffer-dma.c | 75
> >> +++++++++++++++-----
> >> include/linux/iio/buffer-dma.h | 7 ++
> >> 2 files changed, 66 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c
> >> b/drivers/iio/buffer/industrialio-buffer-dma.c
> >> index abac88f20104..eeeed6b2e0cf 100644
> >> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> >> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> >> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block
> >> *iio_dma_buffer_alloc_block(
> >> }
> >>
> >> block->size = size;
> >> - block->state = IIO_BLOCK_STATE_DEQUEUED;
> >> + block->bytes_used = size;
> >> + block->state = IIO_BLOCK_STATE_DONE;
> >
> > I don't know why these are here - some more info?
>
> When using an input buffer the block->bytes_used is unconditionally
> reset in iio_dmaengine_buffer_submit_block(), so this was fine until
> now.
>
> When using an output buffer the block->bytes_used can actually (with
> the new API) be specified by the user, so we don't want
> iio_dmaengine_buffer_submit_block() to unconditionally override it.
> Which means that in the case where we have an output buffer in fileio
> mode, we do need block->bytes_used to be initialized to the buffer's
> size since it won't be set anywhere else.

Makes sense. Thanks for the explanation.

>
> About the change in block->state: in patch [01/15] we removed the
> incoming/outgoing queues, and while the "enqueued" state is still
> useful to know which buffers have to be submitted when the buffer is
> enabled, the "dequeued" state is not useful anymore since there is no
> more distinction vs. the "done" state.
>
> I believe this change should be moved to patch [01/15] then, and I
> should go further and remove the IIO_BLOCK_STATE_DEQUEUED completely.

Complete removal would indeed be a more 'obvious' change, so I'd support
that, assuming no disadvantages we haven't thought of yet.
>
> >> block->queue = queue;
> >> INIT_LIST_HEAD(&block->head);
> >> kref_init(&block->kref);
> >> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct
> >> iio_dma_buffer_block *block)
> >> block->state = IIO_BLOCK_STATE_DONE;
> >> }
> >>
> >> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue
> >> *queue)
> >> +{
> >> + __poll_t flags;
> >> +
> >> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >> + flags = EPOLLIN | EPOLLRDNORM;
> >> + else
> >> + flags = EPOLLOUT | EPOLLWRNORM;
> >> +
> >> + wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> >> +}
> >> +
> >> /**
> >> * iio_dma_buffer_block_done() - Indicate that a block has been
> >> completed
> >> * @block: The completed block
> >> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct
> >> iio_dma_buffer_block *block)
> >> spin_unlock_irqrestore(&queue->list_lock, flags);
> >>
> >> iio_buffer_block_put_atomic(block);
> >> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN |
> >> EPOLLRDNORM);
> >> + iio_dma_buffer_queue_wake(queue);
> >> }
> >> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
> >>
> >> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct
> >> iio_dma_buffer_queue *queue,
> >> }
> >> spin_unlock_irqrestore(&queue->list_lock, flags);
> >>
> >> - wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN |
> >> EPOLLRDNORM);
> >> + iio_dma_buffer_queue_wake(queue);
> >> }
> >> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
> >>
> >> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct
> >> iio_buffer *buffer)
> >> queue->fileio.blocks[i] = block;
> >> }
> >>
> >> - block->state = IIO_BLOCK_STATE_QUEUED;
> >> + if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >> + block->state = IIO_BLOCK_STATE_QUEUED;
> >
> > Possibly worth a comment on the state being set here. I figured it
> > out, but might
> > save some brain cells in future if it's stated in the code.
>
> Ok.
>
> >> }
> >>
> >> out_unlock:
> >> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block
> >> *iio_dma_buffer_dequeue(
> >> return block;
> >> }
> >>
> >> -/**
> >> - * iio_dma_buffer_read() - DMA buffer read callback
> >> - * @buffer: Buffer to read form
> >> - * @n: Number of bytes to read
> >> - * @user_buffer: Userspace buffer to copy the data to
> >> - *
> >> - * Should be used as the read callback for iio_buffer_access_ops
> >> - * struct for DMA buffers.
> >> - */
> >> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >> - char __user *user_buffer)
> >> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> >> + size_t n, char __user *user_buffer, bool is_write)
> >> {
> >> struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> >> struct iio_dma_buffer_block *block;
> >> + void *addr;
> >> int ret;
> >>
> >> if (n < buffer->bytes_per_datum)
> >> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer
> >> *buffer, size_t n,
> >> n = rounddown(n, buffer->bytes_per_datum);
> >> if (n > block->bytes_used - queue->fileio.pos)
> >> n = block->bytes_used - queue->fileio.pos;
> >> + addr = block->vaddr + queue->fileio.pos;
> >>
> >> - if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos,
> >> n)) {
> >> + if (is_write)
> >> + ret = !!copy_from_user(addr, user_buffer, n);
> >> + else
> >> + ret = !!copy_to_user(user_buffer, addr, n);
> >
> > What is the !! gaining us here? We only care about == 0 vs != 0 so
> > forcing it to be 0 or 1 isn't useful.
>
> Right.
>
> >> + if (ret) {
> >> ret = -EFAULT;
> >> goto out_unlock;
> >> }
> >> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer
> >> *buffer, size_t n,
> >>
> >> if (queue->fileio.pos == block->bytes_used) {
> >> queue->fileio.active_block = NULL;
> >> + block->bytes_used = block->size;
> >
> > This seems to be a functional change that isn't called out in the
> > patch description.
>
> See the explanation above. Although I most likely don't need to set it
> at two different spots... I'll check that in detail next week.
>
> Cheers,
> -Paul
>
> >> iio_dma_buffer_enqueue(queue, block);
> >> }
> >>
> >> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >>
> >> return ret;
> >> }
> >> +
> >> +/**
> >> + * iio_dma_buffer_read() - DMA buffer read callback
> >> + * @buffer: Buffer to read from
> >> + * @n: Number of bytes to read
> >> + * @user_buffer: Userspace buffer to copy the data to
> >> + *
> >> + * Should be used as the read callback for iio_buffer_access_ops
> >> + * struct for DMA buffers.
> >> + */
> >> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >> + char __user *user_buffer)
> >> +{
> >> + return iio_dma_buffer_io(buffer, n, user_buffer, false);
> >> +}
> >> EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
> >>
> >> +/**
> >> + * iio_dma_buffer_write() - DMA buffer write callback
> >> + * @buffer: Buffer to write to
> >> + * @n: Number of bytes to write
> >> + * @user_buffer: Userspace buffer to copy the data from
> >> + *
> >> + * Should be used as the write callback for iio_buffer_access_ops
> >> + * struct for DMA buffers.
> >> + */
> >> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> >> + const char __user *user_buffer)
> >> +{
> >> + return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
> >> +}
> >> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> >> +
> >> /**
> >> * iio_dma_buffer_data_available() - DMA buffer data_available callback
> >> * @buf: Buffer to check for data availability
> >> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> >> index a65a005c4a19..09c07d5563c0 100644
> >> --- a/include/linux/iio/buffer-dma.h
> >> +++ b/include/linux/iio/buffer-dma.h
> >> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
> >> struct iio_dev *indio_dev);
> >> int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >> char __user *user_buffer);
> >> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> >> + const char __user *user_buffer);
> >> size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
> >> int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
> >> int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
> >> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
> >> void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
> >> void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
> >>
> >> +static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
> >> +{
> >> + return iio_dma_buffer_data_available(buffer);
> >> +}
> >> +
> >> #endif
> >
>
>


2021-11-27 15:18:05

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

On Sun, 21 Nov 2021 17:43:20 +0000
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron
> <[email protected]> a écrit :
> > On Mon, 15 Nov 2021 14:19:21 +0000
> > Paul Cercueil <[email protected]> wrote:
> >
> >> We can be certain that the input buffers will only be accessed by
> >> userspace for reading, and output buffers will mostly be accessed by
> >> userspace for writing.
> >
> > Mostly? Perhaps a little more info on why that's not 'only'.
>
> Just like with a framebuffer, it really depends on what the application
> does. In most cases it will just read an input buffer sequentially, or
> write an output buffer sequentially. But then you get the exotic
> application that will try to do something like alpha blending, which
> means read+write. Hence "mostly".

Ok. That makes sense; though I hope no one actually does it, we can't
prevent them from doing so.


>
> >>
> >> Therefore, it makes more sense to use only fully cached input
> >> buffers,
> >> and to use the write-combine cache coherency setting for output
> >> buffers.
> >>
> >> This boosts performance, as the data written to the output buffers
> >> does
> >> not have to be sync'd for coherency. It will halve performance if
> >> the
> >> userspace application tries to read from the output buffer, but this
> >> should never happen.
> >>
> >> Since we don't need to sync the cache when disabling CPU access
> >> either
> >> for input buffers or output buffers, the .end_cpu_access() callback
> >> can
> >> be dropped completely.
> >
> > We have an odd mix of coherent and non coherent DMA in here as you
> > noted,
> > but are you sure this is safe on all platforms?
>
> The mix isn't safe, but using only coherent or only non-coherent should
> be safe, yes.

yup

>
> >
> >>
> >> Signed-off-by: Paul Cercueil <[email protected]>
> >
> > Any numbers to support this patch? The mapping types are performance
> > optimisations so nice to know how much of a difference they make.
>
> Output buffers are definitely faster in write-combine mode. On a
> ZedBoard with an AD9361 transceiver set to 66 MSPS, and buffer/size set
> to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
>
> Input buffers... early results are mixed. On ARM32 it does look like it
> is slightly faster to read from *uncached* memory than reading from
> cached memory. The cache sync does take a long time.
>
> Other architectures might have a different result, for instance on MIPS
> invalidating the cache is a very fast operation, so using cached
> buffers would be a huge win in performance.
>
> Setups where the DMA operations are coherent also wouldn't require any
> cache sync and this patch would give a huge win in performance.
>
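
If it helps, here is a rough (completely untested) sketch of what skipping
the sync on cache-coherent setups could look like, assuming
dev_is_dma_coherent() from <linux/dma-map-ops.h> may be used here (it is
normally reserved for DMA-mapping internals):

static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
					       enum dma_data_direction dma_dir)
{
	struct iio_dma_buffer_block *block = dbuf->priv;
	struct device *dev = block->queue->dev;

	/* Nothing to do if the device does cache-coherent DMA. */
	if (dev_is_dma_coherent(dev))
		return 0;

	/* Only input buffers need their cache invalidated for the CPU. */
	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
		dma_sync_single_for_cpu(dev, block->phys_addr,
					block->size, dma_dir);

	return 0;
}
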
> I'll run some more tests next week to have some fresh numbers.

Great.

Thanks,

Jonathan

>
> Cheers,
> -Paul
>
> >> ---
> >> drivers/iio/buffer/industrialio-buffer-dma.c | 82 +++++++++++++-------
> >> 1 file changed, 54 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> >> index 92356ee02f30..fb39054d8c15 100644
> >> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> >> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> >> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
> >> if (vma->vm_ops->open)
> >> vma->vm_ops->open(vma);
> >>
> >> - return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> >> - virt_to_page(block->vaddr));
> >> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> >> + /*
> >> + * With an input buffer, userspace will only read the data and
> >> + * never write. We can mmap the buffer fully cached.
> >> + */
> >> + return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> >> + virt_to_page(block->vaddr));
> >> + } else {
> >> + /*
> >> + * With an output buffer, userspace will only write the data
> >> + * and should rarely (if never) read from it. It is better to
> >> + * use write-combine in this case.
> >> + */
> >> + return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> >> + vma->vm_end - vma->vm_start);
> >> + }
> >> +}
> >> +
> >> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block *block)
> >> +{
> >> + struct device *dev = block->queue->dev;
> >> + size_t size = PAGE_ALIGN(block->size);
> >> +
> >> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >> + dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> >> + else
> >> + dma_free_wc(dev, size, block->vaddr, block->phys_addr);
> >> }
> >>
> >> static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> >> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> >>
> >> mutex_lock(&queue->lock);
> >>
> >> - dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> >> - block->vaddr, block->phys_addr);
> >> -
> >> + iio_dma_buffer_free_dmamem(block);
> >> kfree(block);
> >>
> >> queue->num_blocks--;
> >> @@ -268,19 +291,6 @@ static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
> >> return 0;
> >> }
> >>
> >> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> >> - enum dma_data_direction dma_dir)
> >> -{
> >> - struct iio_dma_buffer_block *block = dbuf->priv;
> >> - struct device *dev = block->queue->dev;
> >> -
> >> - /* We only need to sync the cache for output buffers */
> >> - if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> >> - dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
> >> -
> >> - return 0;
> >> -}
> >> -
> >> static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> >> .attach = iio_buffer_dma_buf_attach,
> >> .map_dma_buf = iio_buffer_dma_buf_map,
> >> @@ -288,9 +298,28 @@ static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> >> .mmap = iio_buffer_dma_buf_mmap,
> >> .release = iio_buffer_dma_buf_release,
> >> .begin_cpu_access = iio_buffer_dma_buf_begin_cpu_access,
> >> - .end_cpu_access = iio_buffer_dma_buf_end_cpu_access,
> >> };
> >>
> >> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
> >> +{
> >> + struct device *dev = block->queue->dev;
> >> + size_t size = PAGE_ALIGN(block->size);
> >> +
> >> + if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> >> + block->vaddr = dma_alloc_coherent(dev, size,
> >> + &block->phys_addr,
> >> + GFP_KERNEL);
> >> + } else {
> >> + block->vaddr = dma_alloc_wc(dev, size,
> >> + &block->phys_addr,
> >> + GFP_KERNEL);
> >> + }
> >> + if (!block->vaddr)
> >> + return -ENOMEM;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> >> struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> >> {
> >> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> >> if (!block)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> >> - &block->phys_addr, GFP_KERNEL);
> >> - if (!block->vaddr) {
> >> - err = -ENOMEM;
> >> + block->size = size;
> >> + block->queue = queue;
> >> +
> >> + err = iio_dma_buffer_alloc_dmamem(block);
> >> + if (err)
> >> goto err_free_block;
> >> - }
> >>
> >> einfo.ops = &iio_dma_buffer_dmabuf_ops;
> >> einfo.size = PAGE_ALIGN(size);
> >> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> >> }
> >>
> >> block->dmabuf = dmabuf;
> >> - block->size = size;
> >> block->bytes_used = size;
> >> block->state = IIO_BLOCK_STATE_DONE;
> >> - block->queue = queue;
> >> block->fileio = fileio;
> >> INIT_LIST_HEAD(&block->head);
> >>
> >> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> >> return block;
> >>
> >> err_free_dma:
> >> - dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> >> - block->vaddr, block->phys_addr);
> >> + iio_dma_buffer_free_dmamem(block);
> >> err_free_block:
> >> kfree(block);
> >> return ERR_PTR(err);
> >
>
>


2021-11-27 16:02:40

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

On Thu, 25 Nov 2021 17:29:58 +0000
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> Le dim., nov. 21 2021 at 17:43:20 +0000, Paul Cercueil
> <[email protected]> a écrit :
> > Hi Jonathan,
> >
> > Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron
> > <[email protected]> a écrit :
> >> On Mon, 15 Nov 2021 14:19:21 +0000
> >> Paul Cercueil <[email protected]> wrote:
> >>
> >>> We can be certain that the input buffers will only be accessed by
> >>> userspace for reading, and output buffers will mostly be accessed
> >>> by
> >>> userspace for writing.
> >>
> >> Mostly? Perhaps a little more info on why that's not 'only'.
> >
> > Just like with a framebuffer, it really depends on what the
> > application does. Most of the cases it will just read sequentially an
> > input buffer, or write sequentially an output buffer. But then you
> > get the exotic application that will try to do something like alpha
> > blending, which means read+write. Hence "mostly".
> >
> >>>
> >>> Therefore, it makes more sense to use only fully cached input
> >>> buffers,
> >>> and to use the write-combine cache coherency setting for output
> >>> buffers.
> >>>
> >>> This boosts performance, as the data written to the output buffers
> >>> does
> >>> not have to be sync'd for coherency. It will halve performance if
> >>> the
> >>> userspace application tries to read from the output buffer, but
> >>> this
> >>> should never happen.
> >>>
> >>> Since we don't need to sync the cache when disabling CPU access
> >>> either
> >>> for input buffers or output buffers, the .end_cpu_access()
> >>> callback can
> >>> be dropped completely.
> >>
> >> We have an odd mix of coherent and non coherent DMA in here as you
> >> noted,
> >> but are you sure this is safe on all platforms?
> >
> > The mix isn't safe, but using only coherent or only non-coherent
> > should be safe, yes.
> >
> >>
> >>>
> >>> Signed-off-by: Paul Cercueil <[email protected]>
> >>
> >> Any numbers to support this patch? The mapping types are performance
> >> optimisations so nice to know how much of a difference they make.
> >
> > Output buffers are definitely faster in write-combine mode. On a
> > ZedBoard with an AD9361 transceiver set to 66 MSPS, and buffer/size
> > set to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
> >
> > Input buffers... early results are mixed. On ARM32 it does look like
> > it is slightly faster to read from *uncached* memory than reading
> > from cached memory. The cache sync does take a long time.
> >
> > Other architectures might have a different result, for instance on
> > MIPS invalidating the cache is a very fast operation, so using cached
> > buffers would be a huge win in performance.
> >
> > Setups where the DMA operations are coherent also wouldn't require
> > any cache sync and this patch would give a huge win in performance.
> >
> > I'll run some more tests next week to have some fresh numbers.
>
> I think I mixed things up before, because I get different results now.
>
> Here are some fresh benchmarks, triple-checked, using libiio's
> iio_readdev and iio_writedev tools, with 64K-sample buffers at 61.44
> MSPS (max. theoretical throughput: 234 MiB/s):
> iio_readdev -b 65536 cf-ad9361-lpc voltage0 voltage1 | pv > /dev/null
> pv /dev/zero | iio_writedev -b 65536 cf-ad9361-dds-core-lpc voltage0 voltage1

There is a bit of a terminology confusion going on here. I think
for the mappings you mean cacheable vs non-cacheable but maybe
I'm misunderstanding. That doesn't necessarily correspond to
coherency. Non cached memory is always coherent because all caches
miss.

Non-cacheable can be related to coherency of course. Also beware that a
given piece of hardware might not implement non-cacheable if it knows all
possible accesses are IO-coherent. The effect is the same, and if
implemented correctly it will not hurt performance significantly.

Firmware should be letting the OS know whether the device does coherent
DMA or not... dma-coherent in the DT. It might be optional for a given
piece of DMA engine, but I've not seen that.

I'm not sure I see how you can do a mixture of cacheable for reads
and write-combine (which means uncacheable) for writes...

>
> Coherent mapping:
> - fileio:
> read: 125 MiB/s
> write: 141 MiB/s
> - dmabuf:
> read: 171 MiB/s
> write: 210 MiB/s
>
> Coherent reads + Write-combine writes:
> - fileio:
> read: 125 MiB/s
> write: 141 MiB/s
> - dmabuf:
> read: 171 MiB/s
> write: 210 MiB/s
>
> Non-coherent mapping:
> - fileio:
> read: 119 MiB/s
> write: 124 MiB/s
> - dmabuf:
> read: 159 MiB/s
> write: 124 MiB/s
>
> Non-coherent reads + write-combine writes:
> - fileio:
> read: 119 MiB/s
> write: 140 MiB/s
> - dmabuf:
> read: 159 MiB/s
> write: 210 MiB/s
>


> Non-coherent mapping with no cache sync:
> - fileio:
> read: 156 MiB/s
> write: 123 MiB/s
> - dmabuf:
> read: 234 MiB/s (capped by sample rate)
> write: 182 MiB/s
>
> Non-coherent reads with no cache sync + write-combine writes:
> - fileio:
> read: 156 MiB/s
> write: 140 MiB/s
> - dmabuf:
> read: 234 MiB/s (capped by sample rate)
> write: 210 MiB/s
>
>
> A few things we can deduce from this:
>
> * Write-combine is not available on Zynq/ARM? If it were working, it
> should give better performance than the coherent mapping, but it
> doesn't seem to do anything at all. At least it doesn't harm
> performance.

I'm not sure it's very relevant to this sort of streaming write.
If you write a sequence of addresses then nothing stops them getting combined
into a single write whether or not it is write-combining.

You may be right that the particular path to memory doesn't support it anyway.
Also some cache architectures will rapidly detect streaming writes and
elect not to cache them whether coherent or not.




>
> * Non-coherent + cache invalidation is definitely a good deal slower
> than using coherent mapping, at least on ARM32. However, when the cache
> sync is disabled (e.g. if the DMA operations are coherent) the reads
> are much faster.

If you are running with cache sync then it had better not be cached;
as such it's coherent in the sense of there being no entries in the
cache in either direction.

>
> * The new dma-buf based API is a great deal faster than the fileio API.

:)

>
> So in the future we could use coherent reads + write-combine writes,
> unless we know the DMA operations are coherent, in which case use
> non-coherent reads + write-combine writes.

Not following this argument at all, but anyway we can revisit when it matters.

>
> Regarding this patch, unfortunately I cannot prove that write-combine
> is faster, so I'll just drop this patch for now.

Sure, thanks for checking. It's worth noting that WC usage in the kernel
is vanishingly rare and I suspect that's mostly because it doesn't
do anything on many implementations.

Jonathan

>
> Cheers,
> -Paul
>
>


2021-11-28 13:28:12

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

On 11/27/21 5:05 PM, Jonathan Cameron wrote:
>> Non-coherent mapping with no cache sync:
>> - fileio:
>> read: 156 MiB/s
>> write: 123 MiB/s
>> - dmabuf:
>> read: 234 MiB/s (capped by sample rate)
>> write: 182 MiB/s
>>
>> Non-coherent reads with no cache sync + write-combine writes:
>> - fileio:
>> read: 156 MiB/s
>> write: 140 MiB/s
>> - dmabuf:
>> read: 234 MiB/s (capped by sample rate)
>> write: 210 MiB/s
>>
>>
>> A few things we can deduce from this:
>>
>> * Write-combine is not available on Zynq/ARM? If it were working, it
>> should give better performance than the coherent mapping, but it
>> doesn't seem to do anything at all. At least it doesn't harm
>> performance.
> I'm not sure it's very relevant to this sort of streaming write.
> If you write a sequence of addresses then nothing stops them getting combined
> into a single write whether or not it is write-combining.

There is a difference in the point at which they can get combined. With
write-combine they can be coalesced into a single transaction anywhere
in the interconnect, as early as the CPU itself. Without write-combine
the DDR controller might decide to combine them, but nothing earlier in
the path will. This can make a difference especially if the write is a
narrow write, i.e. the access size is smaller than the bus width.

Let's say you do 32-bit writes, but your bus is 64 bits wide. With WC,
two 32-bit writes can be combined into a single 64-bit write. Without WC
that is not possible and you are potentially not using the bus to its
full capacity. This is especially true if the memory bus is wider than
the widest access size of the CPU.
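
To make that concrete, consider a plain streaming fill of 32-bit samples
into an output block (illustrative only, using kernel-style types from
<linux/types.h>):

static void fill_output_block(u32 *dst, const u32 *samples, size_t count)
{
	size_t i;

	/*
	 * The loop is identical either way; only the mapping of 'dst'
	 * (write-combine vs. plain uncached) decides whether neighbouring
	 * stores can be merged before they reach the memory bus.
	 */
	for (i = 0; i < count; i++)
		dst[i] = samples[i];
}

With a WC mapping the CPU can merge adjacent 32-bit stores into full-width
bursts before they leave the core, so on a 64-bit bus this loop needs at
best half as many bus transfers; with a plain uncached mapping every store
goes out as its own narrow transaction.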