2023-12-19 17:53:28

by Paul Cercueil

Subject: [PATCH v5 0/8] iio: new DMABUF based API, v5

[V4 was: "iio: Add buffer write() support"][1]

Hi Jonathan,

This is a respin of the V3 of my patchset that introduced a new
interface based on DMABUF objects [2].

The V4 was a split of the patchset, to attempt to upstream buffer
write() support first. But since there is no current user upstream, it
was not merged. This V5 is about doing the opposite, and contains the
new DMABUF interface, without adding the buffer write() support. It can
already be used with the upstream adi-axi-adc driver.

In user-space, Libiio uses it to transfer blocks of samples back and
forth between the hardware and the applications, without having to
copy the data.

On a ZCU102 with a FMComms3 daughter board, running Libiio from the
pcercuei/dev-new-dmabuf-api branch [3], compiled with
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
Throughput: 116 MiB/s

Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
Throughput: 475 MiB/s

This benchmark only measures the speed at which the data can be fetched
to iio_rwdev's internal buffers, and does not actually try to read the
data (e.g. to pipe it to stdout). It shows that fetching the data is
more than 4x faster using the new interface.

When actually reading the data, the performance difference isn't as
impressive (maybe because, in the DMABUF case, the data is not in the
cache):

WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s

WITH_LOCAL_DMABUF_API=ON:
sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s

One interesting thing to note is that fileio is (currently) actually
faster than the DMABUF interface if the buffer size is increased a lot.
My explanation is that the cache invalidation routine takes more and
more time the bigger the DMABUF gets. This is because the DMABUF is
backed by small-size pages, so a 64 MiB DMABUF, for example, is backed
by up to 16 thousand pages, which have to be invalidated one by one.
This could be addressed by using huge pages, but the udmabuf driver
does not (yet) support creating DMABUFs backed by huge pages.
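
For reference, here is a minimal sketch (assuming the standard udmabuf
uAPI; the names and the size are arbitrary, and error handling is
omitted) of how such a DMABUF can be created from userspace. The memfd
behind it is backed by regular 4 KiB pages, which is exactly the
limitation described above:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Sketch only: create a DMABUF of the requested size (e.g. 64 MiB)
 * backed by a sealed memfd, through /dev/udmabuf. */
static int create_dmabuf(size_t size)
{
        struct udmabuf_create create = { 0 };
        int memfd, devfd, dmabuf_fd;

        memfd = memfd_create("iio-samples", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK); /* required by udmabuf */

        devfd = open("/dev/udmabuf", O_RDWR);

        create.memfd = memfd;
        create.offset = 0;
        create.size = size;

        dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

        close(devfd);
        close(memfd);

        /* This fd is what gets attached to the IIO buffer */
        return dmabuf_fd;
}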

Anyway, the real benefits happen when the DMABUFs are either shared
between IIO devices, or between the IIO subsystem and another
subsystem. In that case, the DMABUFs are simply passed around between
drivers, without the data being copied at any moment.

We use that feature to transfer samples from our transceivers to USB,
using a DMABUF interface to FunctionFS [4].

This drastically increases the throughput, to about 274 MiB/s over a
USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the
FunctionFS endpoints, while also lowering the CPU load (from a 0.85 to
a 0.65 load average).

Based on linux-next/next-20231219.

Cheers,
-Paul

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
[4] https://lore.kernel.org/all/[email protected]/

---
Changelog:
- [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct.
Note that at some point we will need to support cyclic transfers
using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
parameter to the function?

- [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
.device_prep_slave_dma_array().

@Vinod: this patch will cause a small conflict with my other
patchset adding scatter-gather support to the axi-dmac driver.
This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
prototype of this function changed in my other patchset - it would
have to be passed the "chan" variable. I don't know how you would
prefer to resolve it. Worst-case scenario (and if @Jonathan is okay
with that), this one patch can be re-sent later, but it would make
this patchset less "atomic".

- [5/8]:
- Use dev_err() instead of pr_err()
- Inline to_iio_dma_fence()
- Add comment to explain why we unref twice when detaching dmabuf
- Remove TODO comment. It is actually safe to free the file's
private data even when transfers are still pending because it
won't be accessed.
- Fix documentation of new fields in struct iio_buffer_access_funcs
- iio_dma_resv_lock() does not need to be exported, make it static

- [7/8]:
- Use the new dmaengine_prep_slave_dma_vec().
- Restrict to input buffers, since output buffers are not yet
supported by IIO buffers.

- [8/8]:
Use description lists for the documentation of the three new IOCTLs
instead of abusing subsections.

---
Alexandru Ardelean (1):
iio: buffer-dma: split iio_dma_buffer_fileio_free() function

Paul Cercueil (7):
iio: buffer-dma: Get rid of outgoing queue
dmaengine: Add API function dmaengine_prep_slave_dma_vec()
dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
iio: core: Add new DMABUF interface infrastructure
iio: buffer-dma: Enable support for DMABUFs
iio: buffer-dmaengine: Support new DMABUF based userspace API
Documentation: iio: Document high-speed DMABUF based API

Documentation/iio/dmabuf_api.rst | 54 +++
Documentation/iio/index.rst | 2 +
drivers/dma/dma-axi-dmac.c | 40 ++
drivers/iio/buffer/industrialio-buffer-dma.c | 242 ++++++++---
.../buffer/industrialio-buffer-dmaengine.c | 52 ++-
drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++
include/linux/dmaengine.h | 25 ++
include/linux/iio/buffer-dma.h | 33 +-
include/linux/iio/buffer_impl.h | 26 ++
include/uapi/linux/iio/buffer.h | 22 +
10 files changed, 836 insertions(+), 62 deletions(-)
create mode 100644 Documentation/iio/dmabuf_api.rst

--
2.43.0



2023-12-19 17:55:14

by Paul Cercueil

Subject: [PATCH v5 1/8] iio: buffer-dma: Get rid of outgoing queue

The buffer-dma code was using two queues, incoming and outgoing, to
manage the state of the blocks in use.

While this totally works, it adds some complexity to the code,
especially since the code only manages 2 blocks. It is much easier to
just check each block's state manually, and keep a counter for the next
block to dequeue.

Since the new DMABUF based API wouldn't use the outgoing queue anyway,
getting rid of it now makes the upcoming changes simpler.

With this change, the IIO_BLOCK_STATE_DEQUEUED is now useless, and can
be removed.

Signed-off-by: Paul Cercueil <[email protected]>

---
v2: - Only remove the outgoing queue, and keep the incoming queue, as we
want the buffer to start streaming data as soon as it is enabled.
- Remove IIO_BLOCK_STATE_DEQUEUED, since it is now functionally the
same as IIO_BLOCK_STATE_DONE.
---
drivers/iio/buffer/industrialio-buffer-dma.c | 44 ++++++++++----------
include/linux/iio/buffer-dma.h | 7 ++--
2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index d348af8b9705..1fc91467d1aa 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -179,7 +179,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
}

block->size = size;
- block->state = IIO_BLOCK_STATE_DEQUEUED;
+ block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
INIT_LIST_HEAD(&block->head);
kref_init(&block->kref);
@@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(

static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
{
- struct iio_dma_buffer_queue *queue = block->queue;
-
- /*
- * The buffer has already been freed by the application, just drop the
- * reference.
- */
- if (block->state != IIO_BLOCK_STATE_DEAD) {
+ if (block->state != IIO_BLOCK_STATE_DEAD)
block->state = IIO_BLOCK_STATE_DONE;
- list_add_tail(&block->head, &queue->outgoing);
- }
}

/**
@@ -261,7 +253,6 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block)
* not support abort and has not given back the block yet.
*/
switch (block->state) {
- case IIO_BLOCK_STATE_DEQUEUED:
case IIO_BLOCK_STATE_QUEUED:
case IIO_BLOCK_STATE_DONE:
return true;
@@ -317,7 +308,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
* dead. This means we can reset the lists without having to fear
* corrution.
*/
- INIT_LIST_HEAD(&queue->outgoing);
spin_unlock_irq(&queue->list_lock);

INIT_LIST_HEAD(&queue->incoming);
@@ -456,14 +446,20 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
struct iio_dma_buffer_queue *queue)
{
struct iio_dma_buffer_block *block;
+ unsigned int idx;

spin_lock_irq(&queue->list_lock);
- block = list_first_entry_or_null(&queue->outgoing, struct
- iio_dma_buffer_block, head);
- if (block != NULL) {
- list_del(&block->head);
- block->state = IIO_BLOCK_STATE_DEQUEUED;
+
+ idx = queue->fileio.next_dequeue;
+ block = queue->fileio.blocks[idx];
+
+ if (block->state == IIO_BLOCK_STATE_DONE) {
+ idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
+ queue->fileio.next_dequeue = idx;
+ } else {
+ block = NULL;
}
+
spin_unlock_irq(&queue->list_lock);

return block;
@@ -539,6 +535,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
struct iio_dma_buffer_block *block;
size_t data_available = 0;
+ unsigned int i;

/*
* For counting the available bytes we'll use the size of the block not
@@ -552,8 +549,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
data_available += queue->fileio.active_block->size;

spin_lock_irq(&queue->list_lock);
- list_for_each_entry(block, &queue->outgoing, head)
- data_available += block->size;
+
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ block = queue->fileio.blocks[i];
+
+ if (block != queue->fileio.active_block
+ && block->state == IIO_BLOCK_STATE_DONE)
+ data_available += block->size;
+ }
+
spin_unlock_irq(&queue->list_lock);
mutex_unlock(&queue->lock);

@@ -617,7 +621,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
queue->ops = ops;

INIT_LIST_HEAD(&queue->incoming);
- INIT_LIST_HEAD(&queue->outgoing);

mutex_init(&queue->lock);
spin_lock_init(&queue->list_lock);
@@ -645,7 +648,6 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
continue;
queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
}
- INIT_LIST_HEAD(&queue->outgoing);
spin_unlock_irq(&queue->list_lock);

INIT_LIST_HEAD(&queue->incoming);
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 6564bdcdac66..18d3702fa95d 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -19,14 +19,12 @@ struct device;

/**
* enum iio_block_state - State of a struct iio_dma_buffer_block
- * @IIO_BLOCK_STATE_DEQUEUED: Block is not queued
* @IIO_BLOCK_STATE_QUEUED: Block is on the incoming queue
* @IIO_BLOCK_STATE_ACTIVE: Block is currently being processed by the DMA
* @IIO_BLOCK_STATE_DONE: Block is on the outgoing queue
* @IIO_BLOCK_STATE_DEAD: Block has been marked as to be freed
*/
enum iio_block_state {
- IIO_BLOCK_STATE_DEQUEUED,
IIO_BLOCK_STATE_QUEUED,
IIO_BLOCK_STATE_ACTIVE,
IIO_BLOCK_STATE_DONE,
@@ -73,12 +71,15 @@ struct iio_dma_buffer_block {
* @active_block: Block being used in read()
* @pos: Read offset in the active block
* @block_size: Size of each block
+ * @next_dequeue: index of next block that will be dequeued
*/
struct iio_dma_buffer_queue_fileio {
struct iio_dma_buffer_block *blocks[2];
struct iio_dma_buffer_block *active_block;
size_t pos;
size_t block_size;
+
+ unsigned int next_dequeue;
};

/**
@@ -93,7 +94,6 @@ struct iio_dma_buffer_queue_fileio {
* list and typically also a list of active blocks in the part that handles
* the DMA controller
* @incoming: List of buffers on the incoming queue
- * @outgoing: List of buffers on the outgoing queue
* @active: Whether the buffer is currently active
* @fileio: FileIO state
*/
@@ -105,7 +105,6 @@ struct iio_dma_buffer_queue {
struct mutex lock;
spinlock_t list_lock;
struct list_head incoming;
- struct list_head outgoing;

bool active;

--
2.43.0


2023-12-19 17:55:35

by Paul Cercueil

Subject: [PATCH v5 2/8] iio: buffer-dma: split iio_dma_buffer_fileio_free() function

From: Alexandru Ardelean <[email protected]>

This change splits the fileio buffer cleanup logic of
iio_dma_buffer_exit() out into a separate iio_dma_buffer_fileio_free()
function, which will be re-used later.

Signed-off-by: Alexandru Ardelean <[email protected]>
Cc: Alexandru Ardelean <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/iio/buffer/industrialio-buffer-dma.c | 43 +++++++++++---------
1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index 1fc91467d1aa..5610ba67925e 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -346,6 +346,29 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);

+static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
+{
+ unsigned int i;
+
+ spin_lock_irq(&queue->list_lock);
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ if (!queue->fileio.blocks[i])
+ continue;
+ queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
+ }
+ spin_unlock_irq(&queue->list_lock);
+
+ INIT_LIST_HEAD(&queue->incoming);
+
+ for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+ if (!queue->fileio.blocks[i])
+ continue;
+ iio_buffer_block_put(queue->fileio.blocks[i]);
+ queue->fileio.blocks[i] = NULL;
+ }
+ queue->fileio.active_block = NULL;
+}
+
static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
struct iio_dma_buffer_block *block)
{
@@ -638,27 +661,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
*/
void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
{
- unsigned int i;
-
mutex_lock(&queue->lock);

- spin_lock_irq(&queue->list_lock);
- for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
- if (!queue->fileio.blocks[i])
- continue;
- queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
- }
- spin_unlock_irq(&queue->list_lock);
-
- INIT_LIST_HEAD(&queue->incoming);
-
- for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
- if (!queue->fileio.blocks[i])
- continue;
- iio_buffer_block_put(queue->fileio.blocks[i]);
- queue->fileio.blocks[i] = NULL;
- }
- queue->fileio.active_block = NULL;
+ iio_dma_buffer_fileio_free(queue);
queue->ops = NULL;

mutex_unlock(&queue->lock);
--
2.43.0


2023-12-19 17:55:56

by Paul Cercueil

Subject: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

This function can be used to initiate a scatter-gather DMA transfer,
where the address and size of each segment are located in one entry of
the dma_vec array.

The major difference from dmaengine_prep_slave_sg() is that it supports
specifying the length of each individual DMA transfer segment, whereas
trying to override the transfer lengths with dmaengine_prep_slave_sg()
is a very tedious process. The introduction of a new API function is
also justified by the fact that scatterlists are on their way out.

Note that dmaengine_prep_interleaved_dma() is not helpful either in
that case, as it assumes that the address of each segment will be
higher than that of the previous segment, which we simply cannot
guarantee for a scatter-gather transfer.
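
For illustration, here is a hedged sketch of how a DMA client could use
the new function; the channel and the dma_vec array are placeholders
provided by the caller, and patch [7/8] of this series contains the
actual IIO user:

#include <linux/dmaengine.h>

/* Sketch only: submit one descriptor covering all (addr, len) entries
 * of the dma_vec array for a device-to-memory transfer. */
static int submit_dma_vec_transfer(struct dma_chan *chan,
                                   const struct dma_vec *vecs, size_t nents)
{
        struct dma_async_tx_descriptor *desc;
        dma_cookie_t cookie;

        desc = dmaengine_prep_slave_dma_vec(chan, vecs, nents,
                                            DMA_DEV_TO_MEM,
                                            DMA_PREP_INTERRUPT);
        if (!desc)
                return -ENOMEM;

        cookie = dmaengine_submit(desc);
        if (dma_submit_error(cookie))
                return dma_submit_error(cookie);

        dma_async_issue_pending(chan);

        return 0;
}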

Signed-off-by: Paul Cercueil <[email protected]>

---
v3: New patch

v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct
'dma_vec'.
Note that at some point we will need to support cyclic transfers
using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
parameter to the function?
---
include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 3df70d6131c8..ee5931ddb42f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -160,6 +160,16 @@ struct dma_interleaved_template {
struct data_chunk sgl[];
};

+/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+ dma_addr_t addr;
+ size_t len;
+};
+
/**
* enum dma_ctrl_flags - DMA flags to augment operation preparation,
* control completion, and communicate status.
@@ -910,6 +920,10 @@ struct dma_device {
struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
struct dma_chan *chan, unsigned long flags);

+ struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)(
+ struct dma_chan *chan, const struct dma_vec *vecs,
+ size_t nents, enum dma_transfer_direction direction,
+ unsigned long flags);
struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
struct dma_chan *chan, struct scatterlist *sgl,
unsigned int sg_len, enum dma_transfer_direction direction,
@@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
dir, flags, NULL);
}

+static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
+ struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
+ enum dma_transfer_direction dir, unsigned long flags)
+{
+ if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
+ return NULL;
+
+ return chan->device->device_prep_slave_dma_vec(chan, vecs, nents,
+ dir, flags);
+}
+
static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
enum dma_transfer_direction dir, unsigned long flags)
--
2.43.0


2023-12-19 17:56:35

by Paul Cercueil

Subject: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Add the necessary infrastructure to the IIO core to support a new
optional DMABUF based interface.

With this new interface, DMABUF objects (externally created) can be
attached to a IIO buffer, and subsequently used for data transfer.

A userspace application can then use this interface to share DMABUF
objects between several interfaces, allowing it to transfer data in a
zero-copy fashion, for instance between IIO and the USB stack.

The userspace application can also memory-map the DMABUF objects, and
access the sample data directly. The advantage of doing this vs. the
read() interface is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly useful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

As part of the interface, 3 new IOCTLs have been added:

IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
Attach the DMABUF object identified by the given file descriptor to the
buffer.

IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
Detach the DMABUF object identified by the given file descriptor from
the buffer. Note that closing the IIO buffer's file descriptor will
automatically detach all previously attached DMABUF objects.

IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
Request a data transfer to/from the given DMABUF object. Its file
descriptor, as well as the transfer size and flags are provided in the
"iio_dmabuf" structure.

These three IOCTLs have to be performed on the IIO buffer's file
descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
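
For illustration, here is a hedged sketch of the intended call flow
from userspace; the device path is a placeholder, error handling is
omitted, and the DMABUF fd is assumed to have been created externally
(e.g. with udmabuf):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/iio/buffer.h>

/* Sketch only: attach an externally created DMABUF, queue one transfer
 * of "nbytes" bytes, then detach. */
static void enqueue_one_transfer(int dmabuf_fd, unsigned long long nbytes)
{
        struct iio_dmabuf req = { 0 };
        int dev_fd, buf_fd, arg = 0;

        dev_fd = open("/dev/iio:device0", O_RDWR);

        /* The DMABUF IOCTLs only work on the per-buffer FD */
        ioctl(dev_fd, IIO_BUFFER_GET_FD_IOCTL, &arg);
        buf_fd = arg;

        ioctl(buf_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd);

        req.fd = dmabuf_fd;
        req.flags = 0;
        req.bytes_used = nbytes;
        ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);

        /*
         * ... wait for the transfer to complete (e.g. by polling the
         * DMABUF fd for its fence), then detach and clean up.
         */
        ioctl(buf_fd, IIO_BUFFER_DMABUF_DETACH_IOCTL, &dmabuf_fd);

        close(buf_fd);
        close(dev_fd);
}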

Signed-off-by: Paul Cercueil <[email protected]>

---
v2: Only allow the new IOCTLs on the buffer FD created with
IIO_BUFFER_GET_FD_IOCTL().

v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or
manage DMABUFs anymore, and only attaches/detaches externally
created DMABUFs.
- Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.

v5: - Use dev_err() instead of pr_err()
- Inline to_iio_dma_fence()
- Add comment to explain why we unref twice when detaching dmabuf
- Remove TODO comment. It is actually safe to free the file's
private data even when transfers are still pending because it
won't be accessed.
- Fix documentation of new fields in struct iio_buffer_access_funcs
- iio_dma_resv_lock() does not need to be exported, make it static
---
drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++++++++++++++
include/linux/iio/buffer_impl.h | 26 ++
include/uapi/linux/iio/buffer.h | 22 ++
3 files changed, 450 insertions(+)

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 09c41e9ccf87..24c040e073a7 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -13,10 +13,14 @@
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/slab.h>
+#include <linux/mm.h>
#include <linux/poll.h>
#include <linux/sched/signal.h>

@@ -28,6 +32,31 @@
#include <linux/iio/buffer.h>
#include <linux/iio/buffer_impl.h>

+#define DMABUF_ENQUEUE_TIMEOUT_MS 5000
+
+struct iio_dma_fence;
+
+struct iio_dmabuf_priv {
+ struct list_head entry;
+ struct kref ref;
+
+ struct iio_buffer *buffer;
+ struct iio_dma_buffer_block *block;
+
+ u64 context;
+ spinlock_t lock;
+
+ struct dma_buf_attachment *attach;
+ struct iio_dma_fence *fence;
+};
+
+struct iio_dma_fence {
+ struct dma_fence base;
+ struct iio_dmabuf_priv *priv;
+ struct sg_table *sgt;
+ enum dma_data_direction dir;
+};
+
static const char * const iio_endian_prefix[] = {
[IIO_BE] = "be",
[IIO_LE] = "le",
@@ -332,6 +361,7 @@ void iio_buffer_init(struct iio_buffer *buffer)
{
INIT_LIST_HEAD(&buffer->demux_list);
INIT_LIST_HEAD(&buffer->buffer_list);
+ INIT_LIST_HEAD(&buffer->dmabufs);
init_waitqueue_head(&buffer->pollq);
kref_init(&buffer->ref);
if (!buffer->watermark)
@@ -1519,14 +1549,54 @@ static void iio_buffer_unregister_legacy_sysfs_groups(struct iio_dev *indio_dev)
kfree(iio_dev_opaque->legacy_scan_el_group.attrs);
}

+static void iio_buffer_dmabuf_release(struct kref *ref)
+{
+ struct iio_dmabuf_priv *priv = container_of(ref, struct iio_dmabuf_priv, ref);
+ struct dma_buf_attachment *attach = priv->attach;
+ struct iio_buffer *buffer = priv->buffer;
+ struct dma_buf *dmabuf = attach->dmabuf;
+
+ buffer->access->detach_dmabuf(buffer, priv->block);
+
+ dma_buf_detach(attach->dmabuf, attach);
+ dma_buf_put(dmabuf);
+ kfree(priv);
+}
+
+void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach)
+{
+ struct iio_dmabuf_priv *priv = attach->importer_priv;
+
+ kref_get(&priv->ref);
+}
+EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_get);
+
+void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach)
+{
+ struct iio_dmabuf_priv *priv = attach->importer_priv;
+
+ kref_put(&priv->ref, iio_buffer_dmabuf_release);
+}
+EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_put);
+
static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep)
{
struct iio_dev_buffer_pair *ib = filep->private_data;
struct iio_dev *indio_dev = ib->indio_dev;
struct iio_buffer *buffer = ib->buffer;
+ struct iio_dmabuf_priv *priv, *tmp;

wake_up(&buffer->pollq);

+ /* Close all attached DMABUFs */
+ list_for_each_entry_safe(priv, tmp, &buffer->dmabufs, entry) {
+ list_del_init(&priv->entry);
+ iio_buffer_dmabuf_put(priv->attach);
+ }
+
+ if (!list_empty(&buffer->dmabufs))
+ dev_warn(&indio_dev->dev, "Buffer FD closed with active transfers\n");
+
kfree(ib);
clear_bit(IIO_BUSY_BIT_POS, &buffer->flags);
iio_device_put(indio_dev);
@@ -1534,11 +1604,343 @@ static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep)
return 0;
}

+static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
+{
+ int ret;
+
+ ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
+ if (ret) {
+ if (ret != -EDEADLK)
+ goto out;
+ if (nonblock) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
+ }
+
+out:
+ return ret;
+}
+
+static struct dma_buf_attachment *
+iio_buffer_find_attachment(struct iio_dev *indio_dev, struct dma_buf *dmabuf)
+{
+ struct dma_buf_attachment *elm, *attach = NULL;
+ int ret;
+
+ ret = iio_dma_resv_lock(dmabuf, false);
+ if (ret)
+ return ERR_PTR(ret);
+
+ list_for_each_entry(elm, &dmabuf->attachments, node) {
+ if (elm->dev == indio_dev->dev.parent) {
+ attach = elm;
+ break;
+ }
+ }
+
+ if (attach)
+ iio_buffer_dmabuf_get(elm);
+
+ dma_resv_unlock(dmabuf->resv);
+
+ return attach ?: ERR_PTR(-EPERM);
+}
+
+static int iio_buffer_attach_dmabuf(struct iio_dev_buffer_pair *ib,
+ int __user *user_fd)
+{
+ struct iio_dev *indio_dev = ib->indio_dev;
+ struct iio_buffer *buffer = ib->buffer;
+ struct dma_buf_attachment *attach;
+ struct iio_dmabuf_priv *priv;
+ struct dma_buf *dmabuf;
+ int err, fd;
+
+ if (!buffer->access->attach_dmabuf
+ || !buffer->access->detach_dmabuf
+ || !buffer->access->enqueue_dmabuf)
+ return -EPERM;
+
+ if (copy_from_user(&fd, user_fd, sizeof(fd)))
+ return -EFAULT;
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ spin_lock_init(&priv->lock);
+ priv->context = dma_fence_context_alloc(1);
+
+ dmabuf = dma_buf_get(fd);
+ if (IS_ERR(dmabuf)) {
+ err = PTR_ERR(dmabuf);
+ goto err_free_priv;
+ }
+
+ attach = dma_buf_attach(dmabuf, indio_dev->dev.parent);
+ if (IS_ERR(attach)) {
+ err = PTR_ERR(attach);
+ goto err_dmabuf_put;
+ }
+
+ kref_init(&priv->ref);
+ priv->buffer = buffer;
+ priv->attach = attach;
+ attach->importer_priv = priv;
+
+ priv->block = buffer->access->attach_dmabuf(buffer, attach);
+ if (IS_ERR(priv->block)) {
+ err = PTR_ERR(priv->block);
+ goto err_dmabuf_detach;
+ }
+
+ list_add(&priv->entry, &buffer->dmabufs);
+
+ return 0;
+
+err_dmabuf_detach:
+ dma_buf_detach(dmabuf, attach);
+err_dmabuf_put:
+ dma_buf_put(dmabuf);
+err_free_priv:
+ kfree(priv);
+
+ return err;
+}
+
+static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req)
+{
+ struct dma_buf_attachment *attach;
+ struct iio_dmabuf_priv *priv;
+ struct dma_buf *dmabuf;
+ int dmabuf_fd, ret = 0;
+
+ if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
+ return -EFAULT;
+
+ dmabuf = dma_buf_get(dmabuf_fd);
+ if (IS_ERR(dmabuf))
+ return PTR_ERR(dmabuf);
+
+ attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
+ if (IS_ERR(attach)) {
+ ret = PTR_ERR(attach);
+ goto out_dmabuf_put;
+ }
+
+ priv = attach->importer_priv;
+ list_del_init(&priv->entry);
+
+ /*
+ * Unref twice to release the reference obtained with
+ * iio_buffer_find_attachment() above, and the one obtained in
+ * iio_buffer_attach_dmabuf().
+ */
+ iio_buffer_dmabuf_put(attach);
+ iio_buffer_dmabuf_put(attach);
+
+out_dmabuf_put:
+ dma_buf_put(dmabuf);
+
+ return ret;
+}
+
+static const char *
+iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
+{
+ return "iio";
+}
+
+static void iio_buffer_dma_fence_release(struct dma_fence *fence)
+{
+ struct iio_dma_fence *iio_fence =
+ container_of(fence, struct iio_dma_fence, base);
+
+ kfree(iio_fence);
+}
+
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
+ .get_driver_name = iio_buffer_dma_fence_get_driver_name,
+ .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
+ .release = iio_buffer_dma_fence_release,
+};
+
+static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
+ struct iio_dmabuf __user *iio_dmabuf_req,
+ bool nonblock)
+{
+ struct iio_dev *indio_dev = ib->indio_dev;
+ struct iio_buffer *buffer = ib->buffer;
+ struct iio_dmabuf iio_dmabuf;
+ struct dma_buf_attachment *attach;
+ struct iio_dmabuf_priv *priv;
+ enum dma_data_direction dir;
+ struct iio_dma_fence *fence;
+ struct dma_buf *dmabuf;
+ struct sg_table *sgt;
+ unsigned long timeout;
+ bool dma_to_ram;
+ bool cyclic;
+ int ret;
+
+ if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf)))
+ return -EFAULT;
+
+ if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
+ return -EINVAL;
+
+ cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
+
+ /* Cyclic flag is only supported on output buffers */
+ if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT)
+ return -EINVAL;
+
+ dmabuf = dma_buf_get(iio_dmabuf.fd);
+ if (IS_ERR(dmabuf))
+ return PTR_ERR(dmabuf);
+
+ if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
+ ret = -EINVAL;
+ goto err_dmabuf_put;
+ }
+
+ attach = iio_buffer_find_attachment(indio_dev, dmabuf);
+ if (IS_ERR(attach)) {
+ ret = PTR_ERR(attach);
+ goto err_dmabuf_put;
+ }
+
+ priv = attach->importer_priv;
+
+ dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
+ dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+ sgt = dma_buf_map_attachment(attach, dir);
+ if (IS_ERR(sgt)) {
+ ret = PTR_ERR(sgt);
+ dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
+ goto err_attachment_put;
+ }
+
+ fence = kmalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence) {
+ ret = -ENOMEM;
+ goto err_unmap_attachment;
+ }
+
+ fence->priv = priv;
+ fence->sgt = sgt;
+ fence->dir = dir;
+ priv->fence = fence;
+
+ dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
+ &priv->lock, priv->context, 0);
+
+ ret = iio_dma_resv_lock(dmabuf, nonblock);
+ if (ret)
+ goto err_fence_put;
+
+ timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
+
+ /* Make sure we don't have writers */
+ ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
+ true, timeout);
+ if (ret == 0)
+ ret = -EBUSY;
+ if (ret < 0)
+ goto err_resv_unlock;
+
+ if (dma_to_ram) {
+ /*
+ * If we're writing to the DMABUF, make sure we don't have
+ * readers
+ */
+ ret = (int) dma_resv_wait_timeout(dmabuf->resv,
+ DMA_RESV_USAGE_READ, true,
+ timeout);
+ if (ret == 0)
+ ret = -EBUSY;
+ if (ret < 0)
+ goto err_resv_unlock;
+ }
+
+ ret = dma_resv_reserve_fences(dmabuf->resv, 1);
+ if (ret)
+ goto err_resv_unlock;
+
+ dma_resv_add_fence(dmabuf->resv, &fence->base,
+ dma_resv_usage_rw(dma_to_ram));
+ dma_resv_unlock(dmabuf->resv);
+
+ ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
+ iio_dmabuf.bytes_used, cyclic);
+ if (ret)
+ iio_buffer_signal_dmabuf_done(attach, ret);
+
+ dma_buf_put(dmabuf);
+
+ return ret;
+
+err_resv_unlock:
+ dma_resv_unlock(dmabuf->resv);
+err_fence_put:
+ dma_fence_put(&fence->base);
+err_unmap_attachment:
+ dma_buf_unmap_attachment(attach, sgt, dir);
+err_attachment_put:
+ iio_buffer_dmabuf_put(attach);
+err_dmabuf_put:
+ dma_buf_put(dmabuf);
+
+ return ret;
+}
+
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret)
+{
+ struct iio_dmabuf_priv *priv = attach->importer_priv;
+ struct iio_dma_fence *fence = priv->fence;
+ enum dma_data_direction dir = fence->dir;
+ struct sg_table *sgt = fence->sgt;
+
+ dma_fence_get(&fence->base);
+ fence->base.error = ret;
+ dma_fence_signal(&fence->base);
+ dma_fence_put(&fence->base);
+
+ dma_buf_unmap_attachment(attach, sgt, dir);
+ iio_buffer_dmabuf_put(attach);
+}
+EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
+
+static long iio_buffer_chrdev_ioctl(struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+ struct iio_dev_buffer_pair *ib = filp->private_data;
+ void __user *_arg = (void __user *)arg;
+
+ switch (cmd) {
+ case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
+ return iio_buffer_attach_dmabuf(ib, _arg);
+ case IIO_BUFFER_DMABUF_DETACH_IOCTL:
+ return iio_buffer_detach_dmabuf(ib, _arg);
+ case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
+ return iio_buffer_enqueue_dmabuf(ib, _arg,
+ filp->f_flags & O_NONBLOCK);
+ default:
+ return IIO_IOCTL_UNHANDLED;
+ }
+}
+
static const struct file_operations iio_buffer_chrdev_fileops = {
.owner = THIS_MODULE,
.llseek = noop_llseek,
.read = iio_buffer_read,
.write = iio_buffer_write,
+ .unlocked_ioctl = iio_buffer_chrdev_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
.poll = iio_buffer_poll,
.release = iio_buffer_chrdev_release,
};
diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
index 89c3fd7c29ca..55d93705c96b 100644
--- a/include/linux/iio/buffer_impl.h
+++ b/include/linux/iio/buffer_impl.h
@@ -9,8 +9,11 @@
#include <uapi/linux/iio/buffer.h>
#include <linux/iio/buffer.h>

+struct dma_buf_attachment;
struct iio_dev;
+struct iio_dma_buffer_block;
struct iio_buffer;
+struct sg_table;

/**
* INDIO_BUFFER_FLAG_FIXED_WATERMARK - Watermark level of the buffer can not be
@@ -39,6 +42,13 @@ struct iio_buffer;
* device stops sampling. Calles are balanced with @enable.
* @release: called when the last reference to the buffer is dropped,
* should free all resources allocated by the buffer.
+ * @attach_dmabuf: called from userspace via ioctl to attach one external
+ * DMABUF.
+ * @detach_dmabuf: called from userspace via ioctl to detach one previously
+ * attached DMABUF.
+ * @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF
+ * object to this buffer. Requires a valid DMABUF fd that
+ * was previously attached to this buffer.
* @modes: Supported operating modes by this buffer type
* @flags: A bitmask combination of INDIO_BUFFER_FLAG_*
*
@@ -68,6 +78,14 @@ struct iio_buffer_access_funcs {

void (*release)(struct iio_buffer *buffer);

+ struct iio_dma_buffer_block * (*attach_dmabuf)(struct iio_buffer *buffer,
+ struct dma_buf_attachment *attach);
+ void (*detach_dmabuf)(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block);
+ int (*enqueue_dmabuf)(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block,
+ struct sg_table *sgt, size_t size, bool cyclic);
+
unsigned int modes;
unsigned int flags;
};
@@ -136,6 +154,9 @@ struct iio_buffer {

/* @ref: Reference count of the buffer. */
struct kref ref;
+
+ /* @dmabufs: List of DMABUF attachments */
+ struct list_head dmabufs;
};

/**
@@ -156,9 +177,14 @@ int iio_update_buffers(struct iio_dev *indio_dev,
**/
void iio_buffer_init(struct iio_buffer *buffer);

+void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach);
+void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach);
+
struct iio_buffer *iio_buffer_get(struct iio_buffer *buffer);
void iio_buffer_put(struct iio_buffer *buffer);

+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret);
+
#else /* CONFIG_IIO_BUFFER */

static inline void iio_buffer_get(struct iio_buffer *buffer) {}
diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
index 13939032b3f6..c666aa95e532 100644
--- a/include/uapi/linux/iio/buffer.h
+++ b/include/uapi/linux/iio/buffer.h
@@ -5,6 +5,28 @@
#ifndef _UAPI_IIO_BUFFER_H_
#define _UAPI_IIO_BUFFER_H_

+#include <linux/types.h>
+
+/* Flags for iio_dmabuf.flags */
+#define IIO_BUFFER_DMABUF_CYCLIC (1 << 0)
+#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000001
+
+/**
+ * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
+ * @fd: file descriptor of the DMABUF object
+ * @flags: one or more IIO_BUFFER_DMABUF_* flags
+ * @bytes_used: number of bytes used in this DMABUF for the data transfer.
+ * Should generally be set to the DMABUF's size.
+ */
+struct iio_dmabuf {
+ __u32 fd;
+ __u32 flags;
+ __u64 bytes_used;
+};
+
#define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int)
+#define IIO_BUFFER_DMABUF_ATTACH_IOCTL _IOW('i', 0x92, int)
+#define IIO_BUFFER_DMABUF_DETACH_IOCTL _IOW('i', 0x93, int)
+#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x94, struct iio_dmabuf)

#endif /* _UAPI_IIO_BUFFER_H_ */
--
2.43.0


2023-12-19 17:56:56

by Paul Cercueil

Subject: [PATCH v5 6/8] iio: buffer-dma: Enable support for DMABUFs

Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf()
and iio_dma_buffer_enqueue_dmabuf(), which can then be used by the IIO
DMA buffer implementations.

Signed-off-by: Paul Cercueil <[email protected]>

---
v3: Update code to provide the functions that will be used as callbacks
for the new IOCTLs.
---
drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++--
include/linux/iio/buffer-dma.h | 26 +++
2 files changed, 170 insertions(+), 13 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index 5610ba67925e..adb64f975064 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -14,6 +14,7 @@
#include <linux/poll.h>
#include <linux/iio/buffer_impl.h>
#include <linux/iio/buffer-dma.h>
+#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

@@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref)
{
struct iio_dma_buffer_block *block = container_of(kref,
struct iio_dma_buffer_block, kref);
+ struct iio_dma_buffer_queue *queue = block->queue;

- WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
+ WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);

- dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
- block->vaddr, block->phys_addr);
+ mutex_lock(&queue->lock);

- iio_buffer_put(&block->queue->buffer);
+ if (block->fileio) {
+ dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
+ block->vaddr, block->phys_addr);
+ queue->num_fileio_blocks--;
+ }
+
+ queue->num_blocks--;
kfree(block);
+
+ mutex_unlock(&queue->lock);
+
+ iio_buffer_put(&queue->buffer);
}

static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
@@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
}

static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
- struct iio_dma_buffer_queue *queue, size_t size)
+ struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
{
struct iio_dma_buffer_block *block;

@@ -171,13 +182,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
if (!block)
return NULL;

- block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
- &block->phys_addr, GFP_KERNEL);
- if (!block->vaddr) {
- kfree(block);
- return NULL;
+ if (fileio) {
+ block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
+ &block->phys_addr, GFP_KERNEL);
+ if (!block->vaddr) {
+ kfree(block);
+ return NULL;
+ }
}

+ block->fileio = fileio;
block->size = size;
block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
@@ -186,6 +200,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(

iio_buffer_get(&queue->buffer);

+ queue->num_blocks++;
+ queue->num_fileio_blocks += fileio;
+
return block;
}

@@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
_iio_dma_buffer_block_done(block);
spin_unlock_irqrestore(&queue->list_lock, flags);

+ if (!block->fileio)
+ iio_buffer_signal_dmabuf_done(block->attach, 0);
+
iio_buffer_block_put_atomic(block);
wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
}
@@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
list_del(&block->head);
block->bytes_used = 0;
_iio_dma_buffer_block_done(block);
+
+ if (!block->fileio)
+ iio_buffer_signal_dmabuf_done(block->attach, -EINTR);
iio_buffer_block_put_atomic(block);
}
spin_unlock_irqrestore(&queue->list_lock, flags);

+ queue->fileio.enabled = false;
wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
@@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block)
}
}

+static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue)
+{
+ return queue->fileio.enabled ||
+ queue->num_blocks == queue->num_fileio_blocks;
+}
+
/**
* iio_dma_buffer_request_update() - DMA buffer request_update callback
* @buffer: The buffer which to request an update
@@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)

mutex_lock(&queue->lock);

+ queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
+
+ /* If DMABUFs were created, disable fileio interface */
+ if (!queue->fileio.enabled)
+ goto out_unlock;
+
/* Allocations are page aligned */
if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size))
try_reuse = true;
@@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
block = queue->fileio.blocks[i];
if (block->state == IIO_BLOCK_STATE_DEAD) {
/* Could not reuse it */
- iio_buffer_block_put(block);
+ iio_buffer_block_put_atomic(block);
block = NULL;
} else {
block->size = size;
@@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
}

if (!block) {
- block = iio_dma_buffer_alloc_block(queue, size);
+ block = iio_dma_buffer_alloc_block(queue, size, true);
if (!block) {
ret = -ENOMEM;
goto out_unlock;
@@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
if (!queue->fileio.blocks[i])
continue;
- iio_buffer_block_put(queue->fileio.blocks[i]);
+ iio_buffer_block_put_atomic(queue->fileio.blocks[i]);
queue->fileio.blocks[i] = NULL;
}
queue->fileio.active_block = NULL;
@@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,

block->state = IIO_BLOCK_STATE_ACTIVE;
iio_buffer_block_get(block);
+
ret = queue->ops->submit(queue, block);
if (ret) {
+ if (!block->fileio)
+ iio_buffer_signal_dmabuf_done(block->attach, ret);
+
/*
* This is a bit of a problem and there is not much we can do
* other then wait for the buffer to be disabled and re-enabled
@@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
}
EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);

+struct iio_dma_buffer_block *
+iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
+ struct dma_buf_attachment *attach)
+{
+ struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+ struct iio_dma_buffer_block *block;
+ int err;
+
+ mutex_lock(&queue->lock);
+
+ /*
+ * If the buffer is enabled and in fileio mode new blocks can't be
+ * allocated.
+ */
+ if (queue->fileio.enabled) {
+ err = -EBUSY;
+ goto err_unlock;
+ }
+
+ block = iio_dma_buffer_alloc_block(queue, attach->dmabuf->size, false);
+ if (!block) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
+ block->attach = attach;
+
+ /* Free memory that might be in use for fileio mode */
+ iio_dma_buffer_fileio_free(queue);
+
+ mutex_unlock(&queue->lock);
+
+ return block;
+
+err_unlock:
+ mutex_unlock(&queue->lock);
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_attach_dmabuf);
+
+void iio_dma_buffer_detach_dmabuf(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block)
+{
+ block->state = IIO_BLOCK_STATE_DEAD;
+ iio_buffer_block_put_atomic(block);
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_detach_dmabuf);
+
+static int iio_dma_can_enqueue_block(struct iio_dma_buffer_block *block)
+{
+ struct iio_dma_buffer_queue *queue = block->queue;
+
+ /* If in fileio mode buffers can't be enqueued. */
+ if (queue->fileio.enabled)
+ return -EBUSY;
+
+ switch (block->state) {
+ case IIO_BLOCK_STATE_QUEUED:
+ return -EPERM;
+ case IIO_BLOCK_STATE_DONE:
+ return 0;
+ default:
+ return -EBUSY;
+ }
+}
+
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block,
+ struct sg_table *sgt,
+ size_t size, bool cyclic)
+{
+ struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+ int ret = 0;
+
+ mutex_lock(&queue->lock);
+ ret = iio_dma_can_enqueue_block(block);
+ if (ret < 0)
+ goto out_mutex_unlock;
+
+ block->bytes_used = size;
+ block->cyclic = cyclic;
+ block->sg_table = sgt;
+
+ iio_dma_buffer_enqueue(queue, block);
+
+out_mutex_unlock:
+ mutex_unlock(&queue->lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf);
+
/**
* iio_dma_buffer_set_bytes_per_datum() - DMA buffer set_bytes_per_datum callback
* @buffer: Buffer to set the bytes-per-datum for
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 18d3702fa95d..7be12a6bff5b 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -16,6 +16,8 @@
struct iio_dma_buffer_queue;
struct iio_dma_buffer_ops;
struct device;
+struct dma_buf_attachment;
+struct sg_table;

/**
* enum iio_block_state - State of a struct iio_dma_buffer_block
@@ -41,6 +43,8 @@ enum iio_block_state {
* @queue: Parent DMA buffer queue
* @kref: kref used to manage the lifetime of block
* @state: Current state of the block
+ * @cyclic: True if this is a cyclic buffer
+ * @fileio: True if this buffer is used for fileio mode
*/
struct iio_dma_buffer_block {
/* May only be accessed by the owner of the block */
@@ -63,6 +67,12 @@ struct iio_dma_buffer_block {
* queue->list_lock if the block is not owned by the core.
*/
enum iio_block_state state;
+
+ bool cyclic;
+ bool fileio;
+
+ struct dma_buf_attachment *attach;
+ struct sg_table *sg_table;
};

/**
@@ -72,6 +82,7 @@ struct iio_dma_buffer_block {
* @pos: Read offset in the active block
* @block_size: Size of each block
* @next_dequeue: index of next block that will be dequeued
+ * @enabled: Whether the buffer is operating in fileio mode
*/
struct iio_dma_buffer_queue_fileio {
struct iio_dma_buffer_block *blocks[2];
@@ -80,6 +91,7 @@ struct iio_dma_buffer_queue_fileio {
size_t block_size;

unsigned int next_dequeue;
+ bool enabled;
};

/**
@@ -95,6 +107,8 @@ struct iio_dma_buffer_queue_fileio {
* the DMA controller
* @incoming: List of buffers on the incoming queue
* @active: Whether the buffer is currently active
+ * @num_blocks: Total number of DMA blocks
+ * @num_fileio_blocks: Number of DMA blocks for fileio mode
* @fileio: FileIO state
*/
struct iio_dma_buffer_queue {
@@ -107,6 +121,8 @@ struct iio_dma_buffer_queue {
struct list_head incoming;

bool active;
+ unsigned int num_blocks;
+ unsigned int num_fileio_blocks;

struct iio_dma_buffer_queue_fileio fileio;
};
@@ -142,4 +158,14 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);

+struct iio_dma_buffer_block *
+iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
+ struct dma_buf_attachment *attach);
+void iio_dma_buffer_detach_dmabuf(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block);
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+ struct iio_dma_buffer_block *block,
+ struct sg_table *sgt,
+ size_t size, bool cyclic);
+
#endif
--
2.43.0


2023-12-19 17:57:28

by Paul Cercueil

Subject: [PATCH v5 7/8] iio: buffer-dmaengine: Support new DMABUF based userspace API

Use the functions provided by the buffer-dma core to implement the
DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.

Since we want to be able to transfer an arbitrary number of bytes and
not necessarily the full DMABUF, the associated scatterlist is converted
to an array of DMA addresses + lengths, which is then passed to
dmaengine_prep_slave_dma_vec().

Signed-off-by: Paul Cercueil <[email protected]>

---
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to
work with the new functions introduced in industrialio-buffer-dma.c.

v5: - Use the new dmaengine_prep_slave_dma_vec().
- Restrict to input buffers, since output buffers are not yet
supported by IIO buffers.
---
.../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++---
1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 5f85ba38e6f6..825d76a24a67 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
struct dmaengine_buffer *dmaengine_buffer =
iio_buffer_to_dmaengine_buffer(&queue->buffer);
struct dma_async_tx_descriptor *desc;
+ unsigned int i, nents;
+ struct scatterlist *sgl;
+ struct dma_vec *vecs;
+ size_t max_size;
dma_cookie_t cookie;
+ size_t len_total;

- block->bytes_used = min(block->size, dmaengine_buffer->max_size);
- block->bytes_used = round_down(block->bytes_used,
- dmaengine_buffer->align);
+ if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
+ /* We do not yet support output buffers. */
+ return -EINVAL;
+ }

- desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
- block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
- DMA_PREP_INTERRUPT);
+ if (block->sg_table) {
+ sgl = block->sg_table->sgl;
+ nents = sg_nents_for_len(sgl, block->bytes_used);
+
+ vecs = kmalloc_array(nents, sizeof(*vecs), GFP_KERNEL);
+ if (!vecs)
+ return -ENOMEM;
+
+ len_total = block->bytes_used;
+
+ for (i = 0; i < nents; i++) {
+ vecs[i].addr = sg_dma_address(sgl);
+ vecs[i].len = min(sg_dma_len(sgl), len_total);
+ len_total -= vecs[i].len;
+
+ sgl = sg_next(sgl);
+ }
+
+ desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
+ vecs, nents, DMA_DEV_TO_MEM,
+ DMA_PREP_INTERRUPT);
+ kfree(vecs);
+ } else {
+ max_size = min(block->size, dmaengine_buffer->max_size);
+ max_size = round_down(max_size, dmaengine_buffer->align);
+ block->bytes_used = max_size;
+
+ desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
+ block->phys_addr,
+ block->bytes_used,
+ DMA_DEV_TO_MEM,
+ DMA_PREP_INTERRUPT);
+ }
if (!desc)
return -ENOMEM;

@@ -120,6 +156,10 @@ static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
.data_available = iio_dma_buffer_data_available,
.release = iio_dmaengine_buffer_release,

+ .enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,
+ .attach_dmabuf = iio_dma_buffer_attach_dmabuf,
+ .detach_dmabuf = iio_dma_buffer_detach_dmabuf,
+
.modes = INDIO_BUFFER_HARDWARE,
.flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK,
};
--
2.43.0


2023-12-19 17:57:52

by Paul Cercueil

Subject: [PATCH v5 8/8] Documentation: iio: Document high-speed DMABUF based API

Document the new DMABUF based API.

Signed-off-by: Paul Cercueil <[email protected]>

---
v2: - Explicitly state that the new interface is optional and is
not implemented by all drivers.
- The IOCTLs can now only be called on the buffer FD returned by
IIO_BUFFER_GET_FD_IOCTL.
- Move the page up a bit in the index since it is core stuff and not
driver-specific.

v3: Update the documentation to reflect the new API.

v5: Use description lists for the documentation of the three new IOCTLs
instead of abusing subsections.
---
Documentation/iio/dmabuf_api.rst | 54 ++++++++++++++++++++++++++++++++
Documentation/iio/index.rst | 2 ++
2 files changed, 56 insertions(+)
create mode 100644 Documentation/iio/dmabuf_api.rst

diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
new file mode 100644
index 000000000000..1cd6cd51a582
--- /dev/null
+++ b/Documentation/iio/dmabuf_api.rst
@@ -0,0 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+High-speed DMABUF interface for IIO
+===================================
+
+1. Overview
+===========
+
+The Industrial I/O subsystem supports access to buffers through a
+file-based interface, with read() and write() access calls through the
+IIO device's dev node.
+
+It additionally supports a DMABUF based interface, where userspace
+can attach DMABUF objects (externally created) to an IIO buffer, and
+subsequently use them for data transfers.
+
+A userspace application can then use this interface to share DMABUF
+objects between several interfaces, allowing it to transfer data in a
+zero-copy fashion, for instance between IIO and the USB stack.
+
+The userspace application can also memory-map the DMABUF objects, and
+access the sample data directly. The advantage of doing this vs. the
+read() interface is that it avoids an extra copy of the data between the
+kernel and userspace. This is particularly useful for high-speed devices
+which produce several megabytes or even gigabytes of data per second.
+It does however increase the userspace-kernelspace synchronization
+overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to
+be used for data integrity.
+
+2. User API
+===========
+
+As part of this interface, three new IOCTLs have been added. These three
+IOCTLs have to be performed on the IIO buffer's file descriptor,
+obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
+
+ ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)``
+ Attach the DMABUF object, identified by its file descriptor, to the
+ IIO buffer. Returns zero on success, and a negative errno value on
+ error.
+
+ ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)``
+ Detach the given DMABUF object, identified by its file descriptor,
+ from the IIO buffer. Returns zero on success, and a negative errno
+ value on error.
+
+ Note that closing the IIO buffer's file descriptor will
+ automatically detach all previously attached DMABUF objects.
+
+ ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)``
+ Enqueue a previously attached DMABUF object to the buffer queue.
+ Enqueued DMABUFs will be read from (if output buffer) or written to
+ (if input buffer) as long as the buffer is enabled.
diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
index 1b7292c58cd0..3eae8fcb1938 100644
--- a/Documentation/iio/index.rst
+++ b/Documentation/iio/index.rst
@@ -9,6 +9,8 @@ Industrial I/O

iio_configfs

+ dmabuf_api
+
ep93xx_adc

bno055
--
2.43.0


2023-12-19 18:07:03

by Paul Cercueil

Subject: [PATCH v5 4/8] dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec

Add implementation of the .device_prep_slave_dma_vec() callback.

Signed-off-by: Paul Cercueil <[email protected]>

---
v3: New patch

v5: Implement .device_prep_slave_dma_vec() instead of v3's
.device_prep_slave_dma_array().
---
drivers/dma/dma-axi-dmac.c | 40 ++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
index 2457a420c13d..363808088cc5 100644
--- a/drivers/dma/dma-axi-dmac.c
+++ b/drivers/dma/dma-axi-dmac.c
@@ -535,6 +535,45 @@ static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan,
return sg;
}

+static struct dma_async_tx_descriptor *
+axi_dmac_prep_slave_dma_vec(struct dma_chan *c, const struct dma_vec *vecs,
+ size_t nb, enum dma_transfer_direction direction,
+ unsigned long flags)
+{
+ struct axi_dmac_chan *chan = to_axi_dmac_chan(c);
+ struct axi_dmac_desc *desc;
+ unsigned int num_sgs = 0;
+ struct axi_dmac_sg *dsg;
+ size_t i;
+
+ if (direction != chan->direction)
+ return NULL;
+
+ for (i = 0; i < nb; i++)
+ num_sgs += DIV_ROUND_UP(vecs[i].len, chan->max_length);
+
+ desc = axi_dmac_alloc_desc(num_sgs);
+ if (!desc)
+ return NULL;
+
+ dsg = desc->sg;
+
+ for (i = 0; i < nb; i++) {
+ if (!axi_dmac_check_addr(chan, vecs[i].addr) ||
+ !axi_dmac_check_len(chan, vecs[i].len)) {
+ kfree(desc);
+ return NULL;
+ }
+
+ dsg = axi_dmac_fill_linear_sg(chan, direction, vecs[i].addr, 1,
+ vecs[i].len, dsg);
+ }
+
+ desc->cyclic = false;
+
+ return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
+}
+
static struct dma_async_tx_descriptor *axi_dmac_prep_slave_sg(
struct dma_chan *c, struct scatterlist *sgl,
unsigned int sg_len, enum dma_transfer_direction direction,
@@ -957,6 +996,7 @@ static int axi_dmac_probe(struct platform_device *pdev)
dma_dev->device_tx_status = dma_cookie_status;
dma_dev->device_issue_pending = axi_dmac_issue_pending;
dma_dev->device_prep_slave_sg = axi_dmac_prep_slave_sg;
+ dma_dev->device_prep_slave_dma_vec = axi_dmac_prep_slave_dma_vec;
dma_dev->device_prep_dma_cyclic = axi_dmac_prep_dma_cyclic;
dma_dev->device_prep_interleaved_dma = axi_dmac_prep_interleaved;
dma_dev->device_terminate_all = axi_dmac_terminate_all;
--
2.43.0


2023-12-21 11:29:17

by Jonathan Cameron

Subject: Re: [PATCH v5 1/8] iio: buffer-dma: Get rid of outgoing queue

On Tue, 19 Dec 2023 18:50:02 +0100
Paul Cercueil <[email protected]> wrote:

> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use the outgoing queue anyway,
> getting rid of it now makes the upcoming changes simpler.
>
> With this change, the IIO_BLOCK_STATE_DEQUEUED is now useless, and can
> be removed.
>
> Signed-off-by: Paul Cercueil <[email protected]>
>

I've applied this in the interest of reducing the outstanding set of
patches and because it stands fine on its own.

Applied to the togreg branch of iio.git and pushed out as testing.
Note this is now almost certainly 6.9 material given timing.

Jonathan

2023-12-21 11:32:07

by Jonathan Cameron

Subject: Re: [PATCH v5 2/8] iio: buffer-dma: split iio_dma_buffer_fileio_free() function

On Tue, 19 Dec 2023 18:50:03 +0100
Paul Cercueil <[email protected]> wrote:

> From: Alexandru Ardelean <[email protected]>
>
> This change splits the logic into a separate function, which will be
> re-used later.
>
> Signed-off-by: Alexandru Ardelean <[email protected]>
> Cc: Alexandru Ardelean <[email protected]>
> Signed-off-by: Paul Cercueil <[email protected]>
Harmless refactor, so I'm fine taking this now to reduce noise on any v6.

Applied

Jonathan

2023-12-21 11:41:01

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

On Tue, 19 Dec 2023 18:50:04 +0100
Paul Cercueil <[email protected]> wrote:

> This function can be used to initiate a scatter-gather DMA transfer,
> where the address and size of each segment is located in one entry of
> the dma_vec array.
>
> The major difference with dmaengine_prep_slave_sg() is that it supports
> specifying the lengths of each DMA transfer; as trying to override the
> length of the transfer with dmaengine_prep_slave_sg() is a very tedious
> process. The introduction of a new API function is also justified by the
> fact that scatterlists are on their way out.
>
> Note that dmaengine_prep_interleaved_dma() is not helpful either in that
> case, as it assumes that the address of each segment will be higher than
> the one of the previous segment, which we just cannot guarantee in case
> of a scatter-gather transfer.
>
> Signed-off-by: Paul Cercueil <[email protected]>

This and the next patch look fine to me as clearly simplify things for
our usecases, but they are really something for the dmaengine maintainers
to comment on.

Jonathan

2023-12-21 12:07:05

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

On Tue, 19 Dec 2023 18:50:06 +0100
Paul Cercueil <[email protected]> wrote:

> Add the necessary infrastructure to the IIO core to support a new
> optional DMABUF based interface.
>
> With this new interface, DMABUF objects (externally created) can be
> attached to a IIO buffer, and subsequently used for data transfer.
>
> A userspace application can then use this interface to share DMABUF
> objects between several interfaces, allowing it to transfer data in a
> zero-copy fashion, for instance between IIO and the USB stack.
>
> The userspace application can also memory-map the DMABUF objects, and
> access the sample data directly. The advantage of doing this vs. the
> read() interface is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly useful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
>
> As part of the interface, 3 new IOCTLs have been added:
>
> IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
> Attach the DMABUF object identified by the given file descriptor to the
> buffer.
>
> IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
> Detach the DMABUF object identified by the given file descriptor from
> the buffer. Note that closing the IIO buffer's file descriptor will
> automatically detach all previously attached DMABUF objects.
>
> IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
> Request a data transfer to/from the given DMABUF object. Its file
> descriptor, as well as the transfer size and flags are provided in the
> "iio_dmabuf" structure.
>
> These three IOCTLs have to be performed on the IIO buffer's file
> descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
>

Fair enough - so they don't apply to the 'legacy' buffer, which simplifies
things, but in one place you assume that logic is used (given the error return
values).

> Signed-off-by: Paul Cercueil <[email protected]>
>
This is big and complex and I'm out of time for now, so I've made some
comments but should revisit it.
I'm also looking for review from those more familiar with dmabuf side
of things than I am!

Jonathan


>
> +static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
> +{
> + int ret;
> +
> + ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
> + if (ret) {
> + if (ret != -EDEADLK)
> + goto out;
> + if (nonblock) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
> + }
> +
> +out:
> + return ret;

I'm not a fan of gotos that do nothing. Just return in the appropriate places above.

> +}
>
> +static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req)
> +{
> + struct dma_buf_attachment *attach;
> + struct iio_dmabuf_priv *priv;
> + struct dma_buf *dmabuf;
> + int dmabuf_fd, ret = 0;
> +
> + if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
> + return -EFAULT;
> +
> + dmabuf = dma_buf_get(dmabuf_fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
> + if (IS_ERR(attach)) {
> + ret = PTR_ERR(attach);
> + goto out_dmabuf_put;
> + }
> +
> + priv = attach->importer_priv;
> + list_del_init(&priv->entry);
> +
> + /*
> + * Unref twice to release the reference obtained with
> + * iio_buffer_find_attachment() above, and the one obtained in
> + * iio_buffer_attach_dmabuf().
> + */
> + iio_buffer_dmabuf_put(attach);
> + iio_buffer_dmabuf_put(attach);
> +
> +out_dmabuf_put:
> + dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.

> +
> + return ret;
> +}
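
To illustrate the __free() suggestion: with a suitable DEFINE_FREE() for dma_buf_put()
(hypothetical here, purely a sketch based on <linux/cleanup.h>), the detach path above
could drop the goto and the manual put entirely:

	/* Assumed helper - not claiming this exists upstream today */
	DEFINE_FREE(dma_buf_put, struct dma_buf *, if (!IS_ERR_OR_NULL(_T)) dma_buf_put(_T))

	static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req)
	{
		struct dma_buf *dmabuf __free(dma_buf_put) = NULL;
		struct dma_buf_attachment *attach;
		struct iio_dmabuf_priv *priv;
		int dmabuf_fd;

		if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
			return -EFAULT;

		dmabuf = dma_buf_get(dmabuf_fd);
		if (IS_ERR(dmabuf))
			return PTR_ERR(dmabuf);

		attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
		if (IS_ERR(attach))
			return PTR_ERR(attach);

		priv = attach->importer_priv;
		list_del_init(&priv->entry);

		/*
		 * Unref twice: once for iio_buffer_find_attachment() above, and
		 * once for the reference taken in iio_buffer_attach_dmabuf().
		 */
		iio_buffer_dmabuf_put(attach);
		iio_buffer_dmabuf_put(attach);

		return 0;
	}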
> +
> +static const char *
> +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
> +{
> + return "iio";
> +}
> +
> +static void iio_buffer_dma_fence_release(struct dma_fence *fence)
> +{
> + struct iio_dma_fence *iio_fence =
> + container_of(fence, struct iio_dma_fence, base);
> +
> + kfree(iio_fence);
> +}
> +
> +static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
> + .get_driver_name = iio_buffer_dma_fence_get_driver_name,
> + .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
> + .release = iio_buffer_dma_fence_release,
> +};
> +
> +static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
> + struct iio_dmabuf __user *iio_dmabuf_req,
> + bool nonblock)
> +{
> + struct iio_dev *indio_dev = ib->indio_dev;
> + struct iio_buffer *buffer = ib->buffer;
> + struct iio_dmabuf iio_dmabuf;
> + struct dma_buf_attachment *attach;
> + struct iio_dmabuf_priv *priv;
> + enum dma_data_direction dir;
> + struct iio_dma_fence *fence;
> + struct dma_buf *dmabuf;
> + struct sg_table *sgt;
> + unsigned long timeout;
> + bool dma_to_ram;
> + bool cyclic;
> + int ret;
> +
> + if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf)))
> + return -EFAULT;
> +
> + if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> + return -EINVAL;
> +
> + cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
> +
> + /* Cyclic flag is only supported on output buffers */
> + if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT)
> + return -EINVAL;
> +
> + dmabuf = dma_buf_get(iio_dmabuf.fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
> + ret = -EINVAL;
> + goto err_dmabuf_put;
> + }
> +
> + attach = iio_buffer_find_attachment(indio_dev, dmabuf);
> + if (IS_ERR(attach)) {
> + ret = PTR_ERR(attach);
> + goto err_dmabuf_put;

Might be worth some cleanup.h magic given this put happens in all exit paths.

> + }
> +
> + priv = attach->importer_priv;
> +
> + dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
> + dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> + sgt = dma_buf_map_attachment(attach, dir);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
> + goto err_attachment_put;
> + }
> +
> + fence = kmalloc(sizeof(*fence), GFP_KERNEL);
> + if (!fence) {
> + ret = -ENOMEM;
> + goto err_unmap_attachment;
> + }
> +
> + fence->priv = priv;
> + fence->sgt = sgt;
> + fence->dir = dir;
> + priv->fence = fence;
> +
> + dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
> + &priv->lock, priv->context, 0);
> +
> + ret = iio_dma_resv_lock(dmabuf, nonblock);
> + if (ret)
> + goto err_fence_put;
> +
> + timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
> +
> + /* Make sure we don't have writers */
> + ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
> + true, timeout);

I'd handle this and similar cases as a long rather than adding the odd-looking cast
and making me think too much about it.
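
For illustration, the long-based handling could look something like this (just a
sketch, reusing the names from the quoted code):

	long timeout_left;

	/* Make sure we don't have writers */
	timeout_left = dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
					     true, timeout);
	if (timeout_left == 0)
		ret = -EBUSY;
	else if (timeout_left < 0)
		ret = timeout_left;
	else
		ret = 0;

	if (ret)
		goto err_resv_unlock;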

> + if (ret == 0)
> + ret = -EBUSY;
> + if (ret < 0)
> + goto err_resv_unlock;
> +
> + if (dma_to_ram) {
> + /*
> + * If we're writing to the DMABUF, make sure we don't have
> + * readers
> + */
> + ret = (int) dma_resv_wait_timeout(dmabuf->resv,
> + DMA_RESV_USAGE_READ, true,
> + timeout);
> + if (ret == 0)
> + ret = -EBUSY;
> + if (ret < 0)
> + goto err_resv_unlock;
> + }
> +
> + ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> + if (ret)
> + goto err_resv_unlock;
> +
> + dma_resv_add_fence(dmabuf->resv, &fence->base,
> + dma_resv_usage_rw(dma_to_ram));
> + dma_resv_unlock(dmabuf->resv);
> +
> + ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
> + iio_dmabuf.bytes_used, cyclic);
> + if (ret)
> + iio_buffer_signal_dmabuf_done(attach, ret);

I'd like a comment on why using the 'successful' path cleanup makes sense in this
error case. It's possible to figure it out, but reviewers are lazy and generally
we like the cleanup to be obvious and local on error paths.

> +
> + dma_buf_put(dmabuf);
> +
> + return ret;
> +
> +err_resv_unlock:
> + dma_resv_unlock(dmabuf->resv);
> +err_fence_put:
> + dma_fence_put(&fence->base);
> +err_unmap_attachment:
> + dma_buf_unmap_attachment(attach, sgt, dir);
> +err_attachment_put:
> + iio_buffer_dmabuf_put(attach);
> +err_dmabuf_put:
> + dma_buf_put(dmabuf);
> +
> + return ret;
> +}
> +
> +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret)
> +{
> + struct iio_dmabuf_priv *priv = attach->importer_priv;
> + struct iio_dma_fence *fence = priv->fence;
> + enum dma_data_direction dir = fence->dir;
> + struct sg_table *sgt = fence->sgt;
> +
> + dma_fence_get(&fence->base);

I don't know much about dma_fence, but is it valid to access
contents of it (sgt, etc) before getting a reference?
Ultimately dma_fence_put() can result in a kfree_rcu() so seems
unlikely to be safe and definitely fails the 'obviously correct'
test. Given those are, I assume, trivial accesses, just do them
down here perhaps?


> + fence->base.error = ret;
> + dma_fence_signal(&fence->base);
> + dma_fence_put(&fence->base);
> +
> + dma_buf_unmap_attachment(attach, sgt, dir);
> + iio_buffer_dmabuf_put(attach);
> +}
> +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
> +

> +static long iio_buffer_chrdev_ioctl(struct file *filp,
> + unsigned int cmd, unsigned long arg)
> +{
> + struct iio_dev_buffer_pair *ib = filp->private_data;
> + void __user *_arg = (void __user *)arg;
> +
> + switch (cmd) {
> + case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
> + return iio_buffer_attach_dmabuf(ib, _arg);
> + case IIO_BUFFER_DMABUF_DETACH_IOCTL:
> + return iio_buffer_detach_dmabuf(ib, _arg);
> + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> + return iio_buffer_enqueue_dmabuf(ib, _arg,
> + filp->f_flags & O_NONBLOCK);
> + default:
> + return IIO_IOCTL_UNHANDLED;

Given you aren't using the ioctl handling on the legacy buffer, I think this
should simply return an error code, not the magic value we use to indicate
'all fine, but it's not mine'.
Probably -EINVAL or similar. Note that the wrapper around the legacy buffer
ioctls translates this to -ENODEV rather than returning it from the ioctl.



> + }
> +}
> +
> static const struct file_operations iio_buffer_chrdev_fileops = {
> .owner = THIS_MODULE,
> .llseek = noop_llseek,
> .read = iio_buffer_read,
> .write = iio_buffer_write,
> + .unlocked_ioctl = iio_buffer_chrdev_ioctl,
> + .compat_ioctl = compat_ptr_ioctl,
> .poll = iio_buffer_poll,
> .release = iio_buffer_chrdev_release,
> };


2023-12-21 15:15:14

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

On 19-12-23, 18:50, Paul Cercueil wrote:
> This function can be used to initiate a scatter-gather DMA transfer,
> where the address and size of each segment is located in one entry of
> the dma_vec array.
>
> The major difference with dmaengine_prep_slave_sg() is that it supports
> specifying the lengths of each DMA transfer; as trying to override the
> length of the transfer with dmaengine_prep_slave_sg() is a very tedious
> process. The introduction of a new API function is also justified by the
> fact that scatterlists are on their way out.
>
> Note that dmaengine_prep_interleaved_dma() is not helpful either in that
> case, as it assumes that the address of each segment will be higher than
> the one of the previous segment, which we just cannot guarantee in case
> of a scatter-gather transfer.
>
> Signed-off-by: Paul Cercueil <[email protected]>
>
> ---
> v3: New patch
>
> v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct
> 'dma_vec'.
> Note that at some point we will need to support cyclic transfers
> using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> parameter to the function?
> ---
> include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 3df70d6131c8..ee5931ddb42f 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -160,6 +160,16 @@ struct dma_interleaved_template {
> struct data_chunk sgl[];
> };
>
> +/**
> + * struct dma_vec - DMA vector
> + * @addr: Bus address of the start of the vector
> + * @len: Length in bytes of the DMA vector
> + */
> +struct dma_vec {
> + dma_addr_t addr;
> + size_t len;
> +};

So you want to transfer multiple buffers, right? Why not use
dmaengine_prep_slave_sg()? Is there a reason for not using that one?

Furthermore I missed replying to your email earlier on use of
dmaengine_prep_interleaved_dma(), my apologies.
That can be made to work for you as well. Please see the notes where the icg
can be ignored, and it does not need the icg value to be set.

In fact, the interleaved API can be made to work in most of these cases I can
think of...


> +
> /**
> * enum dma_ctrl_flags - DMA flags to augment operation preparation,
> * control completion, and communicate status.
> @@ -910,6 +920,10 @@ struct dma_device {
> struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
> struct dma_chan *chan, unsigned long flags);
>
> + struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)(
> + struct dma_chan *chan, const struct dma_vec *vecs,
> + size_t nents, enum dma_transfer_direction direction,
> + unsigned long flags);
> struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_transfer_direction direction,
> @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
> dir, flags, NULL);
> }
>
> +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
> + struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
> + enum dma_transfer_direction dir, unsigned long flags)
> +{
> + if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
> + return NULL;
> +
> + return chan->device->device_prep_slave_dma_vec(chan, vecs, nents,
> + dir, flags);
> +}
> +
> static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
> struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
> enum dma_transfer_direction dir, unsigned long flags)
> --
> 2.43.0

--
~Vinod

2023-12-21 15:30:44

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

Hi Vinod,

On Thursday, 21 December 2023 at 20:44 +0530, Vinod Koul wrote:
> On 19-12-23, 18:50, Paul Cercueil wrote:
> > This function can be used to initiate a scatter-gather DMA
> > transfer,
> > where the address and size of each segment is located in one entry
> > of
> > the dma_vec array.
> >
> > The major difference with dmaengine_prep_slave_sg() is that it
> > supports
> > specifying the lengths of each DMA transfer; as trying to override
> > the
> > length of the transfer with dmaengine_prep_slave_sg() is a very
> > tedious
> > process. The introduction of a new API function is also justified
> > by the
> > fact that scatterlists are on their way out.
> >
> > Note that dmaengine_prep_interleaved_dma() is not helpful either in
> > that
> > case, as it assumes that the address of each segment will be higher
> > than
> > the one of the previous segment, which we just cannot guarantee in
> > case
> > of a scatter-gather transfer.
> >
> > Signed-off-by: Paul Cercueil <[email protected]>
> >
> > ---
> > v3: New patch
> >
> > v5: Replace with function dmaengine_prep_slave_dma_vec(), and
> > struct
> >     'dma_vec'.
> >     Note that at some point we will need to support cyclic
> > transfers
> >     using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> >     parameter to the function?
> > ---
> >  include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index 3df70d6131c8..ee5931ddb42f 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -160,6 +160,16 @@ struct dma_interleaved_template {
> >   struct data_chunk sgl[];
> >  };
> >  
> > +/**
> > + * struct dma_vec - DMA vector
> > + * @addr: Bus address of the start of the vector
> > + * @len: Length in bytes of the DMA vector
> > + */
> > +struct dma_vec {
> > + dma_addr_t addr;
> > + size_t len;
> > +};
>
> so you want to transfer multiple buffers, right? why not use
> dmaengine_prep_slave_sg(). If there is reason for not using that one?

Well I think I answer that in the commit message, don't I?

> Furthermore I missed replying to your email earlier on use of
> dmaengine_prep_interleaved_dma(), my apologies.
> That can be made to work for you as well. Please see the notes where
> icg
> can be ignored and it does not need icg value to be set
>
> Infact, interleaved api can be made to work in most of these cases I
> can
> think of...

So if I want to transfer 16 bytes from 0x10, then 16 bytes from 0x0,
then 16 bytes from 0x20, how should I configure the
dma_interleaved_template?
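
For reference, with the proposed API the same transfer is just a three-entry array
(rough sketch, using the dma_vec struct from patch 3/8):

	struct dma_vec vecs[] = {
		{ .addr = 0x10, .len = 16 },
		{ .addr = 0x00, .len = 16 },
		{ .addr = 0x20, .len = 16 },
	};

	desc = dmaengine_prep_slave_dma_vec(chan, vecs, ARRAY_SIZE(vecs),
					    DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);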

Cheers,
-Paul

> > +
> >  /**
> >   * enum dma_ctrl_flags - DMA flags to augment operation
> > preparation,
> >   *  control completion, and communicate status.
> > @@ -910,6 +920,10 @@ struct dma_device {
> >   struct dma_async_tx_descriptor
> > *(*device_prep_dma_interrupt)(
> >   struct dma_chan *chan, unsigned long flags);
> >  
> > + struct dma_async_tx_descriptor
> > *(*device_prep_slave_dma_vec)(
> > + struct dma_chan *chan, const struct dma_vec *vecs,
> > + size_t nents, enum dma_transfer_direction
> > direction,
> > + unsigned long flags);
> >   struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> >   struct dma_chan *chan, struct scatterlist *sgl,
> >   unsigned int sg_len, enum dma_transfer_direction
> > direction,
> > @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor
> > *dmaengine_prep_slave_single(
> >     dir, flags,
> > NULL);
> >  }
> >  
> > +static inline struct dma_async_tx_descriptor
> > *dmaengine_prep_slave_dma_vec(
> > + struct dma_chan *chan, const struct dma_vec *vecs, size_t
> > nents,
> > + enum dma_transfer_direction dir, unsigned long flags)
> > +{
> > + if (!chan || !chan->device || !chan->device-
> > >device_prep_slave_dma_vec)
> > + return NULL;
> > +
> > + return chan->device->device_prep_slave_dma_vec(chan, vecs,
> > nents,
> > +        dir,
> > flags);
> > +}
> > +
> >  static inline struct dma_async_tx_descriptor
> > *dmaengine_prep_slave_sg(
> >   struct dma_chan *chan, struct scatterlist
> > *sgl, unsigned int sg_len,
> >   enum dma_transfer_direction dir, unsigned long flags)
> > --
> > 2.43.0
>


2023-12-21 16:05:29

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 6/8] iio: buffer-dma: Enable support for DMABUFs

On Tue, 19 Dec 2023 18:50:07 +0100
Paul Cercueil <[email protected]> wrote:

> Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf()
> and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO
> DMA buffer implementations.
>
> Signed-off-by: Paul Cercueil <[email protected]>
>
Hi Paul,

A few comments in here. Mostly places where the cleanup.h guard() stuff
can simplify error paths.

Overall this looks reasonable to me.

Jonathan

> ---
> v3: Update code to provide the functions that will be used as callbacks
> for the new IOCTLs.
> ---
> drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++--
> include/linux/iio/buffer-dma.h | 26 +++
> 2 files changed, 170 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index 5610ba67925e..adb64f975064 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -14,6 +14,7 @@
> #include <linux/poll.h>
> #include <linux/iio/buffer_impl.h>
> #include <linux/iio/buffer-dma.h>
> +#include <linux/dma-buf.h>
> #include <linux/dma-mapping.h>
> #include <linux/sizes.h>
>
> @@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref)
> {
> struct iio_dma_buffer_block *block = container_of(kref,
> struct iio_dma_buffer_block, kref);
> + struct iio_dma_buffer_queue *queue = block->queue;
>
> - WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
> + WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
>
> - dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
> - block->vaddr, block->phys_addr);
> + mutex_lock(&queue->lock);
>
> - iio_buffer_put(&block->queue->buffer);
> + if (block->fileio) {
> + dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> + block->vaddr, block->phys_addr);
> + queue->num_fileio_blocks--;
> + }
> +
> + queue->num_blocks--;
> kfree(block);
> +
> + mutex_unlock(&queue->lock);
> +
> + iio_buffer_put(&queue->buffer);
> }
>
> static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
> @@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
> }
>
> static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> - struct iio_dma_buffer_queue *queue, size_t size)
> + struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> {
> struct iio_dma_buffer_block *block;
>
> @@ -171,13 +182,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> if (!block)
> return NULL;
>
> - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> - &block->phys_addr, GFP_KERNEL);
> - if (!block->vaddr) {
> - kfree(block);
> - return NULL;
> + if (fileio) {
> + block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> + &block->phys_addr, GFP_KERNEL);
> + if (!block->vaddr) {
> + kfree(block);
> + return NULL;
> + }
> }
>
> + block->fileio = fileio;
> block->size = size;
> block->state = IIO_BLOCK_STATE_DONE;
> block->queue = queue;
> @@ -186,6 +200,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>
> iio_buffer_get(&queue->buffer);
>
> + queue->num_blocks++;
> + queue->num_fileio_blocks += fileio;
Adding a boolean is non-intuitive.

if (fileio)
queue->num_fileio_blocks++;

probably easier to read, and the compiler should be able to figure out how to
optimise the condition away.

> +
> return block;
> }
>
> @@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
> _iio_dma_buffer_block_done(block);
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> + if (!block->fileio)
> + iio_buffer_signal_dmabuf_done(block->attach, 0);
> +
> iio_buffer_block_put_atomic(block);
> wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> }
> @@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
> list_del(&block->head);
> block->bytes_used = 0;
> _iio_dma_buffer_block_done(block);
> +
> + if (!block->fileio)
> + iio_buffer_signal_dmabuf_done(block->attach, -EINTR);
> iio_buffer_block_put_atomic(block);
> }
> spin_unlock_irqrestore(&queue->list_lock, flags);
>
> + queue->fileio.enabled = false;

While this obviously doesn't need to be conditional if it can already be false,
it might be easier to follow the code flow if it didn't check whether we were doing
fileio or not before disabling it.

> wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
> @@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block)
> }
> }
>
> +static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue)
> +{
> + return queue->fileio.enabled ||
> + queue->num_blocks == queue->num_fileio_blocks;
This is a little odd, so it would be good to have a comment on why this condition
is used. Or rename the function to imply it's checking whether fileio is enabled,
or can be enabled.

At first glance I expected a function with this name to just be an accessor
function, e.g.
return queue->fileio.enabled;

> +}
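
Something along those lines, renamed and commented (purely illustrative):

	/*
	 * Fileio mode remains available as long as no DMABUF-backed blocks
	 * exist: either it is already enabled, or every block allocated so
	 * far was allocated for fileio.
	 */
	static bool iio_dma_buffer_can_use_fileio(struct iio_dma_buffer_queue *queue)
	{
		return queue->fileio.enabled ||
		       queue->num_blocks == queue->num_fileio_blocks;
	}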
> +
> /**
> * iio_dma_buffer_request_update() - DMA buffer request_update callback
> * @buffer: The buffer which to request an update
> @@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>
> mutex_lock(&queue->lock);
>
> + queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
> +
> + /* If DMABUFs were created, disable fileio interface */
> + if (!queue->fileio.enabled)
> + goto out_unlock;
> +
> /* Allocations are page aligned */
> if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size))
> try_reuse = true;
> @@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> block = queue->fileio.blocks[i];
> if (block->state == IIO_BLOCK_STATE_DEAD) {
> /* Could not reuse it */
> - iio_buffer_block_put(block);
> + iio_buffer_block_put_atomic(block);
> block = NULL;
> } else {
> block->size = size;
> @@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> }
>
> if (!block) {
> - block = iio_dma_buffer_alloc_block(queue, size);
> + block = iio_dma_buffer_alloc_block(queue, size, true);
> if (!block) {
> ret = -ENOMEM;
> goto out_unlock;
> @@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
> for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> if (!queue->fileio.blocks[i])
> continue;
> - iio_buffer_block_put(queue->fileio.blocks[i]);
> + iio_buffer_block_put_atomic(queue->fileio.blocks[i]);

For these cases that were switched to the atomic variant, it might be worth calling
out why they need to be atomic.

> queue->fileio.blocks[i] = NULL;
> }
> queue->fileio.active_block = NULL;
> @@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>
> block->state = IIO_BLOCK_STATE_ACTIVE;
> iio_buffer_block_get(block);
> +
> ret = queue->ops->submit(queue, block);
> if (ret) {
> + if (!block->fileio)
> + iio_buffer_signal_dmabuf_done(block->attach, ret);
> +
> /*
> * This is a bit of a problem and there is not much we can do
> * other then wait for the buffer to be disabled and re-enabled
> @@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> }
> EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);
>
> +struct iio_dma_buffer_block *
> +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
> + struct dma_buf_attachment *attach)
> +{
> + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> + struct iio_dma_buffer_block *block;
> + int err;
> +
> + mutex_lock(&queue->lock);

guard(mutex)(&queue->lock);
> +
> + /*
> + * If the buffer is enabled and in fileio mode new blocks can't be
> + * allocated.
> + */
> + if (queue->fileio.enabled) {
> + err = -EBUSY;
return ERR_PTR(-EBUSY);
> + goto err_unlock;
> + }
> +
> + block = iio_dma_buffer_alloc_block(queue, attach->dmabuf->size, false);
> + if (!block) {
> + err = -ENOMEM;

return

> + goto err_unlock;
> + }
> +
> + block->attach = attach;
> +
> + /* Free memory that might be in use for fileio mode */
> + iio_dma_buffer_fileio_free(queue);
> +
> + mutex_unlock(&queue->lock);

Drop this as unneeded if you use guard()

> +
> + return block;
> +
> +err_unlock:
> + mutex_unlock(&queue->lock);
> + return ERR_PTR(err);
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_attach_dmabuf);


> +static int iio_dma_can_enqueue_block(struct iio_dma_buffer_block *block)
> +{
> + struct iio_dma_buffer_queue *queue = block->queue;
> +
> + /* If in fileio mode buffers can't be enqueued. */
> + if (queue->fileio.enabled)
> + return -EBUSY;
> +
> + switch (block->state) {
> + case IIO_BLOCK_STATE_QUEUED:
> + return -EPERM;
> + case IIO_BLOCK_STATE_DONE:
> + return 0;
> + default:
> + return -EBUSY;

Is this a real condition or just avoiding a compile warning? If it's real
I'd like the various states that lead to it to be listed here just so we
can more easily understand why -EBUSY is appropriate.

> + }
> +}
> +
> +int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
> + struct iio_dma_buffer_block *block,
> + struct sg_table *sgt,
> + size_t size, bool cyclic)
> +{
> + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> + int ret = 0;

No need to init as it's always set.


> +
> + mutex_lock(&queue->lock);

guard(mutex)(&queue->lock);

Then can do direct returns on error and not bother with the manual
unlock.

> + ret = iio_dma_can_enqueue_block(block);
> + if (ret < 0)
> + goto out_mutex_unlock;
> +
> + block->bytes_used = size;
> + block->cyclic = cyclic;
> + block->sg_table = sgt;
> +
> + iio_dma_buffer_enqueue(queue, block);
> +
> +out_mutex_unlock:
> + mutex_unlock(&queue->lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf);
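
Putting the guard() suggestions together, the function could end up looking roughly
like this (sketch only, not the actual patch):

	int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
					  struct iio_dma_buffer_block *block,
					  struct sg_table *sgt,
					  size_t size, bool cyclic)
	{
		struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
		int ret;

		guard(mutex)(&queue->lock);

		ret = iio_dma_can_enqueue_block(block);
		if (ret < 0)
			return ret;

		block->bytes_used = size;
		block->cyclic = cyclic;
		block->sg_table = sgt;

		iio_dma_buffer_enqueue(queue, block);

		return 0;
	}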



2023-12-21 16:15:19

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] iio: buffer-dmaengine: Support new DMABUF based userspace API

On Tue, 19 Dec 2023 18:50:08 +0100
Paul Cercueil <[email protected]> wrote:

> Use the functions provided by the buffer-dma core to implement the
> DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
>
> Since we want to be able to transfer an arbitrary number of bytes and
> not necessarily the full DMABUF, the associated scatterlist is converted
> to an array of DMA addresses + lengths, which is then passed to
> dmaengine_prep_slave_dma_array().
>
> Signed-off-by: Paul Cercueil <[email protected]>
One question inline. Otherwise looks fine to me.

J
>
> ---
> v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to
> work with the new functions introduced in industrialio-buffer-dma.c.
>
> v5: - Use the new dmaengine_prep_slave_dma_vec().
> - Restrict to input buffers, since output buffers are not yet
> supported by IIO buffers.
> ---
> .../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++---
> 1 file changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 5f85ba38e6f6..825d76a24a67 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> struct dmaengine_buffer *dmaengine_buffer =
> iio_buffer_to_dmaengine_buffer(&queue->buffer);
> struct dma_async_tx_descriptor *desc;
> + unsigned int i, nents;
> + struct scatterlist *sgl;
> + struct dma_vec *vecs;
> + size_t max_size;
> dma_cookie_t cookie;
> + size_t len_total;
>
> - block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> - block->bytes_used = round_down(block->bytes_used,
> - dmaengine_buffer->align);
> + if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
> + /* We do not yet support output buffers. */
> + return -EINVAL;
> + }
>
> - desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> - DMA_PREP_INTERRUPT);
> + if (block->sg_table) {
> + sgl = block->sg_table->sgl;
> + nents = sg_nents_for_len(sgl, block->bytes_used);

Are we guaranteed the length in the sglist is enough? If not this
can return an error code.


> +
> + vecs = kmalloc_array(nents, sizeof(*vecs), GFP_KERNEL);
> + if (!vecs)
> + return -ENOMEM;
> +
> + len_total = block->bytes_used;
> +
> + for (i = 0; i < nents; i++) {
> + vecs[i].addr = sg_dma_address(sgl);
> + vecs[i].len = min(sg_dma_len(sgl), len_total);
> + len_total -= vecs[i].len;
> +
> + sgl = sg_next(sgl);
> + }
> +
> + desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
> + vecs, nents, DMA_DEV_TO_MEM,
> + DMA_PREP_INTERRUPT);
> + kfree(vecs);
> + } else {
> + max_size = min(block->size, dmaengine_buffer->max_size);
> + max_size = round_down(max_size, dmaengine_buffer->align);
> + block->bytes_used = max_size;
> +
> + desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> + block->phys_addr,
> + block->bytes_used,
> + DMA_DEV_TO_MEM,
> + DMA_PREP_INTERRUPT);
> + }
> if (!desc)
> return -ENOMEM;
>


2023-12-21 16:16:27

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] Documentation: iio: Document high-speed DMABUF based API

On Tue, 19 Dec 2023 18:50:09 +0100
Paul Cercueil <[email protected]> wrote:

> Document the new DMABUF based API.
>
> Signed-off-by: Paul Cercueil <[email protected]>
One minor comment inline.

>
> ---
> v2: - Explicitly state that the new interface is optional and is
> not implemented by all drivers.
> - The IOCTLs can now only be called on the buffer FD returned by
> IIO_BUFFER_GET_FD_IOCTL.
> - Move the page up a bit in the index since it is core stuff and not
> driver-specific.
>
> v3: Update the documentation to reflect the new API.
>
> v5: Use description lists for the documentation of the three new IOCTLs
> instead of abusing subsections.
> ---
> Documentation/iio/dmabuf_api.rst | 54 ++++++++++++++++++++++++++++++++
> Documentation/iio/index.rst | 2 ++
> 2 files changed, 56 insertions(+)
> create mode 100644 Documentation/iio/dmabuf_api.rst
>
> diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
> new file mode 100644
> index 000000000000..1cd6cd51a582
> --- /dev/null
> +++ b/Documentation/iio/dmabuf_api.rst
> @@ -0,0 +1,54 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================================
> +High-speed DMABUF interface for IIO
> +===================================
> +
> +1. Overview
> +===========
> +
> +The Industrial I/O subsystem supports access to buffers through a
> +file-based interface, with read() and write() access calls through the
> +IIO device's dev node.
> +
> +It additionally supports a DMABUF based interface, where the userspace
> +can attach DMABUF objects (externally created) to a IIO buffer, and
> +subsequently use them for data transfers.
> +
> +A userspace application can then use this interface to share DMABUF
> +objects between several interfaces, allowing it to transfer data in a
> +zero-copy fashion, for instance between IIO and the USB stack.
> +
> +The userspace application can also memory-map the DMABUF objects, and
> +access the sample data directly. The advantage of doing this vs. the
> +read() interface is that it avoids an extra copy of the data between the
> +kernel and userspace. This is particularly useful for high-speed devices
> +which produce several megabytes or even gigabytes of data per second.
> +It does however increase the userspace-kernelspace synchronization
> +overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to
> +be used for data integrity.
> +
> +2. User API
> +===========
> +
> +As part of this interface, three new IOCTLs have been added. These three
> +IOCTLs have to be performed on the IIO buffer's file descriptor,
> +obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.

I would call out that they do not work on the main file descriptor (which
is arguably also an IIO buffer file descriptor).

> +
> + ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)``
> + Attach the DMABUF object, identified by its file descriptor, to the
> + IIO buffer. Returns zero on success, and a negative errno value on
> + error.
> +
> + ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)``
> + Detach the given DMABUF object, identified by its file descriptor,
> + from the IIO buffer. Returns zero on success, and a negative errno
> + value on error.
> +
> + Note that closing the IIO buffer's file descriptor will
> + automatically detach all previously attached DMABUF objects.
> +
> + ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)``
> + Enqueue a previously attached DMABUF object to the buffer queue.
> + Enqueued DMABUFs will be read from (if output buffer) or written to
> + (if input buffer) as long as the buffer is enabled.
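
To make the flow concrete, here is a rough userspace sketch of the sequence described
above, for a capture (input) buffer. This is illustrative only: the iio_dmabuf layout
follows the uapi patch in this series, the sync ioctls are the standard DMA-BUF ones,
and the buffer still has to be enabled through the usual means for transfers to run:

	#include <sys/ioctl.h>
	#include <linux/dma-buf.h>
	#include <linux/iio/buffer.h>	/* uapi header added by this series */

	/* buf_fd: from IIO_BUFFER_GET_FD_IOCTL(); dmabuf_fd: externally created,
	 * e.g. with udmabuf */
	static int transfer_one_block(int buf_fd, int dmabuf_fd, size_t size)
	{
		struct iio_dmabuf req = {
			.fd = dmabuf_fd,
			.bytes_used = size,	/* must not exceed the DMABUF size */
		};
		struct dma_buf_sync sync;

		if (ioctl(buf_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd) < 0)
			return -1;

		if (ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req) < 0)
			return -1;

		/* Bracket CPU accesses to the (mmap'ed) samples with the DMA-BUF
		 * sync ioctls for data integrity, as noted in section 1. */
		sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
		ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
		/* ... consume the sample data ... */
		sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
		ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

		return ioctl(buf_fd, IIO_BUFFER_DMABUF_DETACH_IOCTL, &dmabuf_fd);
	}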


2023-12-21 16:32:58

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

On Tue, 19 Dec 2023 18:50:01 +0100
Paul Cercueil <[email protected]> wrote:

> [V4 was: "iio: Add buffer write() support"][1]
>
> Hi Jonathan,
>
Hi Paul,

> This is a respin of the V3 of my patchset that introduced a new
> interface based on DMABUF objects [2].

Great to see this moving forwards.

>
> The V4 was a split of the patchset, to attempt to upstream buffer
> write() support first. But since there is no current user upstream, it
> was not merged. This V5 is about doing the opposite, and contains the
> new DMABUF interface, without adding the buffer write() support. It can
> already be used with the upstream adi-axi-adc driver.

Seems like a sensible path in the short term.

>
> In user-space, Libiio uses it to transfer back and forth blocks of
> samples between the hardware and the applications, without having to
> copy the data.
>
> On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> pcercuei/dev-new-dmabuf-api branch [3], compiled with
> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> Throughput: 116 MiB/s
>
> Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> Throughput: 475 MiB/s
>
> This benchmark only measures the speed at which the data can be fetched
> to iio_rwdev's internal buffers, and does not actually try to read the
> data (e.g. to pipe it to stdout). It shows that fetching the data is
> more than 4x faster using the new interface.
>
> When actually reading the data, the performance difference isn't that
> impressive (maybe because in case of DMABUF the data is not in cache):

This needs a bit more investigation ideally. Perhaps perf counters can be
used to establish that cache misses are the main difference between
dropping it on the floor and actually reading the data.

>
> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
> 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
>
> WITH_LOCAL_DMABUF_API=ON:
> sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
> 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
>
> One interesting thing to note is that fileio is (currently) actually
> faster than the DMABUF interface if you increase a lot the buffer size.
> My explanation is that the cache invalidation routine takes more and
> more time the bigger the DMABUF gets. This is because the DMABUF is
> backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up
> to 16 thousands pages, that have to be invalidated one by one. This can
> be addressed by using huge pages, but the udmabuf driver does not (yet)
> support creating DMABUFs backed by huge pages.

I'd imagine folios of reasonable size will help, short of a huge page,
as then hopefully it will use the flush-by-VA-range instructions if
available.

>
> Anyway, the real benefits happen when the DMABUFs are either shared
> between IIO devices, or between the IIO subsystem and another
> filesystem. In that case, the DMABUFs are simply passed around drivers,
> without the data being copied at any moment.
>
> We use that feature to transfer samples from our transceivers to USB,
> using a DMABUF interface to FunctionFS [4].
>
> This drastically increases the throughput, to about 274 MiB/s over a
> USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the
> FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).

This is a nice example. Where are you with getting the patch merged?

Overall, this code looks fine to me, though there are some parts that
need review by other maintainers (e.g. Vinod for the dmaengine callback)
and I'd like a 'looks fine' at least from those who know a lot more
about dmabuf than I do.

To actually make this useful, it sounds like either udmabuf needs some perf
improvements, or there has to be an upstream case of sharing it with
something else (e.g. your FunctionFS patches). So what do we need to
get in before the positive benefit becomes worth carrying this extra
complexity? (which isn't too bad so I'm fine with a small benefit and
promises of riches :)

Jonathan

>
> Based on linux-next/next-20231219.
>
> Cheers,
> -Paul
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
> [3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> [4] https://lore.kernel.org/all/[email protected]/
>
> ---
> Changelog:
> - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct.
> Note that at some point we will need to support cyclic transfers
> using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> parameter to the function?
>
> - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> .device_prep_slave_dma_array().
>
> @Vinod: this patch will cause a small conflict with my other
> patchset adding scatter-gather support to the axi-dmac driver.
> This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> prototype of this function changed in my other patchset - it would
> have to be passed the "chan" variable. I don't know how you prefer it
> to be resolved. Worst case scenario (and if @Jonathan is okay with
> that) this one patch can be re-sent later, but it would make this
> patchset less "atomic".
>
> - [5/8]:
> - Use dev_err() instead of pr_err()
> - Inline to_iio_dma_fence()
> - Add comment to explain why we unref twice when detaching dmabuf
> - Remove TODO comment. It is actually safe to free the file's
> private data even when transfers are still pending because it
> won't be accessed.
> - Fix documentation of new fields in struct iio_buffer_access_funcs
> - iio_dma_resv_lock() does not need to be exported, make it static
>
> - [7/8]:
> - Use the new dmaengine_prep_slave_dma_vec().
> - Restrict to input buffers, since output buffers are not yet
> supported by IIO buffers.
>
> - [8/8]:
> Use description lists for the documentation of the three new IOCTLs
> instead of abusing subsections.
>
> ---
> Alexandru Ardelean (1):
> iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>
> Paul Cercueil (7):
> iio: buffer-dma: Get rid of outgoing queue
> dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> iio: core: Add new DMABUF interface infrastructure
> iio: buffer-dma: Enable support for DMABUFs
> iio: buffer-dmaengine: Support new DMABUF based userspace API
> Documentation: iio: Document high-speed DMABUF based API
>
> Documentation/iio/dmabuf_api.rst | 54 +++
> Documentation/iio/index.rst | 2 +
> drivers/dma/dma-axi-dmac.c | 40 ++
> drivers/iio/buffer/industrialio-buffer-dma.c | 242 ++++++++---
> .../buffer/industrialio-buffer-dmaengine.c | 52 ++-
> drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++
> include/linux/dmaengine.h | 25 ++
> include/linux/iio/buffer-dma.h | 33 +-
> include/linux/iio/buffer_impl.h | 26 ++
> include/uapi/linux/iio/buffer.h | 22 +
> 10 files changed, 836 insertions(+), 62 deletions(-)
> create mode 100644 Documentation/iio/dmabuf_api.rst
>


2023-12-21 17:22:10

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Hi Jonathan,

On Thursday, 21 December 2023 at 12:06 +0000, Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 18:50:06 +0100
> Paul Cercueil <[email protected]> wrote:
>
> > Add the necessary infrastructure to the IIO core to support a new
> > optional DMABUF based interface.
> >
> > With this new interface, DMABUF objects (externally created) can be
> > attached to a IIO buffer, and subsequently used for data transfer.
> >
> > A userspace application can then use this interface to share DMABUF
> > objects between several interfaces, allowing it to transfer data in
> > a
> > zero-copy fashion, for instance between IIO and the USB stack.
> >
> > The userspace application can also memory-map the DMABUF objects,
> > and
> > access the sample data directly. The advantage of doing this vs.
> > the
> > read() interface is that it avoids an extra copy of the data
> > between the
> > kernel and userspace. This is particularly useful for high-speed
> > devices which produce several megabytes or even gigabytes of data
> > per
> > second.
> >
> > As part of the interface, 3 new IOCTLs have been added:
> >
> > IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
> >  Attach the DMABUF object identified by the given file descriptor
> > to the
> >  buffer.
> >
> > IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
> >  Detach the DMABUF object identified by the given file descriptor
> > from
> >  the buffer. Note that closing the IIO buffer's file descriptor
> > will
> >  automatically detach all previously attached DMABUF objects.
> >
> > IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
> >  Request a data transfer to/from the given DMABUF object. Its file
> >  descriptor, as well as the transfer size and flags are provided in
> > the
> >  "iio_dmabuf" structure.
> >
> > These three IOCTLs have to be performed on the IIO buffer's file
> > descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
> >
>
> Fair enough - so they don't apply to the 'legacy' buffer which
> simplifies
> things but in one place you assume that logic is used (given error
> return
> values).
>
> > Signed-off-by: Paul Cercueil <[email protected]>
> >
> This is big and complex and I'm out of time for now, so I've made
> some
> comments but should revisit it.
> I'm also looking for review from those more familiar with dmabuf side
> of things than I am!
>
> Jonathan
>
>
> >  
> > +static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool
> > nonblock)
> > +{
> > + int ret;
> > +
> > + ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
> > + if (ret) {
> > + if (ret != -EDEADLK)
> > + goto out;
> > + if (nonblock) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + ret = dma_resv_lock_slow_interruptible(dmabuf-
> > >resv, NULL);
> > + }
> > +
> > +out:
> > + return ret;
>
> I'm not a fan gotos that do nothing.  Just return in appropriate
> places above.
>
> > +}
> >
> > +static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair
> > *ib, int *user_req)
> > +{
> > + struct dma_buf_attachment *attach;
> > + struct iio_dmabuf_priv *priv;
> > + struct dma_buf *dmabuf;
> > + int dmabuf_fd, ret = 0;
> > +
> > + if (copy_from_user(&dmabuf_fd, user_req,
> > sizeof(dmabuf_fd)))
> > + return -EFAULT;
> > +
> > + dmabuf = dma_buf_get(dmabuf_fd);
> > + if (IS_ERR(dmabuf))
> > + return PTR_ERR(dmabuf);
> > +
> > + attach = iio_buffer_find_attachment(ib->indio_dev,
> > dmabuf);
> > + if (IS_ERR(attach)) {
> > + ret = PTR_ERR(attach);
> > + goto out_dmabuf_put;
> > + }
> > +
> > + priv = attach->importer_priv;
> > + list_del_init(&priv->entry);
> > +
> > + /*
> > + * Unref twice to release the reference obtained with
> > + * iio_buffer_find_attachment() above, and the one
> > obtained in
> > + * iio_buffer_attach_dmabuf().
> > + */
> > + iio_buffer_dmabuf_put(attach);
> > + iio_buffer_dmabuf_put(attach);
> > +
> > +out_dmabuf_put:
> > + dma_buf_put(dmabuf);
> As below. Feels like a __free(dma_buf_put) bit of magic would be a
> nice to have.

Whoa, never heard about this. That looks great!

>
> > +
> > + return ret;
> > +}
> > +
> > +static const char *
> > +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
> > +{
> > + return "iio";
> > +}
> > +
> > +static void iio_buffer_dma_fence_release(struct dma_fence *fence)
> > +{
> > + struct iio_dma_fence *iio_fence =
> > + container_of(fence, struct iio_dma_fence, base);
> > +
> > + kfree(iio_fence);
> > +}
> > +
> > +static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
> > + .get_driver_name =
> > iio_buffer_dma_fence_get_driver_name,
> > + .get_timeline_name =
> > iio_buffer_dma_fence_get_driver_name,
> > + .release = iio_buffer_dma_fence_release,
> > +};
> > +
> > +static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair
> > *ib,
> > +      struct iio_dmabuf __user
> > *iio_dmabuf_req,
> > +      bool nonblock)
> > +{
> > + struct iio_dev *indio_dev = ib->indio_dev;
> > + struct iio_buffer *buffer = ib->buffer;
> > + struct iio_dmabuf iio_dmabuf;
> > + struct dma_buf_attachment *attach;
> > + struct iio_dmabuf_priv *priv;
> > + enum dma_data_direction dir;
> > + struct iio_dma_fence *fence;
> > + struct dma_buf *dmabuf;
> > + struct sg_table *sgt;
> > + unsigned long timeout;
> > + bool dma_to_ram;
> > + bool cyclic;
> > + int ret;
> > +
> > + if (copy_from_user(&iio_dmabuf, iio_dmabuf_req,
> > sizeof(iio_dmabuf)))
> > + return -EFAULT;
> > +
> > + if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> > + return -EINVAL;
> > +
> > + cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
> > +
> > + /* Cyclic flag is only supported on output buffers */
> > + if (cyclic && buffer->direction !=
> > IIO_BUFFER_DIRECTION_OUT)
> > + return -EINVAL;
> > +
> > + dmabuf = dma_buf_get(iio_dmabuf.fd);
> > + if (IS_ERR(dmabuf))
> > + return PTR_ERR(dmabuf);
> > +
> > + if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used >
> > dmabuf->size) {
> > + ret = -EINVAL;
> > + goto err_dmabuf_put;
> > + }
> > +
> > + attach = iio_buffer_find_attachment(indio_dev, dmabuf);
> > + if (IS_ERR(attach)) {
> > + ret = PTR_ERR(attach);
> > + goto err_dmabuf_put;
>
> Might be worth some cleanup.h magic given this put happens in all
> exit paths.
>
> > + }
> > +
> > + priv = attach->importer_priv;
> > +
> > + dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
> > + dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > + sgt = dma_buf_map_attachment(attach, dir);
> > + if (IS_ERR(sgt)) {
> > + ret = PTR_ERR(sgt);
> > + dev_err(&indio_dev->dev, "Unable to map
> > attachment: %d\n", ret);
> > + goto err_attachment_put;
> > + }
> > +
> > + fence = kmalloc(sizeof(*fence), GFP_KERNEL);
> > + if (!fence) {
> > + ret = -ENOMEM;
> > + goto err_unmap_attachment;
> > + }
> > +
> > + fence->priv = priv;
> > + fence->sgt = sgt;
> > + fence->dir = dir;
> > + priv->fence = fence;
> > +
> > + dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
> > +        &priv->lock, priv->context, 0);
> > +
> > + ret = iio_dma_resv_lock(dmabuf, nonblock);
> > + if (ret)
> > + goto err_fence_put;
> > +
> > + timeout = nonblock ? 0 :
> > msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
> > +
> > + /* Make sure we don't have writers */
> > + ret = (int) dma_resv_wait_timeout(dmabuf->resv,
> > DMA_RESV_USAGE_WRITE,
> > +   true, timeout);
>
> I'd handle this and similar cases as long rather than adding the odd
> looking cast and making
> me think too much about it.
>
> > + if (ret == 0)
> > + ret = -EBUSY;
> > + if (ret < 0)
> > + goto err_resv_unlock;
> > +
> > + if (dma_to_ram) {
> > + /*
> > + * If we're writing to the DMABUF, make sure we
> > don't have
> > + * readers
> > + */
> > + ret = (int) dma_resv_wait_timeout(dmabuf->resv,
> > +  
> > DMA_RESV_USAGE_READ, true,
> > +   timeout);
> > + if (ret == 0)
> > + ret = -EBUSY;
> > + if (ret < 0)
> > + goto err_resv_unlock;
> > + }
> > +
> > + ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> > + if (ret)
> > + goto err_resv_unlock;
> > +
> > + dma_resv_add_fence(dmabuf->resv, &fence->base,
> > +    dma_resv_usage_rw(dma_to_ram));
> > + dma_resv_unlock(dmabuf->resv);
> > +
> > + ret = buffer->access->enqueue_dmabuf(buffer, priv->block,
> > sgt,
> > +     
> > iio_dmabuf.bytes_used, cyclic);
> > + if (ret)
> > + iio_buffer_signal_dmabuf_done(attach, ret);
>
> I'd like a comment on why using the 'successful' path cleanup makes
> sense in this
> error case.  It's possible to figure it out, but reviewers are lazy
> and generally
> we like the cleanup to be obvious and local on error paths.
>
> > +
> > + dma_buf_put(dmabuf);
> > +
> > + return ret;
> > +
> > +err_resv_unlock:
> > + dma_resv_unlock(dmabuf->resv);
> > +err_fence_put:
> > + dma_fence_put(&fence->base);
> > +err_unmap_attachment:
> > + dma_buf_unmap_attachment(attach, sgt, dir);
> > +err_attachment_put:
> > + iio_buffer_dmabuf_put(attach);
> > +err_dmabuf_put:
> > + dma_buf_put(dmabuf);
> > +
> > + return ret;
> > +}
> > +
> > +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment
> > *attach, int ret)
> > +{
> > + struct iio_dmabuf_priv *priv = attach->importer_priv;
> > + struct iio_dma_fence *fence = priv->fence;
> > + enum dma_data_direction dir = fence->dir;
> > + struct sg_table *sgt = fence->sgt;
> > +
> > + dma_fence_get(&fence->base);
>
> I don't know much about dma_fence, but is it valid to access
> contents of it (sgt, etc) before getting a reference?
> Ultimately dma_fence_put() can result in a kfree_rcu() so seems
> unlikely to be safe and definitely fails the 'obviously correct'
> test.  Given those are I assume trivial accesses just do them
> down here perhaps?

It is valid to access the fence before getting a reference - the fence
won't be freed before it is signaled down below. It would be illegal to
access it after the dma_fence_put() though, which I'm not doing here.

>
>
> > + fence->base.error = ret;
> > + dma_fence_signal(&fence->base);
> > + dma_fence_put(&fence->base);
> > +
> > + dma_buf_unmap_attachment(attach, sgt, dir);
> > + iio_buffer_dmabuf_put(attach);
> > +}
> > +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
> > +
>
> > +static long iio_buffer_chrdev_ioctl(struct file *filp,
> > +     unsigned int cmd, unsigned
> > long arg)
> > +{
> > + struct iio_dev_buffer_pair *ib = filp->private_data;
> > + void __user *_arg = (void __user *)arg;
> > +
> > + switch (cmd) {
> > + case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
> > + return iio_buffer_attach_dmabuf(ib, _arg);
> > + case IIO_BUFFER_DMABUF_DETACH_IOCTL:
> > + return iio_buffer_detach_dmabuf(ib, _arg);
> > + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> > + return iio_buffer_enqueue_dmabuf(ib, _arg,
> > + filp->f_flags &
> > O_NONBLOCK);
> > + default:
> > + return IIO_IOCTL_UNHANDLED;
>
> Given you aren't using the ioctl handling on the legacy buffer, I
> think this
> should simply return an error code, not the magic value we use to
> indicate
> 'all fine, but it's not mine'.
> Probably -EINVAL or similar.  Note that the wrapper around the legacy
> buffer
> ioctls translates this to -ENODEV; rather than returning from the
> ioctl.

ACK for this and the other comments.

Thanks for reviewing!

Cheers,
-Paul

> > + }
> > +}
> > +
> >  static const struct file_operations iio_buffer_chrdev_fileops = {
> >   .owner = THIS_MODULE,
> >   .llseek = noop_llseek,
> >   .read = iio_buffer_read,
> >   .write = iio_buffer_write,
> > + .unlocked_ioctl = iio_buffer_chrdev_ioctl,
> > + .compat_ioctl = compat_ptr_ioctl,
> >   .poll = iio_buffer_poll,
> >   .release = iio_buffer_chrdev_release,
> >  };
>


2023-12-21 17:32:10

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] iio: buffer-dmaengine: Support new DMABUF based userspace API

Hi Jonathan,

On Thursday, 21 December 2023 at 16:12 +0000, Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 18:50:08 +0100
> Paul Cercueil <[email protected]> wrote:
>
> > Use the functions provided by the buffer-dma core to implement the
> > DMABUF userspace API in the buffer-dmaengine IIO buffer
> > implementation.
> >
> > Since we want to be able to transfer an arbitrary number of bytes
> > and
> > not necessarily the full DMABUF, the associated scatterlist is
> > converted
> > to an array of DMA addresses + lengths, which is then passed to
> > dmaengine_prep_slave_dma_array().
> >
> > Signed-off-by: Paul Cercueil <[email protected]>
> One question inline. Otherwise looks fine to me.
>
> J
> >
> > ---
> > v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the
> > code to
> >     work with the new functions introduced in industrialio-buffer-
> > dma.c.
> >
> > v5: - Use the new dmaengine_prep_slave_dma_vec().
> >     - Restrict to input buffers, since output buffers are not yet
> >       supported by IIO buffers.
> > ---
> >  .../buffer/industrialio-buffer-dmaengine.c    | 52
> > ++++++++++++++++---
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > index 5f85ba38e6f6..825d76a24a67 100644
> > --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > @@ -64,15 +64,51 @@ static int
> > iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue
> > *queue,
> >   struct dmaengine_buffer *dmaengine_buffer =
> >   iio_buffer_to_dmaengine_buffer(&queue->buffer);
> >   struct dma_async_tx_descriptor *desc;
> > + unsigned int i, nents;
> > + struct scatterlist *sgl;
> > + struct dma_vec *vecs;
> > + size_t max_size;
> >   dma_cookie_t cookie;
> > + size_t len_total;
> >  
> > - block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> > - block->bytes_used = round_down(block->bytes_used,
> > - dmaengine_buffer->align);
> > + if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
> > + /* We do not yet support output buffers. */
> > + return -EINVAL;
> > + }
> >  
> > - desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> > - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> > - DMA_PREP_INTERRUPT);
> > + if (block->sg_table) {
> > + sgl = block->sg_table->sgl;
> > + nents = sg_nents_for_len(sgl, block->bytes_used);
>
> Are we guaranteed the length in the sglist is enough?  If not this
> can return an error code.

The length of the sglist will always be enough: the
iio_buffer_enqueue_dmabuf() function already checks that
block->bytes_used is equal to or smaller than the size of the DMABUF.

It is quite a few functions up the call stack though, so I can handle
the errors of sg_nents_for_len() here if you think that makes sense.
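
For reference, handling it explicitly would be something like this (just a
sketch, reusing the local variables from the patch):

        int ret;

        ret = sg_nents_for_len(sgl, block->bytes_used);
        if (ret < 0)
                return ret;
        nents = ret;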

Cheers,
-Paul

> > +
> > + vecs = kmalloc_array(nents, sizeof(*vecs), GFP_KERNEL);
> > + if (!vecs)
> > + return -ENOMEM;
> > +
> > + len_total = block->bytes_used;
> > +
> > + for (i = 0; i < nents; i++) {
> > + vecs[i].addr = sg_dma_address(sgl);
> > + vecs[i].len = min(sg_dma_len(sgl), len_total);
> > + len_total -= vecs[i].len;
> > +
> > + sgl = sg_next(sgl);
> > + }
> > +
> > + desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
> > +     vecs, nents, DMA_DEV_TO_MEM,
> > +     DMA_PREP_INTERRUPT);
> > + kfree(vecs);
> > + } else {
> > + max_size = min(block->size, dmaengine_buffer->max_size);
> > + max_size = round_down(max_size, dmaengine_buffer->align);
> > + block->bytes_used = max_size;
> > +
> > + desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> > +    block->phys_addr,
> > +    block->bytes_used,
> > +    DMA_DEV_TO_MEM,
> > +    DMA_PREP_INTERRUPT);
> > + }
> >   if (!desc)
> >   return -ENOMEM;
> >  
>


2023-12-21 17:57:15

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

Hi Jonathan,

On Thursday 21 December 2023 at 16:30 +0000, Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 18:50:01 +0100
> Paul Cercueil <[email protected]> wrote:
>
> > [V4 was: "iio: Add buffer write() support"][1]
> >
> > Hi Jonathan,
> >
> Hi Paul,
>
> > This is a respin of the V3 of my patchset that introduced a new
> > interface based on DMABUF objects [2].
>
> Great to see this moving forwards.
>
> >
> > The V4 was a split of the patchset, to attempt to upstream buffer
> > write() support first. But since there is no current user upstream,
> > it
> > was not merged. This V5 is about doing the opposite, and contains
> > the
> > new DMABUF interface, without adding the buffer write() support. It
> > can
> > already be used with the upstream adi-axi-adc driver.
>
> Seems like a sensible path in the short term.
>
> >
> > In user-space, Libiio uses it to transfer back and forth blocks of
> > samples between the hardware and the applications, without having
> > to
> > copy the data.
> >
> > On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> > pcercuei/dev-new-dmabuf-api branch [3], compiled with
> > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> >   sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> >   Throughput: 116 MiB/s
> >
> > Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> >   sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> >   Throughput: 475 MiB/s
> >
> > This benchmark only measures the speed at which the data can be
> > fetched
> > to iio_rwdev's internal buffers, and does not actually try to read
> > the
> > data (e.g. to pipe it to stdout). It shows that fetching the data
> > is
> > more than 4x faster using the new interface.
> >
> > When actually reading the data, the performance difference isn't
> > that
> > impressive (maybe because in case of DMABUF the data is not in
> > cache):
>
> This needs a bit more investigation ideally. Perhaps perf counters
> can be
> used to establish that cache misses are the main different between
> dropping it on the floor and actually reading the data.

Yes, we'll work on it. The other big difference is that fileio uses
dma_alloc_coherent() while the DMABUFs use non-coherent mappings. I
guess coherent memory is faster for the typical access pattern (which
is "read/write everything sequentially once").

> >
> > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> >   sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > status=progress
> >   2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
> >
> > WITH_LOCAL_DMABUF_API=ON:
> >   sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > status=progress
> >   2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
> >
> > One interesting thing to note is that fileio is (currently)
> > actually
> > faster than the DMABUF interface if you increase a lot the buffer
> > size.
> > My explanation is that the cache invalidation routine takes more
> > and
> > more time the bigger the DMABUF gets. This is because the DMABUF is
> > backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by
> > up
> > to 16 thousands pages, that have to be invalidated one by one. This
> > can
> > be addressed by using huge pages, but the udmabuf driver does not
> > (yet)
> > support creating DMABUFs backed by huge pages.
>
> I'd imagine folios of reasonable size will help, short of a huge page,
> as then hopefully it will use the flush-by-VA-range instructions if
> available.
>
> >
> > Anyway, the real benefits happen when the DMABUFs are either shared
> > between IIO devices, or between the IIO subsystem and another
> > filesystem. In that case, the DMABUFs are simply passed around
> > drivers,
> > without the data being copied at any moment.
> >
> > We use that feature to transfer samples from our transceivers to
> > USB,
> > using a DMABUF interface to FunctionFS [4].
> >
> > This drastically increases the throughput, to about 274 MiB/s over
> > a
> > USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to
> > the
> > FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
> > avg.).
>
> This is a nice example.  Where are you with getting the patch merged?

I'll send a new version (mostly a [RESEND]...) in the coming days. As
you can see from the review on my last attempt, the main blocker is
that nobody wants to merge a new interface if the rest of the kernel
bits aren't upstream yet. Kind of a chicken-and-egg problem :)

> Overall, this code looks fine to me, though there are some parts that
> need review by other maintainers (e.g. Vinod for the dmaengine
> callback)
> and I'd like a 'looks fine' at least form those who know a lot more
> about dmabuf than I do.
>
> To actually make this useful sounds like either udmabuf needs some
> perf
> improvements, or there has to be an upstream case of sharing it
> without
> something else (e.g your functionfs patches).  So what do we need to
> get in before the positive benefit becomes worth carrying this extra
> complexity? (which isn't too bad so I'm fine with a small benefit and
> promises of riches :)

I think the FunctionFS DMABUF interface can be pushed as well for 5.9,
in parallel with this one, as the feedback on the V1 was good. I might
just need some help pushing it forward (kind of an "I merge it if you
merge it" guarantee).

Cheers,
-Paul

>
> Jonathan
>
> >
> > Based on linux-next/next-20231219.
> >
> > Cheers,
> > -Paul
> >
> > [1]
> > https://lore.kernel.org/all/[email protected]/
> > [2]
> > https://lore.kernel.org/all/[email protected]/
> > [3]
> > https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> > [4]
> > https://lore.kernel.org/all/[email protected]/
> >
> > ---
> > Changelog:
> > - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> >   dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec'
> > struct.
> >   Note that at some point we will need to support cyclic transfers
> >   using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> >   parameter to the function?
> >
> > - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> >   .device_prep_slave_dma_array().
> >
> >   @Vinod: this patch will cause a small conflict with my other
> >   patchset adding scatter-gather support to the axi-dmac driver.
> >   This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> >   prototype of this function changed in my other patchset - it
> > would
> >   have to be passed the "chan" variable. I don't know how you
> > prefer it
> >   to be resolved. Worst case scenario (and if @Jonathan is okay
> > with
> >   that) this one patch can be re-sent later, but it would make this
> >   patchset less "atomic".
> >
> > - [5/8]:
> >   - Use dev_err() instead of pr_err()
> >   - Inline to_iio_dma_fence()
> >   - Add comment to explain why we unref twice when detaching dmabuf
> >   - Remove TODO comment. It is actually safe to free the file's
> >     private data even when transfers are still pending because it
> >     won't be accessed.
> >   - Fix documentation of new fields in struct
> > iio_buffer_access_funcs
> >   - iio_dma_resv_lock() does not need to be exported, make it
> > static
> >
> > - [7/8]:
> >   - Use the new dmaengine_prep_slave_dma_vec().
> >   - Restrict to input buffers, since output buffers are not yet
> >     supported by IIO buffers.
> >
> > - [8/8]:
> >   Use description lists for the documentation of the three new
> > IOCTLs
> >   instead of abusing subsections.
> >
> > ---
> > Alexandru Ardelean (1):
> >   iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> >
> > Paul Cercueil (7):
> >   iio: buffer-dma: Get rid of outgoing queue
> >   dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> >   dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> >   iio: core: Add new DMABUF interface infrastructure
> >   iio: buffer-dma: Enable support for DMABUFs
> >   iio: buffer-dmaengine: Support new DMABUF based userspace API
> >   Documentation: iio: Document high-speed DMABUF based API
> >
> >  Documentation/iio/dmabuf_api.rst              |  54 +++
> >  Documentation/iio/index.rst                   |   2 +
> >  drivers/dma/dma-axi-dmac.c                    |  40 ++
> >  drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
> >  .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
> >  drivers/iio/industrialio-buffer.c             | 402
> > ++++++++++++++++++
> >  include/linux/dmaengine.h                     |  25 ++
> >  include/linux/iio/buffer-dma.h                |  33 +-
> >  include/linux/iio/buffer_impl.h               |  26 ++
> >  include/uapi/linux/iio/buffer.h               |  22 +
> >  10 files changed, 836 insertions(+), 62 deletions(-)
> >  create mode 100644 Documentation/iio/dmabuf_api.rst
> >
>


2023-12-22 08:56:59

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v5 6/8] iio: buffer-dma: Enable support for DMABUFs

On Thu, 2023-12-21 at 16:04 +0000, Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 18:50:07 +0100
> Paul Cercueil <[email protected]> wrote:
>
> > Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf()
> > and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO
> > DMA buffer implementations.
> >
> > Signed-off-by: Paul Cercueil <[email protected]>
> >
> Hi Paul,
>
> A few comments in here. Mostly places where the cleanup.h guard() stuff
> can simplify error paths.
>
> Overall this looks reasonable to me.
>
> Jonathan
>
> > ---
> > v3: Update code to provide the functions that will be used as callbacks
> >     for the new IOCTLs.
> > ---
> >  drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++--
> >  include/linux/iio/buffer-dma.h               |  26 +++
> >  2 files changed, 170 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c
> > b/drivers/iio/buffer/industrialio-buffer-dma.c
> > index 5610ba67925e..adb64f975064 100644
> > --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> > +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/poll.h>
> >  #include <linux/iio/buffer_impl.h>
> >  #include <linux/iio/buffer-dma.h>
> > +#include <linux/dma-buf.h>
> >  #include <linux/dma-mapping.h>
> >  #include <linux/sizes.h>
> >  
> > @@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref)
> >  {
> >         struct iio_dma_buffer_block *block = container_of(kref,
> >                 struct iio_dma_buffer_block, kref);
> > +       struct iio_dma_buffer_queue *queue = block->queue;
> >  
> > -       WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
> > +       WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
> >  
> > -       dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
> > -                                       block->vaddr, block->phys_addr);
> > +       mutex_lock(&queue->lock);
> >  
> > -       iio_buffer_put(&block->queue->buffer);
> > +       if (block->fileio) {
> > +               dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> > +                                 block->vaddr, block->phys_addr);
> > +               queue->num_fileio_blocks--;
> > +       }
> > +
> > +       queue->num_blocks--;
> >         kfree(block);
> > +
> > +       mutex_unlock(&queue->lock);
> > +
> > +       iio_buffer_put(&queue->buffer);
> >  }
> >  
> >  static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
> > @@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue
> > *iio_buffer_to_queue(struct iio_buffer *buf)
> >  }
> >  
> >  static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> > -       struct iio_dma_buffer_queue *queue, size_t size)
> > +       struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> >  {
> >         struct iio_dma_buffer_block *block;
> >  
> > @@ -171,13 +182,16 @@ static struct iio_dma_buffer_block
> > *iio_dma_buffer_alloc_block(
> >         if (!block)
> >                 return NULL;
> >  
> > -       block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> > -               &block->phys_addr, GFP_KERNEL);
> > -       if (!block->vaddr) {
> > -               kfree(block);
> > -               return NULL;
> > +       if (fileio) {
> > +               block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> > +                                                 &block->phys_addr, GFP_KERNEL);
> > +               if (!block->vaddr) {
> > +                       kfree(block);
> > +                       return NULL;
> > +               }
> >         }
> >  
> > +       block->fileio = fileio;
> >         block->size = size;
> >         block->state = IIO_BLOCK_STATE_DONE;
> >         block->queue = queue;
> > @@ -186,6 +200,9 @@ static struct iio_dma_buffer_block
> > *iio_dma_buffer_alloc_block(
> >  
> >         iio_buffer_get(&queue->buffer);
> >  
> > +       queue->num_blocks++;
> > +       queue->num_fileio_blocks += fileio;
> Adding a boolean is non intuitive.
>
>         if (fileio)
>                 queue->num_fileio_blocks++;
>
> probably easier to read and compiler should be able to figure out how to
> optimise the condition away.
>
> > +
> >         return block;
> >  }
> >  
> > @@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block
> > *block)
> >         _iio_dma_buffer_block_done(block);
> >         spin_unlock_irqrestore(&queue->list_lock, flags);
> >  
> > +       if (!block->fileio)
> > +               iio_buffer_signal_dmabuf_done(block->attach, 0);
> > +
> >         iio_buffer_block_put_atomic(block);
> >         wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> >  }
> > @@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct
> > iio_dma_buffer_queue *queue,
> >                 list_del(&block->head);
> >                 block->bytes_used = 0;
> >                 _iio_dma_buffer_block_done(block);
> > +
> > +               if (!block->fileio)
> > +                       iio_buffer_signal_dmabuf_done(block->attach, -EINTR);
> >                 iio_buffer_block_put_atomic(block);
> >         }
> >         spin_unlock_irqrestore(&queue->list_lock, flags);
> >  
> > +       queue->fileio.enabled = false;
>
> While this obviously doesn't need to be conditional, since it can already be false,
> it might be easier to follow the code flow if it didn't check whether we were doing
> fileio or not before disabling it.
>
> >         wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> >  }
> >  EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
> > @@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct
> > iio_dma_buffer_block *block)
> >         }
> >  }
> >  
> > +static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue)
> > +{
> > +       return queue->fileio.enabled ||
> > +               queue->num_blocks == queue->num_fileio_blocks;
> This is a little odd. So would be good have a comment on why this condition.
> Or rename the function to imply it's checking if enabled, or can be enabled.
>
> At first glanced I expected a function with this name to just be an accessor
> function. e.g.
>         return queue->fileio.enabled;
>
> > +}
> > +
> >  /**
> >   * iio_dma_buffer_request_update() - DMA buffer request_update callback
> >   * @buffer: The buffer which to request an update
> > @@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> >  
> >         mutex_lock(&queue->lock);
> >  
> > +       queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
> > +
> > +       /* If DMABUFs were created, disable fileio interface */
> > +       if (!queue->fileio.enabled)
> > +               goto out_unlock;
> > +
> >         /* Allocations are page aligned */
> >         if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size))
> >                 try_reuse = true;
> > @@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> >                         block = queue->fileio.blocks[i];
> >                         if (block->state == IIO_BLOCK_STATE_DEAD) {
> >                                 /* Could not reuse it */
> > -                               iio_buffer_block_put(block);
> > +                               iio_buffer_block_put_atomic(block);
> >                                 block = NULL;
> >                         } else {
> >                                 block->size = size;
> > @@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> >                 }
> >  
> >                 if (!block) {
> > -                       block = iio_dma_buffer_alloc_block(queue, size);
> > +                       block = iio_dma_buffer_alloc_block(queue, size, true);
> >                         if (!block) {
> >                                 ret = -ENOMEM;
> >                                 goto out_unlock;
> > @@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct
> > iio_dma_buffer_queue *queue)
> >         for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> >                 if (!queue->fileio.blocks[i])
> >                         continue;
> > -               iio_buffer_block_put(queue->fileio.blocks[i]);
> > +               iio_buffer_block_put_atomic(queue->fileio.blocks[i]);
>
> For these cases that are atomic or not, it might be worth calling out why they need
> to be
> atomic.
>
> >                 queue->fileio.blocks[i] = NULL;
> >         }
> >         queue->fileio.active_block = NULL;
> > @@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct
> > iio_dma_buffer_queue *queue,
> >  
> >         block->state = IIO_BLOCK_STATE_ACTIVE;
> >         iio_buffer_block_get(block);
> > +
> >         ret = queue->ops->submit(queue, block);
> >         if (ret) {
> > +               if (!block->fileio)
> > +                       iio_buffer_signal_dmabuf_done(block->attach, ret);
> > +
> >                 /*
> >                  * This is a bit of a problem and there is not much we can do
> >                  * other then wait for the buffer to be disabled and re-enabled
> > @@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
> >  }
> >  EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);
> >  
> > +struct iio_dma_buffer_block *
> > +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
> > +                            struct dma_buf_attachment *attach)
> > +{
> > +       struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> > +       struct iio_dma_buffer_block *block;
> > +       int err;
> > +
> > +       mutex_lock(&queue->lock);
>
>         guard(mutex)(&queue->lock);

I'm also a big fan of this new stuff, but shouldn't we have a prep patch making the
transition to that? Otherwise we'll end up with a mix of styles. I'm happy to clean
up stuff with follow-up patches (even some coding style could be improved IIRC), but
typically that's not how we handle things like this, I believe...

- Nuno Sá
>

2023-12-22 08:58:51

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] iio: buffer-dmaengine: Support new DMABUF based userspace API

On Thu, 2023-12-21 at 18:30 +0100, Paul Cercueil wrote:
> Hi Jonathan,
>
> On Thursday 21 December 2023 at 16:12 +0000, Jonathan Cameron wrote:
> > On Tue, 19 Dec 2023 18:50:08 +0100
> > Paul Cercueil <[email protected]> wrote:
> >
> > > Use the functions provided by the buffer-dma core to implement the
> > > DMABUF userspace API in the buffer-dmaengine IIO buffer
> > > implementation.
> > >
> > > Since we want to be able to transfer an arbitrary number of bytes
> > > and
> > > not necessarily the full DMABUF, the associated scatterlist is
> > > converted
> > > to an array of DMA addresses + lengths, which is then passed to
> > > dmaengine_prep_slave_dma_array().
> > >
> > > Signed-off-by: Paul Cercueil <[email protected]>
> > One question inline. Otherwise looks fine to me.
> >
> > J
> > >
> > > ---
> > > v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the
> > > code to
> > >     work with the new functions introduced in industrialio-buffer-
> > > dma.c.
> > >
> > > v5: - Use the new dmaengine_prep_slave_dma_vec().
> > >     - Restrict to input buffers, since output buffers are not yet
> > >       supported by IIO buffers.
> > > ---
> > >  .../buffer/industrialio-buffer-dmaengine.c    | 52
> > > ++++++++++++++++---
> > >  1 file changed, 46 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > index 5f85ba38e6f6..825d76a24a67 100644
> > > --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > @@ -64,15 +64,51 @@ static int
> > > iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue
> > > *queue,
> > >         struct dmaengine_buffer *dmaengine_buffer =
> > >                 iio_buffer_to_dmaengine_buffer(&queue->buffer);
> > >         struct dma_async_tx_descriptor *desc;
> > > +       unsigned int i, nents;
> > > +       struct scatterlist *sgl;
> > > +       struct dma_vec *vecs;
> > > +       size_t max_size;
> > >         dma_cookie_t cookie;
> > > +       size_t len_total;
> > >  
> > > -       block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> > > -       block->bytes_used = round_down(block->bytes_used,
> > > -                       dmaengine_buffer->align);
> > > +       if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
> > > +               /* We do not yet support output buffers. */
> > > +               return -EINVAL;
> > > +       }
> > >  
> > > -       desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> > > -               block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> > > -               DMA_PREP_INTERRUPT);
> > > +       if (block->sg_table) {
> > > +               sgl = block->sg_table->sgl;
> > > +               nents = sg_nents_for_len(sgl, block->bytes_used);
> >
> > Are we guaranteed the length in the sglist is enough?  If not this
> > can return an error code.
>
> The length of the sglist will always be enough: the
> iio_buffer_enqueue_dmabuf() function already checks that
> block->bytes_used is equal to or smaller than the size of the DMABUF.
>
> It is quite a few functions up the call stack though, so I can
> handle the errors of sg_nents_for_len() here if you think that makes sense.

Maybe putting something like the above in a comment?

- Nuno Sá



2023-12-26 15:30:56

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 6/8] iio: buffer-dma: Enable support for DMABUFs


> > > +struct iio_dma_buffer_block *
> > > +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
> > > +                            struct dma_buf_attachment *attach)
> > > +{
> > > +       struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> > > +       struct iio_dma_buffer_block *block;
> > > +       int err;
> > > +
> > > +       mutex_lock(&queue->lock);
> >
> >         guard(mutex)(&queue->lock);
>
> I'm also a big fan of this new stuff but shouldn't we have a prep patch making the
> transition to that? Otherwise we'll end up with a mix of styles. I'm happy to clean
> up stuff with follow up patches (even some coding style could be improved IIRC) but
> typically that's not how we handle things like this I believe...

Ideally yes, I think a prep patch would make sense - but I'm not that fussed
about it, and follow-up patches would be fine here.

There are cases for using this that are much easier to justify than
others, so I don't really mind if simple

mutex_lock();
do_something
mutex_unlock();

cases are left forever not using guard(), though I also don't mind if people want to
use scoped_guard() or guard() for these just to be consistent. Either way is pretty
easy to read. We'll probably also continue to gain new uses of this logic, such
as the conditional guard stuff that is queued up for the next merge window, and
whilst that is going on we are going to have a bit of mixed style.
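
For comparison, the same trivial pattern using cleanup.h looks roughly like
this (just a sketch):

        scoped_guard(mutex, &queue->lock)
                do_something();

        /* or, when the lock is held to the end of the function anyway: */
        guard(mutex)(&queue->lock);
        do_something();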

Jonathan


>
> - Nuno Sá
> >
>


2023-12-26 15:31:53

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] iio: buffer-dmaengine: Support new DMABUF based userspace API

On Fri, 22 Dec 2023 09:58:29 +0100
Nuno Sá <[email protected]> wrote:

> On Thu, 2023-12-21 at 18:30 +0100, Paul Cercueil wrote:
> > Hi Jonathan,
> >
> > On Thursday 21 December 2023 at 16:12 +0000, Jonathan Cameron wrote:
> > > On Tue, 19 Dec 2023 18:50:08 +0100
> > > Paul Cercueil <[email protected]> wrote:
> > >
> > > > Use the functions provided by the buffer-dma core to implement the
> > > > DMABUF userspace API in the buffer-dmaengine IIO buffer
> > > > implementation.
> > > >
> > > > Since we want to be able to transfer an arbitrary number of bytes
> > > > and
> > > > not necessarily the full DMABUF, the associated scatterlist is
> > > > converted
> > > > to an array of DMA addresses + lengths, which is then passed to
> > > > dmaengine_prep_slave_dma_array().
> > > >
> > > > Signed-off-by: Paul Cercueil <[email protected]>
> > > One question inline. Otherwise looks fine to me.
> > >
> > > J
> > > >
> > > > ---
> > > > v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the
> > > > code to
> > > >     work with the new functions introduced in industrialio-buffer-
> > > > dma.c.
> > > >
> > > > v5: - Use the new dmaengine_prep_slave_dma_vec().
> > > >     - Restrict to input buffers, since output buffers are not yet
> > > >       supported by IIO buffers.
> > > > ---
> > > >  .../buffer/industrialio-buffer-dmaengine.c    | 52
> > > > ++++++++++++++++---
> > > >  1 file changed, 46 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > > b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > > index 5f85ba38e6f6..825d76a24a67 100644
> > > > --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > > +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> > > > @@ -64,15 +64,51 @@ static int
> > > > iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue
> > > > *queue,
> > > >         struct dmaengine_buffer *dmaengine_buffer =
> > > >                 iio_buffer_to_dmaengine_buffer(&queue->buffer);
> > > >         struct dma_async_tx_descriptor *desc;
> > > > +       unsigned int i, nents;
> > > > +       struct scatterlist *sgl;
> > > > +       struct dma_vec *vecs;
> > > > +       size_t max_size;
> > > >         dma_cookie_t cookie;
> > > > +       size_t len_total;
> > > >  
> > > > -       block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> > > > -       block->bytes_used = round_down(block->bytes_used,
> > > > -                       dmaengine_buffer->align);
> > > > +       if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
> > > > +               /* We do not yet support output buffers. */
> > > > +               return -EINVAL;
> > > > +       }
> > > >  
> > > > -       desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> > > > -               block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> > > > -               DMA_PREP_INTERRUPT);
> > > > +       if (block->sg_table) {
> > > > +               sgl = block->sg_table->sgl;
> > > > +               nents = sg_nents_for_len(sgl, block->bytes_used);
> > >
> > > Are we guaranteed the length in the sglist is enough?  If not this
> > > can return an error code.
> >
> > The length of the sglist will always be enough: the
> > iio_buffer_enqueue_dmabuf() function already checks that
> > block->bytes_used is equal to or smaller than the size of the DMABUF.
> >
> > It is quite a few functions up the call stack though, so I can
> > handle the errors of sg_nents_for_len() here if you think that makes sense.
>
> Maybe putting something like the above in a comment?
Either a comment, or an explicit check so we don't need the comment, is
fine as far as I'm concerned.
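
For completeness, the comment variant could be as small as this (sketch):

        /*
         * iio_buffer_enqueue_dmabuf() already checked that
         * block->bytes_used is not larger than the DMABUF size, so the
         * sglist is guaranteed to cover the requested length.
         */
        nents = sg_nents_for_len(sgl, block->bytes_used);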

Jonathan

>
> - Nuno Sá
>
>


2023-12-26 15:38:16

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

On Thu, 21 Dec 2023 18:56:52 +0100
Paul Cercueil <[email protected]> wrote:

> Hi Jonathan,
>
> On Thursday 21 December 2023 at 16:30 +0000, Jonathan Cameron wrote:
> > On Tue, 19 Dec 2023 18:50:01 +0100
> > Paul Cercueil <[email protected]> wrote:
> >
> > > [V4 was: "iio: Add buffer write() support"][1]
> > >
> > > Hi Jonathan,
> > >
> > Hi Paul,
> >
> > > This is a respin of the V3 of my patchset that introduced a new
> > > interface based on DMABUF objects [2].
> >
> > Great to see this moving forwards.
> >
> > >
> > > The V4 was a split of the patchset, to attempt to upstream buffer
> > > write() support first. But since there is no current user upstream,
> > > it
> > > was not merged. This V5 is about doing the opposite, and contains
> > > the
> > > new DMABUF interface, without adding the buffer write() support. It
> > > can
> > > already be used with the upstream adi-axi-adc driver.
> >
> > Seems like a sensible path in the short term.
> >
> > >
> > > In user-space, Libiio uses it to transfer back and forth blocks of
> > > samples between the hardware and the applications, without having
> > > to
> > > copy the data.
> > >
> > > On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> > > pcercuei/dev-new-dmabuf-api branch [3], compiled with
> > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > >   sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > >   Throughput: 116 MiB/s
> > >
> > > Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> > >   sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > >   Throughput: 475 MiB/s
> > >
> > > This benchmark only measures the speed at which the data can be
> > > fetched
> > > to iio_rwdev's internal buffers, and does not actually try to read
> > > the
> > > data (e.g. to pipe it to stdout). It shows that fetching the data
> > > is
> > > more than 4x faster using the new interface.
> > >
> > > When actually reading the data, the performance difference isn't
> > > that
> > > impressive (maybe because in case of DMABUF the data is not in
> > > cache):
> >
> > This needs a bit more investigation ideally. Perhaps perf counters
> > can be
> > used to establish that cache misses are the main different between
> > dropping it on the floor and actually reading the data.
>
> Yes, we'll work on it. The other big difference is that fileio uses
> dma_alloc_coherent() while the DMABUFs use non-coherent mappings. I
> guess coherent memory is faster for the typical access pattern (which
> is "read/write everything sequentially once").

It's been a long time since I last worked much with a platform that wasn't
always IO-coherent, so I've forgotten how all this works (it all ends up as
no-ops on the platforms I tend to use these days!). Good luck, I'll be
interested to see what this turns out to be.

>
> > >
> > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > >   sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > > status=progress
> > >   2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
> > >
> > > WITH_LOCAL_DMABUF_API=ON:
> > >   sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > > status=progress
> > >   2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
> > >
> > > One interesting thing to note is that fileio is (currently)
> > > actually
> > > faster than the DMABUF interface if you increase a lot the buffer
> > > size.
> > > My explanation is that the cache invalidation routine takes more
> > > and
> > > more time the bigger the DMABUF gets. This is because the DMABUF is
> > > backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by
> > > up
> > > to 16 thousands pages, that have to be invalidated one by one. This
> > > can
> > > be addressed by using huge pages, but the udmabuf driver does not
> > > (yet)
> > > support creating DMABUFs backed by huge pages.
> >
> > I'd imagine folios of reasonable size will help, short of a huge page,
> > as then hopefully it will use the flush-by-VA-range instructions if
> > available.
> >
> > >
> > > Anyway, the real benefits happen when the DMABUFs are either shared
> > > between IIO devices, or between the IIO subsystem and another
> > > filesystem. In that case, the DMABUFs are simply passed around
> > > drivers,
> > > without the data being copied at any moment.
> > >
> > > We use that feature to transfer samples from our transceivers to
> > > USB,
> > > using a DMABUF interface to FunctionFS [4].
> > >
> > > This drastically increases the throughput, to about 274 MiB/s over
> > > a
> > > USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to
> > > the
> > > FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
> > > avg.).
> >
> > This is a nice example.  Where are you with getting the patch merged?
>
> I'll send a new version (mostly a [RESEND]...) in the coming days. As
> you can see from the review on my last attempt, the main blocker is
> that nobody wants to merge a new interface if the rest of the kernel
> bits aren't upstream yet. Kind of a chicken-and-egg problem :)
>
> > Overall, this code looks fine to me, though there are some parts that
> > need review by other maintainers (e.g. Vinod for the dmaengine
> > callback)
> > and I'd like a 'looks fine' at least form those who know a lot more
> > about dmabuf than I do.
> >
> > To actually make this useful sounds like either udmabuf needs some
> > perf
> > improvements, or there has to be an upstream case of sharing it
> > without
> > something else (e.g your functionfs patches).  So what do we need to
> > get in before the positive benefit becomes worth carrying this extra
> > complexity? (which isn't too bad so I'm fine with a small benefit and
> > promises of riches :)
>
> I think the FunctionFS DMABUF interface can be pushed as well for 5.9,
> in parallel with this one, as the feedback on the V1 was good. I might
> just need some help pushing it forward (kind of an "I merge it if you
> merge it" guarantee).

Ok. If we get a 'fine by us' from DMABUF folk I'd be happy to make
that commitment for the IIO parts.

Jonathan

>
> Cheers,
> -Paul
>
> >
> > Jonathan
> >
> > >
> > > Based on linux-next/next-20231219.
> > >
> > > Cheers,
> > > -Paul
> > >
> > > [1]
> > > https://lore.kernel.org/all/[email protected]/
> > > [2]
> > > https://lore.kernel.org/all/[email protected]/
> > > [3]
> > > https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> > > [4]
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > ---
> > > Changelog:
> > > - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> > >   dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec'
> > > struct.
> > >   Note that at some point we will need to support cyclic transfers
> > >   using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> > >   parameter to the function?
> > >
> > > - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> > >   .device_prep_slave_dma_array().
> > >
> > >   @Vinod: this patch will cause a small conflict with my other
> > >   patchset adding scatter-gather support to the axi-dmac driver.
> > >   This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> > >   prototype of this function changed in my other patchset - it
> > > would
> > >   have to be passed the "chan" variable. I don't know how you
> > > prefer it
> > >   to be resolved. Worst case scenario (and if @Jonathan is okay
> > > with
> > >   that) this one patch can be re-sent later, but it would make this
> > >   patchset less "atomic".
> > >
> > > - [5/8]:
> > >   - Use dev_err() instead of pr_err()
> > >   - Inline to_iio_dma_fence()
> > >   - Add comment to explain why we unref twice when detaching dmabuf
> > >   - Remove TODO comment. It is actually safe to free the file's
> > >     private data even when transfers are still pending because it
> > >     won't be accessed.
> > >   - Fix documentation of new fields in struct
> > > iio_buffer_access_funcs
> > >   - iio_dma_resv_lock() does not need to be exported, make it
> > > static
> > >
> > > - [7/8]:
> > >   - Use the new dmaengine_prep_slave_dma_vec().
> > >   - Restrict to input buffers, since output buffers are not yet
> > >     supported by IIO buffers.
> > >
> > > - [8/8]:
> > >   Use description lists for the documentation of the three new
> > > IOCTLs
> > >   instead of abusing subsections.
> > >
> > > ---
> > > Alexandru Ardelean (1):
> > >   iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > >
> > > Paul Cercueil (7):
> > >   iio: buffer-dma: Get rid of outgoing queue
> > >   dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> > >   dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> > >   iio: core: Add new DMABUF interface infrastructure
> > >   iio: buffer-dma: Enable support for DMABUFs
> > >   iio: buffer-dmaengine: Support new DMABUF based userspace API
> > >   Documentation: iio: Document high-speed DMABUF based API
> > >
> > >  Documentation/iio/dmabuf_api.rst              |  54 +++
> > >  Documentation/iio/index.rst                   |   2 +
> > >  drivers/dma/dma-axi-dmac.c                    |  40 ++
> > >  drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
> > >  .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
> > >  drivers/iio/industrialio-buffer.c             | 402
> > > ++++++++++++++++++
> > >  include/linux/dmaengine.h                     |  25 ++
> > >  include/linux/iio/buffer-dma.h                |  33 +-
> > >  include/linux/iio/buffer_impl.h               |  26 ++
> > >  include/uapi/linux/iio/buffer.h               |  22 +
> > >  10 files changed, 836 insertions(+), 62 deletions(-)
> > >  create mode 100644 Documentation/iio/dmabuf_api.rst
> > >
> >
>
>


2024-01-08 12:21:04

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

Hi Vinod,

On Thursday 21 December 2023 at 20:44 +0530, Vinod Koul wrote:
> On 19-12-23, 18:50, Paul Cercueil wrote:
> > This function can be used to initiate a scatter-gather DMA
> > transfer,
> > where the address and size of each segment is located in one entry
> > of
> > the dma_vec array.
> >
> > The major difference with dmaengine_prep_slave_sg() is that it
> > supports
> > specifying the lengths of each DMA transfer; as trying to override
> > the
> > length of the transfer with dmaengine_prep_slave_sg() is a very
> > tedious
> > process. The introduction of a new API function is also justified
> > by the
> > fact that scatterlists are on their way out.
> >
> > Note that dmaengine_prep_interleaved_dma() is not helpful either in
> > that
> > case, as it assumes that the address of each segment will be higher
> > than
> > the one of the previous segment, which we just cannot guarantee in
> > case
> > of a scatter-gather transfer.
> >
> > Signed-off-by: Paul Cercueil <[email protected]>
> >
> > ---
> > v3: New patch
> >
> > v5: Replace with function dmaengine_prep_slave_dma_vec(), and
> > struct
> >     'dma_vec'.
> >     Note that at some point we will need to support cyclic
> > transfers
> >     using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> >     parameter to the function?
> > ---
> >  include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index 3df70d6131c8..ee5931ddb42f 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -160,6 +160,16 @@ struct dma_interleaved_template {
> >   struct data_chunk sgl[];
> >  };
> >  
> > +/**
> > + * struct dma_vec - DMA vector
> > + * @addr: Bus address of the start of the vector
> > + * @len: Length in bytes of the DMA vector
> > + */
> > +struct dma_vec {
> > + dma_addr_t addr;
> > + size_t len;
> > +};

I don't want to be pushy, but I'd like to know how to solve this now,
otherwise I'll just send the same patches for my v6.

> So you want to transfer multiple buffers, right? Why not use
> dmaengine_prep_slave_sg()? Is there a reason for not using that one?

The reason is that we want to be able to transfer less than the total size
of the scatterlist, and that's currently very hard to do - scatterlists were
designed not to be tampered with.

Christian König then suggested introducing a "dma_vec", which had been on
his TODO list for a while now.

> Furthermore I missed replying to your email earlier on use of
> dmaengine_prep_interleaved_dma(), my apologies.
> That can be made to work for you as well. Please see the notes where
> icg
> can be ignored and it does not need icg value to be set
>
> In fact, the interleaved API can be made to work in most of these cases
> I can think of...

The interleaved API only supports incrementing addresses; I see no way to
decrement the address (without crude hacks, e.g. overflowing size_t). I
can't guarantee that my DMABUF's pages are ordered in memory.
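
For reference, this is the relevant part of the interleaved template from
include/linux/dmaengine.h (abridged); the inter-chunk gap is an unsigned
size_t, so each chunk can only start at a higher address than the previous
one:

        struct data_chunk {
                size_t size;    /* number of bytes to transfer in this chunk */
                size_t icg;     /* gap to the next chunk; unsigned, so the
                                 * address can only move forward */
                size_t dst_icg; /* destination-specific gap */
                size_t src_icg; /* source-specific gap */
        };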

Cheers,
-Paul

> > +
> >  /**
> >   * enum dma_ctrl_flags - DMA flags to augment operation
> > preparation,
> >   *  control completion, and communicate status.
> > @@ -910,6 +920,10 @@ struct dma_device {
> >   struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
> >   struct dma_chan *chan, unsigned long flags);
> >  
> > + struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)(
> > + struct dma_chan *chan, const struct dma_vec *vecs,
> > + size_t nents, enum dma_transfer_direction direction,
> > + unsigned long flags);
> >   struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> >   struct dma_chan *chan, struct scatterlist *sgl,
> >   unsigned int sg_len, enum dma_transfer_direction direction,
> > @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
> >     dir, flags, NULL);
> >  }
> >  
> > +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
> > + struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
> > + enum dma_transfer_direction dir, unsigned long flags)
> > +{
> > + if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
> > + return NULL;
> > +
> > + return chan->device->device_prep_slave_dma_vec(chan, vecs, nents,
> > +        dir, flags);
> > +}
> > +
> >  static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
> >   struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
> >   enum dma_transfer_direction dir, unsigned long flags)
> > --
> > 2.43.0
>


2024-01-08 13:21:20

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

On Tue, Dec 19, 2023 at 06:50:06PM +0100, Paul Cercueil wrote:
> Add the necessary infrastructure to the IIO core to support a new
> optional DMABUF based interface.
>
> With this new interface, DMABUF objects (externally created) can be
> attached to a IIO buffer, and subsequently used for data transfer.
>
> A userspace application can then use this interface to share DMABUF
> objects between several interfaces, allowing it to transfer data in a
> zero-copy fashion, for instance between IIO and the USB stack.
>
> The userspace application can also memory-map the DMABUF objects, and
> access the sample data directly. The advantage of doing this vs. the
> read() interface is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly useful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
>
> As part of the interface, 3 new IOCTLs have been added:
>
> IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
> Attach the DMABUF object identified by the given file descriptor to the
> buffer.
>
> IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
> Detach the DMABUF object identified by the given file descriptor from
> the buffer. Note that closing the IIO buffer's file descriptor will
> automatically detach all previously attached DMABUF objects.
>
> IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
> Request a data transfer to/from the given DMABUF object. Its file
> descriptor, as well as the transfer size and flags are provided in the
> "iio_dmabuf" structure.
>
> These three IOCTLs have to be performed on the IIO buffer's file
> descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
>
> Signed-off-by: Paul Cercueil <[email protected]>
>
> ---
> v2: Only allow the new IOCTLs on the buffer FD created with
> IIO_BUFFER_GET_FD_IOCTL().
>
> v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or
> manage DMABUFs anymore, and only attaches/detaches externally
> created DMABUFs.
> - Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.
>
> v5: - Use dev_err() instead of pr_err()
> - Inline to_iio_dma_fence()
> - Add comment to explain why we unref twice when detaching dmabuf
> - Remove TODO comment. It is actually safe to free the file's
> private data even when transfers are still pending because it
> won't be accessed.
> - Fix documentation of new fields in struct iio_buffer_access_funcs
> - iio_dma_resv_lock() does not need to be exported, make it static
> ---
> drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++++++++++++++
> include/linux/iio/buffer_impl.h | 26 ++
> include/uapi/linux/iio/buffer.h | 22 ++
> 3 files changed, 450 insertions(+)
>
> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> index 09c41e9ccf87..24c040e073a7 100644
> --- a/drivers/iio/industrialio-buffer.c
> +++ b/drivers/iio/industrialio-buffer.c
> @@ -13,10 +13,14 @@
> #include <linux/kernel.h>
> #include <linux/export.h>
> #include <linux/device.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-fence.h>
> +#include <linux/dma-resv.h>
> #include <linux/file.h>
> #include <linux/fs.h>
> #include <linux/cdev.h>
> #include <linux/slab.h>
> +#include <linux/mm.h>
> #include <linux/poll.h>
> #include <linux/sched/signal.h>
>
> @@ -28,6 +32,31 @@
> #include <linux/iio/buffer.h>
> #include <linux/iio/buffer_impl.h>
>
> +#define DMABUF_ENQUEUE_TIMEOUT_MS 5000
> +
> +struct iio_dma_fence;
> +
> +struct iio_dmabuf_priv {
> + struct list_head entry;
> + struct kref ref;
> +
> + struct iio_buffer *buffer;
> + struct iio_dma_buffer_block *block;
> +
> + u64 context;
> + spinlock_t lock;
> +
> + struct dma_buf_attachment *attach;
> + struct iio_dma_fence *fence;
> +};
> +
> +struct iio_dma_fence {
> + struct dma_fence base;
> + struct iio_dmabuf_priv *priv;
> + struct sg_table *sgt;
> + enum dma_data_direction dir;
> +};
> +
> static const char * const iio_endian_prefix[] = {
> [IIO_BE] = "be",
> [IIO_LE] = "le",
> @@ -332,6 +361,7 @@ void iio_buffer_init(struct iio_buffer *buffer)
> {
> INIT_LIST_HEAD(&buffer->demux_list);
> INIT_LIST_HEAD(&buffer->buffer_list);
> + INIT_LIST_HEAD(&buffer->dmabufs);
> init_waitqueue_head(&buffer->pollq);
> kref_init(&buffer->ref);
> if (!buffer->watermark)
> @@ -1519,14 +1549,54 @@ static void iio_buffer_unregister_legacy_sysfs_groups(struct iio_dev *indio_dev)
> kfree(iio_dev_opaque->legacy_scan_el_group.attrs);
> }
>
> +static void iio_buffer_dmabuf_release(struct kref *ref)
> +{
> + struct iio_dmabuf_priv *priv = container_of(ref, struct iio_dmabuf_priv, ref);
> + struct dma_buf_attachment *attach = priv->attach;
> + struct iio_buffer *buffer = priv->buffer;
> + struct dma_buf *dmabuf = attach->dmabuf;
> +
> + buffer->access->detach_dmabuf(buffer, priv->block);
> +
> + dma_buf_detach(attach->dmabuf, attach);
> + dma_buf_put(dmabuf);
> + kfree(priv);
> +}
> +
> +void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach)
> +{
> + struct iio_dmabuf_priv *priv = attach->importer_priv;
> +
> + kref_get(&priv->ref);
> +}
> +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_get);
> +
> +void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach)
> +{
> + struct iio_dmabuf_priv *priv = attach->importer_priv;
> +
> + kref_put(&priv->ref, iio_buffer_dmabuf_release);
> +}
> +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_put);
> +
> static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep)
> {
> struct iio_dev_buffer_pair *ib = filep->private_data;
> struct iio_dev *indio_dev = ib->indio_dev;
> struct iio_buffer *buffer = ib->buffer;
> + struct iio_dmabuf_priv *priv, *tmp;
>
> wake_up(&buffer->pollq);
>
> + /* Close all attached DMABUFs */
> + list_for_each_entry_safe(priv, tmp, &buffer->dmabufs, entry) {
> + list_del_init(&priv->entry);
> + iio_buffer_dmabuf_put(priv->attach);
> + }
> +
> + if (!list_empty(&buffer->dmabufs))
> + dev_warn(&indio_dev->dev, "Buffer FD closed with active transfers\n");
> +
> kfree(ib);
> clear_bit(IIO_BUSY_BIT_POS, &buffer->flags);
> iio_device_put(indio_dev);
> @@ -1534,11 +1604,343 @@ static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep)
> return 0;
> }
>
> +static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
> +{
> + int ret;
> +
> + ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
> + if (ret) {
> + if (ret != -EDEADLK)
> + goto out;
> + if (nonblock) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);

This is overkill: without a reservation context you never get -EDEADLK,
and so you never have to go into the slowpath locking mode. You can check
this with the CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y build option.
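
If I understand the suggestion correctly, the helper could then shrink to
something like this (a sketch, untested; it also turns the nonblock case
into a plain trylock, which is an assumption on my part):

        static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
        {
                if (nonblock)
                        return dma_resv_trylock(dmabuf->resv) ? 0 : -EBUSY;

                /* No ww_acquire_ctx is passed, so -EDEADLK cannot happen. */
                return dma_resv_lock_interruptible(dmabuf->resv, NULL);
        }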

> + }
> +
> +out:
> + return ret;
> +}
> +
> +static struct dma_buf_attachment *
> +iio_buffer_find_attachment(struct iio_dev *indio_dev, struct dma_buf *dmabuf)
> +{
> + struct dma_buf_attachment *elm, *attach = NULL;
> + int ret;
> +
> + ret = iio_dma_resv_lock(dmabuf, false);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + list_for_each_entry(elm, &dmabuf->attachments, node) {
> + if (elm->dev == indio_dev->dev.parent) {
> + attach = elm;
> + break;
> + }
> + }
> +
> + if (attach)
> + iio_buffer_dmabuf_get(elm);

Same comment as on your usb gadget support: this must be a
kref_get_unless_zero(), and I'd really prefer if you used your own
list + locking instead of digging around in dma-buf internals in a
lifetime-relevant way.
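
A rough sketch of that suggestion, assuming a new, hypothetical
'dmabufs_mutex' protecting buffer->dmabufs (with 'attach' initialized to
NULL as before):

        struct iio_dmabuf_priv *priv;

        mutex_lock(&buffer->dmabufs_mutex);
        list_for_each_entry(priv, &buffer->dmabufs, entry) {
                if (priv->attach->dmabuf == dmabuf &&
                    kref_get_unless_zero(&priv->ref)) {
                        attach = priv->attach;
                        break;
                }
        }
        mutex_unlock(&buffer->dmabufs_mutex);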

> +
> + dma_resv_unlock(dmabuf->resv);
> +
> + return attach ?: ERR_PTR(-EPERM);
> +}
> +
> +static int iio_buffer_attach_dmabuf(struct iio_dev_buffer_pair *ib,
> + int __user *user_fd)
> +{
> + struct iio_dev *indio_dev = ib->indio_dev;
> + struct iio_buffer *buffer = ib->buffer;
> + struct dma_buf_attachment *attach;
> + struct iio_dmabuf_priv *priv;
> + struct dma_buf *dmabuf;
> + int err, fd;
> +
> + if (!buffer->access->attach_dmabuf
> + || !buffer->access->detach_dmabuf
> + || !buffer->access->enqueue_dmabuf)
> + return -EPERM;
> +
> + if (copy_from_user(&fd, user_fd, sizeof(fd)))
> + return -EFAULT;
> +
> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + spin_lock_init(&priv->lock);
> + priv->context = dma_fence_context_alloc(1);
> +
> + dmabuf = dma_buf_get(fd);
> + if (IS_ERR(dmabuf)) {
> + err = PTR_ERR(dmabuf);
> + goto err_free_priv;
> + }
> +
> + attach = dma_buf_attach(dmabuf, indio_dev->dev.parent);
> + if (IS_ERR(attach)) {
> + err = PTR_ERR(attach);
> + goto err_dmabuf_put;
> + }
> +
> + kref_init(&priv->ref);
> + priv->buffer = buffer;
> + priv->attach = attach;
> + attach->importer_priv = priv;
> +
> + priv->block = buffer->access->attach_dmabuf(buffer, attach);
> + if (IS_ERR(priv->block)) {
> + err = PTR_ERR(priv->block);
> + goto err_dmabuf_detach;
> + }
> +
> + list_add(&priv->entry, &buffer->dmabufs);

This list seems to have no locking. And I think you want to tie the attach
refcount 1:1 to this list, to make sure userspace can't double-detach and
hence underrun any refcount here. That would also address my concern with
your find_attachment() function.

> +
> + return 0;
> +
> +err_dmabuf_detach:
> + dma_buf_detach(dmabuf, attach);
> +err_dmabuf_put:
> + dma_buf_put(dmabuf);
> +err_free_priv:
> + kfree(priv);
> +
> + return err;
> +}
> +
> +static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req)
> +{
> + struct dma_buf_attachment *attach;
> + struct iio_dmabuf_priv *priv;
> + struct dma_buf *dmabuf;
> + int dmabuf_fd, ret = 0;
> +
> + if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
> + return -EFAULT;
> +
> + dmabuf = dma_buf_get(dmabuf_fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
> + if (IS_ERR(attach)) {
> + ret = PTR_ERR(attach);
> + goto out_dmabuf_put;
> + }
> +
> + priv = attach->importer_priv;
> + list_del_init(&priv->entry);
> +
> + /*
> + * Unref twice to release the reference obtained with
> + * iio_buffer_find_attachment() above, and the one obtained in
> + * iio_buffer_attach_dmabuf().
> + */

Again, like in the usb gadget code, this looks like it's exploitable to
provoke a refcount underflow from userspace.

> + iio_buffer_dmabuf_put(attach);
> + iio_buffer_dmabuf_put(attach);
> +
> +out_dmabuf_put:
> + dma_buf_put(dmabuf);
> +
> + return ret;
> +}
> +
> +static const char *
> +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
> +{
> + return "iio";
> +}
> +
> +static void iio_buffer_dma_fence_release(struct dma_fence *fence)
> +{
> + struct iio_dma_fence *iio_fence =
> + container_of(fence, struct iio_dma_fence, base);
> +
> + kfree(iio_fence);
> +}
> +
> +static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
> + .get_driver_name = iio_buffer_dma_fence_get_driver_name,
> + .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
> + .release = iio_buffer_dma_fence_release,
> +};
> +
> +static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
> + struct iio_dmabuf __user *iio_dmabuf_req,
> + bool nonblock)
> +{
> + struct iio_dev *indio_dev = ib->indio_dev;
> + struct iio_buffer *buffer = ib->buffer;
> + struct iio_dmabuf iio_dmabuf;
> + struct dma_buf_attachment *attach;
> + struct iio_dmabuf_priv *priv;
> + enum dma_data_direction dir;
> + struct iio_dma_fence *fence;
> + struct dma_buf *dmabuf;
> + struct sg_table *sgt;
> + unsigned long timeout;
> + bool dma_to_ram;
> + bool cyclic;
> + int ret;
> +
> + if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf)))
> + return -EFAULT;
> +
> + if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> + return -EINVAL;
> +
> + cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
> +
> + /* Cyclic flag is only supported on output buffers */
> + if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT)
> + return -EINVAL;
> +
> + dmabuf = dma_buf_get(iio_dmabuf.fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
> + ret = -EINVAL;
> + goto err_dmabuf_put;
> + }
> +
> + attach = iio_buffer_find_attachment(indio_dev, dmabuf);
> + if (IS_ERR(attach)) {
> + ret = PTR_ERR(attach);
> + goto err_dmabuf_put;
> + }
> +
> + priv = attach->importer_priv;
> +
> + dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
> + dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> + sgt = dma_buf_map_attachment(attach, dir);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
> + goto err_attachment_put;
> + }
> +
> + fence = kmalloc(sizeof(*fence), GFP_KERNEL);
> + if (!fence) {
> + ret = -ENOMEM;
> + goto err_unmap_attachment;
> + }
> +
> + fence->priv = priv;
> + fence->sgt = sgt;
> + fence->dir = dir;
> + priv->fence = fence;
> +
> + dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
> + &priv->lock, priv->context, 0);

Same comment as for the usb gadget patch: You need a real seqno here (and
iio must then guarantee that all transactions are ordered), or a new
unordered dma_fence (meaning a new context for each fence, currently
there's no non-hackish way to make that happen).
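
If the transfers queued on one attachment are guaranteed to complete in order,
a per-attachment counter would be enough, e.g. (sketch; "seqno" would be a new
atomic64_t member of iio_dmabuf_priv, it is not in the patch):

	dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
		       &priv->lock, priv->context,
		       atomic64_inc_return(&priv->seqno));

Otherwise you really do need one fence context per fence.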

> +
> + ret = iio_dma_resv_lock(dmabuf, nonblock);
> + if (ret)
> + goto err_fence_put;
> +
> + timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
> +
> + /* Make sure we don't have writers */
> + ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
> + true, timeout);
> + if (ret == 0)
> + ret = -EBUSY;
> + if (ret < 0)
> + goto err_resv_unlock;
> +
> + if (dma_to_ram) {
> + /*
> + * If we're writing to the DMABUF, make sure we don't have
> + * readers
> + */
> + ret = (int) dma_resv_wait_timeout(dmabuf->resv,
> + DMA_RESV_USAGE_READ, true,
> + timeout);
> + if (ret == 0)
> + ret = -EBUSY;
> + if (ret < 0)
> + goto err_resv_unlock;
> + }
> +
> + ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> + if (ret)
> + goto err_resv_unlock;
> +
> + dma_resv_add_fence(dmabuf->resv, &fence->base,
> + dma_resv_usage_rw(dma_to_ram));
> + dma_resv_unlock(dmabuf->resv);

Please add dma_fence_begin/end_signalling annotations here to make sure the
locking/memory allocation rules for dma_fence are followed. Same
suggestion would also be good in the usb gadget code.
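
I.e. something in the spirit of (sketch only, the exact extent of the critical
section is up to you):

	bool cookie = dma_fence_begin_signalling();

	dma_resv_add_fence(dmabuf->resv, &fence->base,
			   dma_resv_usage_rw(dma_to_ram));
	dma_resv_unlock(dmabuf->resv);

	ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
					     iio_dmabuf.bytes_used, cyclic);
	if (ret)
		iio_buffer_signal_dmabuf_done(attach, ret);

	dma_fence_end_signalling(cookie);

so that lockdep can validate that nothing in there waits on memory reclaim or
takes locks which the signalling path also needs.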
> +
> + ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
> + iio_dmabuf.bytes_used, cyclic);
> + if (ret)
> + iio_buffer_signal_dmabuf_done(attach, ret);
> +
> + dma_buf_put(dmabuf);
> +
> + return ret;
> +
> +err_resv_unlock:
> + dma_resv_unlock(dmabuf->resv);
> +err_fence_put:
> + dma_fence_put(&fence->base);
> +err_unmap_attachment:
> + dma_buf_unmap_attachment(attach, sgt, dir);
> +err_attachment_put:
> + iio_buffer_dmabuf_put(attach);
> +err_dmabuf_put:
> + dma_buf_put(dmabuf);
> +
> + return ret;
> +}
> +
> +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret)
> +{
> + struct iio_dmabuf_priv *priv = attach->importer_priv;
> + struct iio_dma_fence *fence = priv->fence;
> + enum dma_data_direction dir = fence->dir;
> + struct sg_table *sgt = fence->sgt;
> +
> + dma_fence_get(&fence->base);
> + fence->base.error = ret;
> + dma_fence_signal(&fence->base);
> + dma_fence_put(&fence->base);
> +
> + dma_buf_unmap_attachment(attach, sgt, dir);
> + iio_buffer_dmabuf_put(attach);

Like with the usb gadget code I have concerns that you might hold up the
entire completion machinery here by taking the wrong locks. You probably
want to add dma_fence_begin/end_signalling annotations here, and also
split out the final attachment unref with all the refcount unravelling
into a preallocated worker.

The other issue is dma_buf_unmap_attachment here. That must be called with
dma_resv_lock held, but you can't do that here because this is dma_fence
completion code.

Usually drivers cache this stuff, but I guess you could also just put that
into your unref worker. The more gnarly issue is that if you need this for
cache coherency maintenance, then that should be _before_ the
dma_fence_signal(), but currently we don't have a dma_buf function which
does only the cache maintenance (which wouldn't need dma_resv_lock) and
not also the unmapping.
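
As a rough shape for that worker (sketch; "unref_work" would be a new member
of iio_dmabuf_priv, initialised at attach time, none of this is in the patch):

static void iio_buffer_dmabuf_unref_worker(struct work_struct *work)
{
	struct iio_dmabuf_priv *priv =
		container_of(work, struct iio_dmabuf_priv, unref_work);
	struct dma_buf *dmabuf = priv->attach->dmabuf;

	/* We're outside the fence signalling path here, so this is allowed */
	dma_resv_lock(dmabuf->resv, NULL);
	dma_buf_unmap_attachment(priv->attach, priv->fence->sgt,
				 priv->fence->dir);
	dma_resv_unlock(dmabuf->resv);

	iio_buffer_dmabuf_put(priv->attach);
}

and then iio_buffer_signal_dmabuf_done() only signals the fence and does
schedule_work(&priv->unref_work).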

I think to make sure we don't have a big design issue here we need:

- dma_fence_begin/end_signalling critical section annotations in both iio
and usb gadget code, for anything that could potentially hold up the
dma_fence_signal at any point after a fence has been installed into the
dma_resv object.

- probably dma-api debugging or a platform that needs cache flushes to
make sure this works. For gpu dma-buf sharing we pretty much side-step
this all by assuming everyone does only write-combined mappings ever, or
at least the interconnect fabric is reasonable enough that flushing only
around cpu access is enough. This assumption very much does not hold in
general, and it's fallen apart enough times even for gpu dma-buf sharing
in the past.

Otherwise I think we might end up opening Pandora's box here a bit and
merge code that works in tech demo mode, but which would need serious
amounts of subsystem rework in iio or usb gadget to make it work correctly
across the board.

Cheers, Sima

> +}
> +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
> +
> +static long iio_buffer_chrdev_ioctl(struct file *filp,
> + unsigned int cmd, unsigned long arg)
> +{
> + struct iio_dev_buffer_pair *ib = filp->private_data;
> + void __user *_arg = (void __user *)arg;
> +
> + switch (cmd) {
> + case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
> + return iio_buffer_attach_dmabuf(ib, _arg);
> + case IIO_BUFFER_DMABUF_DETACH_IOCTL:
> + return iio_buffer_detach_dmabuf(ib, _arg);
> + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> + return iio_buffer_enqueue_dmabuf(ib, _arg,
> + filp->f_flags & O_NONBLOCK);
> + default:
> + return IIO_IOCTL_UNHANDLED;
> + }
> +}
> +
> static const struct file_operations iio_buffer_chrdev_fileops = {
> .owner = THIS_MODULE,
> .llseek = noop_llseek,
> .read = iio_buffer_read,
> .write = iio_buffer_write,
> + .unlocked_ioctl = iio_buffer_chrdev_ioctl,
> + .compat_ioctl = compat_ptr_ioctl,
> .poll = iio_buffer_poll,
> .release = iio_buffer_chrdev_release,
> };
> diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
> index 89c3fd7c29ca..55d93705c96b 100644
> --- a/include/linux/iio/buffer_impl.h
> +++ b/include/linux/iio/buffer_impl.h
> @@ -9,8 +9,11 @@
> #include <uapi/linux/iio/buffer.h>
> #include <linux/iio/buffer.h>
>
> +struct dma_buf_attachment;
> struct iio_dev;
> +struct iio_dma_buffer_block;
> struct iio_buffer;
> +struct sg_table;
>
> /**
> * INDIO_BUFFER_FLAG_FIXED_WATERMARK - Watermark level of the buffer can not be
> @@ -39,6 +42,13 @@ struct iio_buffer;
> * device stops sampling. Calles are balanced with @enable.
> * @release: called when the last reference to the buffer is dropped,
> * should free all resources allocated by the buffer.
> + * @attach_dmabuf: called from userspace via ioctl to attach one external
> + * DMABUF.
> + * @detach_dmabuf: called from userspace via ioctl to detach one previously
> + * attached DMABUF.
> + * @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF
> + * object to this buffer. Requires a valid DMABUF fd, that
> + * was previously attached to this buffer.
> * @modes: Supported operating modes by this buffer type
> * @flags: A bitmask combination of INDIO_BUFFER_FLAG_*
> *
> @@ -68,6 +78,14 @@ struct iio_buffer_access_funcs {
>
> void (*release)(struct iio_buffer *buffer);
>
> + struct iio_dma_buffer_block * (*attach_dmabuf)(struct iio_buffer *buffer,
> + struct dma_buf_attachment *attach);
> + void (*detach_dmabuf)(struct iio_buffer *buffer,
> + struct iio_dma_buffer_block *block);
> + int (*enqueue_dmabuf)(struct iio_buffer *buffer,
> + struct iio_dma_buffer_block *block,
> + struct sg_table *sgt, size_t size, bool cyclic);
> +
> unsigned int modes;
> unsigned int flags;
> };
> @@ -136,6 +154,9 @@ struct iio_buffer {
>
> /* @ref: Reference count of the buffer. */
> struct kref ref;
> +
> + /* @dmabufs: List of DMABUF attachments */
> + struct list_head dmabufs;
> };
>
> /**
> @@ -156,9 +177,14 @@ int iio_update_buffers(struct iio_dev *indio_dev,
> **/
> void iio_buffer_init(struct iio_buffer *buffer);
>
> +void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach);
> +void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach);
> +
> struct iio_buffer *iio_buffer_get(struct iio_buffer *buffer);
> void iio_buffer_put(struct iio_buffer *buffer);
>
> +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret);
> +
> #else /* CONFIG_IIO_BUFFER */
>
> static inline void iio_buffer_get(struct iio_buffer *buffer) {}
> diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
> index 13939032b3f6..c666aa95e532 100644
> --- a/include/uapi/linux/iio/buffer.h
> +++ b/include/uapi/linux/iio/buffer.h
> @@ -5,6 +5,28 @@
> #ifndef _UAPI_IIO_BUFFER_H_
> #define _UAPI_IIO_BUFFER_H_
>
> +#include <linux/types.h>
> +
> +/* Flags for iio_dmabuf.flags */
> +#define IIO_BUFFER_DMABUF_CYCLIC (1 << 0)
> +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000001
> +
> +/**
> + * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
> + * @fd: file descriptor of the DMABUF object
> + * @flags: one or more IIO_BUFFER_DMABUF_* flags
> + * @bytes_used: number of bytes used in this DMABUF for the data transfer.
> + * Should generally be set to the DMABUF's size.
> + */
> +struct iio_dmabuf {
> + __u32 fd;
> + __u32 flags;
> + __u64 bytes_used;
> +};
> +
> #define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int)
> +#define IIO_BUFFER_DMABUF_ATTACH_IOCTL _IOW('i', 0x92, int)
> +#define IIO_BUFFER_DMABUF_DETACH_IOCTL _IOW('i', 0x93, int)
> +#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x94, struct iio_dmabuf)
>
> #endif /* _UAPI_IIO_BUFFER_H_ */
> --
> 2.43.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-01-08 21:13:24

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

On 12/19/23 11:50 AM, Paul Cercueil wrote:
> [V4 was: "iio: Add buffer write() support"][1]
>
> Hi Jonathan,
>
> This is a respin of the V3 of my patchset that introduced a new
> interface based on DMABUF objects [2].
>
> The V4 was a split of the patchset, to attempt to upstream buffer
> write() support first. But since there is no current user upstream, it
> was not merged. This V5 is about doing the opposite, and contains the
> new DMABUF interface, without adding the buffer write() support. It can
> already be used with the upstream adi-axi-adc driver.
>
> In user-space, Libiio uses it to transfer back and forth blocks of
> samples between the hardware and the applications, without having to
> copy the data.
>
> On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> pcercuei/dev-new-dmabuf-api branch [3], compiled with
> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> Throughput: 116 MiB/s
>
> Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> Throughput: 475 MiB/s
>
> This benchmark only measures the speed at which the data can be fetched
> to iio_rwdev's internal buffers, and does not actually try to read the
> data (e.g. to pipe it to stdout). It shows that fetching the data is
> more than 4x faster using the new interface.
>
> When actually reading the data, the performance difference isn't that
> impressive (maybe because in case of DMABUF the data is not in cache):
>
> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
> 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
>
> WITH_LOCAL_DMABUF_API=ON:
> sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
> 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
>
> One interesting thing to note is that fileio is (currently) actually
> faster than the DMABUF interface if you increase a lot the buffer size.
> My explanation is that the cache invalidation routine takes more and
> more time the bigger the DMABUF gets. This is because the DMABUF is
> backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up
> to 16 thousands pages, that have to be invalidated one by one. This can
> be addressed by using huge pages, but the udmabuf driver does not (yet)
> support creating DMABUFs backed by huge pages.
>

Have you tried DMABUFs created using the DMABUF System heap exporter?
(drivers/dma-buf/heaps/system_heap.c) It should be able to handle
larger allocation better here, and if you don't have any active
mmaps or vmaps then it can skip CPU-side coherency maintenance
(useful for device to device transfers).

Allocating DMABUFs out of user pages has a bunch of other issues you
might run into also. I'd argue udmabuf is now completely superseded
by DMABUF system heaps. Try it out :)

Andrew

> Anyway, the real benefits happen when the DMABUFs are either shared
> between IIO devices, or between the IIO subsystem and another
> filesystem. In that case, the DMABUFs are simply passed around drivers,
> without the data being copied at any moment.
>
> We use that feature to transfer samples from our transceivers to USB,
> using a DMABUF interface to FunctionFS [4].
>
> This drastically increases the throughput, to about 274 MiB/s over a
> USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the
> FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).
>
> Based on linux-next/next-20231219.
>
> Cheers,
> -Paul
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
> [3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> [4] https://lore.kernel.org/all/[email protected]/
>
> ---
> Changelog:
> - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct.
> Note that at some point we will need to support cyclic transfers
> using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> parameter to the function?
>
> - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> .device_prep_slave_dma_array().
>
> @Vinod: this patch will cause a small conflict with my other
> patchset adding scatter-gather support to the axi-dmac driver.
> This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> prototype of this function changed in my other patchset - it would
> have to be passed the "chan" variable. I don't know how you prefer it
> to be resolved. Worst case scenario (and if @Jonathan is okay with
> that) this one patch can be re-sent later, but it would make this
> patchset less "atomic".
>
> - [5/8]:
> - Use dev_err() instead of pr_err()
> - Inline to_iio_dma_fence()
> - Add comment to explain why we unref twice when detaching dmabuf
> - Remove TODO comment. It is actually safe to free the file's
> private data even when transfers are still pending because it
> won't be accessed.
> - Fix documentation of new fields in struct iio_buffer_access_funcs
> - iio_dma_resv_lock() does not need to be exported, make it static
>
> - [7/8]:
> - Use the new dmaengine_prep_slave_dma_vec().
> - Restrict to input buffers, since output buffers are not yet
> supported by IIO buffers.
>
> - [8/8]:
> Use description lists for the documentation of the three new IOCTLs
> instead of abusing subsections.
>
> ---
> Alexandru Ardelean (1):
> iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>
> Paul Cercueil (7):
> iio: buffer-dma: Get rid of outgoing queue
> dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> iio: core: Add new DMABUF interface infrastructure
> iio: buffer-dma: Enable support for DMABUFs
> iio: buffer-dmaengine: Support new DMABUF based userspace API
> Documentation: iio: Document high-speed DMABUF based API
>
> Documentation/iio/dmabuf_api.rst | 54 +++
> Documentation/iio/index.rst | 2 +
> drivers/dma/dma-axi-dmac.c | 40 ++
> drivers/iio/buffer/industrialio-buffer-dma.c | 242 ++++++++---
> .../buffer/industrialio-buffer-dmaengine.c | 52 ++-
> drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++
> include/linux/dmaengine.h | 25 ++
> include/linux/iio/buffer-dma.h | 33 +-
> include/linux/iio/buffer_impl.h | 26 ++
> include/uapi/linux/iio/buffer.h | 22 +
> 10 files changed, 836 insertions(+), 62 deletions(-)
> create mode 100644 Documentation/iio/dmabuf_api.rst
>

2024-01-11 09:24:14

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

Hi Andrew,

On Monday, January 8, 2024 at 15:12 -0600, Andrew Davis wrote:
> On 12/19/23 11:50 AM, Paul Cercueil wrote:
> > [V4 was: "iio: Add buffer write() support"][1]
> >
> > Hi Jonathan,
> >
> > This is a respin of the V3 of my patchset that introduced a new
> > interface based on DMABUF objects [2].
> >
> > The V4 was a split of the patchset, to attempt to upstream buffer
> > write() support first. But since there is no current user upstream,
> > it
> > was not merged. This V5 is about doing the opposite, and contains
> > the
> > new DMABUF interface, without adding the buffer write() support. It
> > can
> > already be used with the upstream adi-axi-adc driver.
> >
> > In user-space, Libiio uses it to transfer back and forth blocks of
> > samples between the hardware and the applications, without having
> > to
> > copy the data.
> >
> > On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> > pcercuei/dev-new-dmabuf-api branch [3], compiled with
> > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> >    sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> >    Throughput: 116 MiB/s
> >
> > Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> >    sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> >    Throughput: 475 MiB/s
> >
> > This benchmark only measures the speed at which the data can be
> > fetched
> > to iio_rwdev's internal buffers, and does not actually try to read
> > the
> > data (e.g. to pipe it to stdout). It shows that fetching the data
> > is
> > more than 4x faster using the new interface.
> >
> > When actually reading the data, the performance difference isn't
> > that
> > impressive (maybe because in case of DMABUF the data is not in
> > cache):
> >
> > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> >    sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > status=progress
> >    2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
> >
> > WITH_LOCAL_DMABUF_API=ON:
> >    sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > status=progress
> >    2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
> >
> > One interesting thing to note is that fileio is (currently)
> > actually
> > faster than the DMABUF interface if you increase a lot the buffer
> > size.
> > My explanation is that the cache invalidation routine takes more
> > and
> > more time the bigger the DMABUF gets. This is because the DMABUF is
> > backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by
> > up
> > to 16 thousands pages, that have to be invalidated one by one. This
> > can
> > be addressed by using huge pages, but the udmabuf driver does not
> > (yet)
> > support creating DMABUFs backed by huge pages.
> >
>
> Have you tried DMABUFs created using the DMABUF System heap exporter?
> (drivers/dma-buf/heaps/system_heap.c) It should be able to handle
> larger allocation better here, and if you don't have any active
> mmaps or vmaps then it can skip CPU-side coherency maintenance
> (useful for device to device transfers).

I didn't know about it!

But udmabuf also allows you to skip CPU-side coherency maintenance,
since DMABUFs have two ioctls to start/finish CPU access anyway.

> Allocating DMABUFs out of user pages has a bunch of other issues you
> might run into also. I'd argue udmabuf is now completely superseded
> by DMABUF system heaps. Try it out :)

I'm curious, what other issues?

The good thing about udmabuf is that the memory is backed by pages, so
we can use MSG_ZEROCOPY on sockets to transfer the mmapped data over
the network (having a DMABUF interface to the network stack would be
better, but I'm not opening that can of worms).

> Andrew

Cheers,
-Paul

> > Anyway, the real benefits happen when the DMABUFs are either shared
> > between IIO devices, or between the IIO subsystem and another
> > filesystem. In that case, the DMABUFs are simply passed around
> > drivers,
> > without the data being copied at any moment.
> >
> > We use that feature to transfer samples from our transceivers to
> > USB,
> > using a DMABUF interface to FunctionFS [4].
> >
> > This drastically increases the throughput, to about 274 MiB/s over
> > a
> > USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to
> > the
> > FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
> > avg.).
> >
> > Based on linux-next/next-20231219.
> >
> > Cheers,
> > -Paul
> >
> > [1]
> > https://lore.kernel.org/all/[email protected]/
> > [2]
> > https://lore.kernel.org/all/[email protected]/
> > [3]
> > https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> > [4]
> > https://lore.kernel.org/all/[email protected]/
> >
> > ---
> > Changelog:
> > - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> >    dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec'
> > struct.
> >    Note that at some point we will need to support cyclic transfers
> >    using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> >    parameter to the function?
> >
> > - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> >    .device_prep_slave_dma_array().
> >
> >    @Vinod: this patch will cause a small conflict with my other
> >    patchset adding scatter-gather support to the axi-dmac driver.
> >    This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> >    prototype of this function changed in my other patchset - it
> > would
> >    have to be passed the "chan" variable. I don't know how you
> > prefer it
> >    to be resolved. Worst case scenario (and if @Jonathan is okay
> > with
> >    that) this one patch can be re-sent later, but it would make
> > this
> >    patchset less "atomic".
> >
> > - [5/8]:
> >    - Use dev_err() instead of pr_err()
> >    - Inline to_iio_dma_fence()
> >    - Add comment to explain why we unref twice when detaching
> > dmabuf
> >    - Remove TODO comment. It is actually safe to free the file's
> >      private data even when transfers are still pending because it
> >      won't be accessed.
> >    - Fix documentation of new fields in struct
> > iio_buffer_access_funcs
> >    - iio_dma_resv_lock() does not need to be exported, make it
> > static
> >
> > - [7/8]:
> >    - Use the new dmaengine_prep_slave_dma_vec().
> >    - Restrict to input buffers, since output buffers are not yet
> >      supported by IIO buffers.
> >
> > - [8/8]:
> >    Use description lists for the documentation of the three new
> > IOCTLs
> >    instead of abusing subsections.
> >
> > ---
> > Alexandru Ardelean (1):
> >    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> >
> > Paul Cercueil (7):
> >    iio: buffer-dma: Get rid of outgoing queue
> >    dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> >    dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> >    iio: core: Add new DMABUF interface infrastructure
> >    iio: buffer-dma: Enable support for DMABUFs
> >    iio: buffer-dmaengine: Support new DMABUF based userspace API
> >    Documentation: iio: Document high-speed DMABUF based API
> >
> >   Documentation/iio/dmabuf_api.rst              |  54 +++
> >   Documentation/iio/index.rst                   |   2 +
> >   drivers/dma/dma-axi-dmac.c                    |  40 ++
> >   drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
> >   .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
> >   drivers/iio/industrialio-buffer.c             | 402
> > ++++++++++++++++++
> >   include/linux/dmaengine.h                     |  25 ++
> >   include/linux/iio/buffer-dma.h                |  33 +-
> >   include/linux/iio/buffer_impl.h               |  26 ++
> >   include/uapi/linux/iio/buffer.h               |  22 +
> >   10 files changed, 836 insertions(+), 62 deletions(-)
> >   create mode 100644 Documentation/iio/dmabuf_api.rst
> >


2024-01-11 17:32:29

by Andrew Davis

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

On 1/11/24 3:20 AM, Paul Cercueil wrote:
> Hi Andrew,
>
> On Monday, January 8, 2024 at 15:12 -0600, Andrew Davis wrote:
>> On 12/19/23 11:50 AM, Paul Cercueil wrote:
>>> [V4 was: "iio: Add buffer write() support"][1]
>>>
>>> Hi Jonathan,
>>>
>>> This is a respin of the V3 of my patchset that introduced a new
>>> interface based on DMABUF objects [2].
>>>
>>> The V4 was a split of the patchset, to attempt to upstream buffer
>>> write() support first. But since there is no current user upstream,
>>> it
>>> was not merged. This V5 is about doing the opposite, and contains
>>> the
>>> new DMABUF interface, without adding the buffer write() support. It
>>> can
>>> already be used with the upstream adi-axi-adc driver.
>>>
>>> In user-space, Libiio uses it to transfer back and forth blocks of
>>> samples between the hardware and the applications, without having
>>> to
>>> copy the data.
>>>
>>> On a ZCU102 with a FMComms3 daughter board, running Libiio from the
>>> pcercuei/dev-new-dmabuf-api branch [3], compiled with
>>> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
>>>    sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
>>>    Throughput: 116 MiB/s
>>>
>>> Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
>>>    sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
>>>    Throughput: 475 MiB/s
>>>
>>> This benchmark only measures the speed at which the data can be
>>> fetched
>>> to iio_rwdev's internal buffers, and does not actually try to read
>>> the
>>> data (e.g. to pipe it to stdout). It shows that fetching the data
>>> is
>>> more than 4x faster using the new interface.
>>>
>>> When actually reading the data, the performance difference isn't
>>> that
>>> impressive (maybe because in case of DMABUF the data is not in
>>> cache):
>>>
>>> WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
>>>    sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
>>> status=progress
>>>    2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
>>>
>>> WITH_LOCAL_DMABUF_API=ON:
>>>    sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
>>> status=progress
>>>    2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
>>>
>>> One interesting thing to note is that fileio is (currently)
>>> actually
>>> faster than the DMABUF interface if you increase a lot the buffer
>>> size.
>>> My explanation is that the cache invalidation routine takes more
>>> and
>>> more time the bigger the DMABUF gets. This is because the DMABUF is
>>> backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by
>>> up
>>> to 16 thousands pages, that have to be invalidated one by one. This
>>> can
>>> be addressed by using huge pages, but the udmabuf driver does not
>>> (yet)
>>> support creating DMABUFs backed by huge pages.
>>>
>>
>> Have you tried DMABUFs created using the DMABUF System heap exporter?
>> (drivers/dma-buf/heaps/system_heap.c) It should be able to handle
>> larger allocation better here, and if you don't have any active
>> mmaps or vmaps then it can skip CPU-side coherency maintenance
>> (useful for device to device transfers).
>
> I didn't know about it!
>
> But udmabuf also allows you to skip CPU-side coherency maintenance,
> since DMABUFs have two ioctls to start/finish CPU access anyway.
>

The only way it lets you skip that is if your application just doesn't
call those begin/end ioctls, which is wrong. That may work on a system
where CPU caches can be snooped by all devices that could attach to
a buffer (x86), but that might not work on others (ARM). So calling
those begin/end ioctls is required[0]. If maintenance is not actually
needed then the kernel will turn those calls into NOPs for you, but only
the kernel can know when that is correct (based on the running system
and the devices attached to that buffer), not userspace.
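
For reference, the expected pattern around any CPU access of the mapping looks
roughly like this (illustrative userspace snippet, error handling trimmed):

	#include <linux/dma-buf.h>	/* struct dma_buf_sync, DMA_BUF_IOCTL_SYNC */
	#include <sys/ioctl.h>

	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
	};

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	/* ... CPU reads/writes the mmap'd buffer here ... */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

and it is the kernel's job to decide whether those calls turn into actual
cache maintenance or into NOPs.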

>> Allocating DMABUFs out of user pages has a bunch of other issues you
>> might run into also. I'd argue udmabuf is now completely superseded
>> by DMABUF system heaps. Try it out :)
>
> I'm curious, what other issues?
>

For starters, the {begin,end}_cpu_access() callbacks don't actually
sync the pages for any of the devices attached to the DMABUF; they
only make a fake mapping for the misc device (CPU) and then sync with
that. That probably works for the QEMU case it was designed for where
the device is always a VM instance running on the same CPU, but for
any real devices the sync never happens towards them.

I have some patches fixing the above I'll post this cycle, but it
won't help with folks doing reads/writes on the original shmem/memfd
outside of the begin/end ioctls. So there is a fundamental issue
with the buffer's backing memory's ownership/lifecycle that makes
udmabuf broken by design.

The DMABUF System Heap owns the backing memory and manages that
memory's lifecycle as all correct DMABUF exporters must.

> The good thing about udmabuf is that the memory is backed by pages, so
> we can use MSG_ZEROCOPY on sockets to transfer the mmapped data over
> the network (having a DMABUF interface to the network stack would be
> better, but I'm not opening that can of worms).
>

Yes, having a DMABUF importer interface for the network stack would be
the best long-term solution here, and one will probably end up being
needed for zero-copy buffer passing directly between HW and network
which seems to be a growing area of interest. And would help solve
some cases where MSG_ZEROCOPY fails (such as devices without
scatter-gather) by making sure the backing buffer meets the needs
of all attached devices, etc.. But I do agree let's leave those
worm-cans for someone else to open :)

I wonder what would happen if you tried a MSG_ZEROCOPY on a buffer
that was an mmap'd address from a DMABUF.. probably nothing good
but might be worth looking into.

Andrew

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma-buf/dma-buf.c#n1323

>> Andrew
>
> Cheers,
> -Paul
>
>>> Anyway, the real benefits happen when the DMABUFs are either shared
>>> between IIO devices, or between the IIO subsystem and another
>>> filesystem. In that case, the DMABUFs are simply passed around
>>> drivers,
>>> without the data being copied at any moment.
>>>
>>> We use that feature to transfer samples from our transceivers to
>>> USB,
>>> using a DMABUF interface to FunctionFS [4].
>>>
>>> This drastically increases the throughput, to about 274 MiB/s over
>>> a
>>> USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to
>>> the
>>> FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
>>> avg.).
>>>
>>> Based on linux-next/next-20231219.
>>>
>>> Cheers,
>>> -Paul
>>>
>>> [1]
>>> https://lore.kernel.org/all/[email protected]/
>>> [2]
>>> https://lore.kernel.org/all/[email protected]/
>>> [3]
>>> https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
>>> [4]
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> ---
>>> Changelog:
>>> - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
>>>    dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec'
>>> struct.
>>>    Note that at some point we will need to support cyclic transfers
>>>    using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
>>>    parameter to the function?
>>>
>>> - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
>>>    .device_prep_slave_dma_array().
>>>
>>>    @Vinod: this patch will cause a small conflict with my other
>>>    patchset adding scatter-gather support to the axi-dmac driver.
>>>    This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
>>>    prototype of this function changed in my other patchset - it
>>> would
>>>    have to be passed the "chan" variable. I don't know how you
>>> prefer it
>>>    to be resolved. Worst case scenario (and if @Jonathan is okay
>>> with
>>>    that) this one patch can be re-sent later, but it would make
>>> this
>>>    patchset less "atomic".
>>>
>>> - [5/8]:
>>>    - Use dev_err() instead of pr_err()
>>>    - Inline to_iio_dma_fence()
>>>    - Add comment to explain why we unref twice when detaching
>>> dmabuf
>>>    - Remove TODO comment. It is actually safe to free the file's
>>>      private data even when transfers are still pending because it
>>>      won't be accessed.
>>>    - Fix documentation of new fields in struct
>>> iio_buffer_access_funcs
>>>    - iio_dma_resv_lock() does not need to be exported, make it
>>> static
>>>
>>> - [7/8]:
>>>    - Use the new dmaengine_prep_slave_dma_vec().
>>>    - Restrict to input buffers, since output buffers are not yet
>>>      supported by IIO buffers.
>>>
>>> - [8/8]:
>>>    Use description lists for the documentation of the three new
>>> IOCTLs
>>>    instead of abusing subsections.
>>>
>>> ---
>>> Alexandru Ardelean (1):
>>>    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>>>
>>> Paul Cercueil (7):
>>>    iio: buffer-dma: Get rid of outgoing queue
>>>    dmaengine: Add API function dmaengine_prep_slave_dma_vec()
>>>    dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
>>>    iio: core: Add new DMABUF interface infrastructure
>>>    iio: buffer-dma: Enable support for DMABUFs
>>>    iio: buffer-dmaengine: Support new DMABUF based userspace API
>>>    Documentation: iio: Document high-speed DMABUF based API
>>>
>>>   Documentation/iio/dmabuf_api.rst              |  54 +++
>>>   Documentation/iio/index.rst                   |   2 +
>>>   drivers/dma/dma-axi-dmac.c                    |  40 ++
>>>   drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
>>>   .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
>>>   drivers/iio/industrialio-buffer.c             | 402
>>> ++++++++++++++++++
>>>   include/linux/dmaengine.h                     |  25 ++
>>>   include/linux/iio/buffer-dma.h                |  33 +-
>>>   include/linux/iio/buffer_impl.h               |  26 ++
>>>   include/uapi/linux/iio/buffer.h               |  22 +
>>>   10 files changed, 836 insertions(+), 62 deletions(-)
>>>   create mode 100644 Documentation/iio/dmabuf_api.rst
>>>
>

2024-01-12 11:34:30

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5

Hi Andrew,

On Thursday, January 11, 2024 at 11:30 -0600, Andrew Davis wrote:
> On 1/11/24 3:20 AM, Paul Cercueil wrote:
> > Hi Andrew,
> >
> > On Monday, January 8, 2024 at 15:12 -0600, Andrew Davis wrote:
> > > On 12/19/23 11:50 AM, Paul Cercueil wrote:
> > > > [V4 was: "iio: Add buffer write() support"][1]
> > > >
> > > > Hi Jonathan,
> > > >
> > > > This is a respin of the V3 of my patchset that introduced a new
> > > > interface based on DMABUF objects [2].
> > > >
> > > > The V4 was a split of the patchset, to attempt to upstream
> > > > buffer
> > > > write() support first. But since there is no current user
> > > > upstream,
> > > > it
> > > > was not merged. This V5 is about doing the opposite, and
> > > > contains
> > > > the
> > > > new DMABUF interface, without adding the buffer write()
> > > > support. It
> > > > can
> > > > already be used with the upstream adi-axi-adc driver.
> > > >
> > > > In user-space, Libiio uses it to transfer back and forth blocks
> > > > of
> > > > samples between the hardware and the applications, without
> > > > having
> > > > to
> > > > copy the data.
> > > >
> > > > On a ZCU102 with a FMComms3 daughter board, running Libiio from
> > > > the
> > > > pcercuei/dev-new-dmabuf-api branch [3], compiled with
> > > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > > >     sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > > >     Throughput: 116 MiB/s
> > > >
> > > > Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> > > >     sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > > >     Throughput: 475 MiB/s
> > > >
> > > > This benchmark only measures the speed at which the data can be
> > > > fetched
> > > > to iio_rwdev's internal buffers, and does not actually try to
> > > > read
> > > > the
> > > > data (e.g. to pipe it to stdout). It shows that fetching the
> > > > data
> > > > is
> > > > more than 4x faster using the new interface.
> > > >
> > > > When actually reading the data, the performance difference
> > > > isn't
> > > > that
> > > > impressive (maybe because in case of DMABUF the data is not in
> > > > cache):
> > > >
> > > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > > >     sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd
> > > > of=/dev/zero
> > > > status=progress
> > > >     2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
> > > >
> > > > WITH_LOCAL_DMABUF_API=ON:
> > > >     sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd
> > > > of=/dev/zero
> > > > status=progress
> > > >     2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
> > > >
> > > > One interesting thing to note is that fileio is (currently)
> > > > actually
> > > > faster than the DMABUF interface if you increase a lot the
> > > > buffer
> > > > size.
> > > > My explanation is that the cache invalidation routine takes
> > > > more
> > > > and
> > > > more time the bigger the DMABUF gets. This is because the
> > > > DMABUF is
> > > > backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed
> > > > by
> > > > up
> > > > to 16 thousands pages, that have to be invalidated one by one.
> > > > This
> > > > can
> > > > be addressed by using huge pages, but the udmabuf driver does
> > > > not
> > > > (yet)
> > > > support creating DMABUFs backed by huge pages.
> > > >
> > >
> > > Have you tried DMABUFs created using the DMABUF System heap
> > > exporter?
> > > (drivers/dma-buf/heaps/system_heap.c) It should be able to handle
> > > larger allocation better here, and if you don't have any active
> > > mmaps or vmaps then it can skip CPU-side coherency maintenance
> > > (useful for device to device transfers).
> >
> > I didn't know about it!
> >
> > But udmabuf also allows you to skip CPU-side coherency maintenance,
> > since DMABUFs have two ioctls to start/finish CPU access anyway.
> >
>
> The only way it lets you skip that is if your application just
> doesn't
> call those begin/end ioctls, which is wrong. That may work on a
> system
> where CPU caches can be snooped by all devices that could attach to
> a buffer(x86), but that might not work on others(ARM). So calling
> those begin/end ioctls is required[0]. If maintenance is not actually
> needed then the kernel will turn those calls into NOPs for you, but
> only
> the kernel can know when that is correct (based on the running system
> and the devices attached to that buffer), not userspace.

My application only calls these begin/end IOCTLs when the DMABUF's data
is accessed (through its mmapped address), and not e.g. when I just pass
around the DMABUF to another device driver. In that case I don't care
that the CPU caches aren't sync'd.

>
> > > Allocating DMABUFs out of user pages has a bunch of other issues
> > > you
> > > might run into also. I'd argue udmabuf is now completely
> > > superseded
> > > by DMABUF system heaps. Try it out :)
> >
> > I'm curious, what other issues?
> >
>
> For starters the {begin,end}_cpu_access() callbacks don't actually
> sync the pages for any of the devices attached to the DMABUF, it
> only makes a fake mapping for the misc device(CPU) then syncs with
> that. That probably works for the QEMU case it was designed for where
> the device is always a VM instance running on the same CPU, but for
> any real devices the sync never happens towards them.
>
> I have some patches fixing the above I'll post this cycle, but it
> wont help with folks doing reads/wrties on the original shmem/memfd
> outside of the begin/end ioctls. So there is a fundamental issue
> with the buffer's backing memory's ownership/lifecycle that makes
> udmabuf broken by design.
>
> The DMABUF System Heap owns the backing memory and manages that
> memory's lifecycle as all correct DMABUF exporters must.

Sounds good.

One other issue I was having with udmabuf is that it does not like huge
pages, so I had to use small 4096-byte pages. Since my DMABUFs can be
huge (half of the RAM), this caused my SGs to be very long, and in turn
caused the CPU to spend an enormous amount of time inside
dma_sync_sg_for_cpu().

At least the DMABUF system heap seems better designed in that regard.

>
> > The good thing about udmabuf is that the memory is backed by pages,
> > so
> > we can use MSG_ZEROCOPY on sockets to transfer the mmapped data
> > over
> > the network (having a DMABUF interface to the network stack would
> > be
> > better, but I'm not opening that can of worms).
> >
>
> Yes, having a DMABUF importer interface for the network stack would
> be
> the best long-term solution here, and one will probably end up being
> needed for zero-copy buffer passing directly between HW and network
> which seems to be a growing area of interest. And would help solve
> some cases where MSG_ZEROCOPY fails (such as devices without
> scatter-gather) by making sure the backing buffer meets the needs
> of all attached devices, etc.. But I do agree let's leave those
> worm-cans for someone else to open :)
>
> I wonder what would happen if you tried a MSG_ZEROCOPY on a buffer
> that was an mmap'd address from a DMABUF.. probably nothing good
> but might be worth looking into.

I'll give it a try at some point.

Cheers,
-Paul

>
> Andrew
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma-buf/dma-buf.c#n1323
>
> > > Andrew
> >
> > Cheers,
> > -Paul
> >
> > > > Anyway, the real benefits happen when the DMABUFs are either
> > > > shared
> > > > between IIO devices, or between the IIO subsystem and another
> > > > filesystem. In that case, the DMABUFs are simply passed around
> > > > drivers,
> > > > without the data being copied at any moment.
> > > >
> > > > We use that feature to transfer samples from our transceivers
> > > > to
> > > > USB,
> > > > using a DMABUF interface to FunctionFS [4].
> > > >
> > > > This drastically increases the throughput, to about 274 MiB/s
> > > > over
> > > > a
> > > > USB3 link, vs. 127 MiB/s using IIO's fileio interface + write()
> > > > to
> > > > the
> > > > FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
> > > > avg.).
> > > >
> > > > Based on linux-next/next-20231219.
> > > >
> > > > Cheers,
> > > > -Paul
> > > >
> > > > [1]
> > > > https://lore.kernel.org/all/20230807112113.47157-1-paul@crapouillounet/
> > > > [2]
> > > > https://lore.kernel.org/all/[email protected]/
> > > > [3]
> > > > https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> > > > [4]
> > > > https://lore.kernel.org/all/[email protected]/
> > > >
> > > > ---
> > > > Changelog:
> > > > - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a
> > > > new
> > > >     dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec'
> > > > struct.
> > > >     Note that at some point we will need to support cyclic
> > > > transfers
> > > >     using dmaengine_prep_slave_dma_vec(). Maybe with a new
> > > > "flags"
> > > >     parameter to the function?
> > > >
> > > > - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> > > >     .device_prep_slave_dma_array().
> > > >
> > > >     @Vinod: this patch will cause a small conflict with my
> > > > other
> > > >     patchset adding scatter-gather support to the axi-dmac
> > > > driver.
> > > >     This patch adds a call to axi_dmac_alloc_desc(num_sgs), but
> > > > the
> > > >     prototype of this function changed in my other patchset -
> > > > it
> > > > would
> > > >     have to be passed the "chan" variable. I don't know how you
> > > > prefer it
> > > >     to be resolved. Worst case scenario (and if @Jonathan is
> > > > okay
> > > > with
> > > >     that) this one patch can be re-sent later, but it would
> > > > make
> > > > this
> > > >     patchset less "atomic".
> > > >
> > > > - [5/8]:
> > > >     - Use dev_err() instead of pr_err()
> > > >     - Inline to_iio_dma_fence()
> > > >     - Add comment to explain why we unref twice when detaching
> > > > dmabuf
> > > >     - Remove TODO comment. It is actually safe to free the
> > > > file's
> > > >       private data even when transfers are still pending
> > > > because it
> > > >       won't be accessed.
> > > >     - Fix documentation of new fields in struct
> > > > iio_buffer_access_funcs
> > > >     - iio_dma_resv_lock() does not need to be exported, make it
> > > > static
> > > >
> > > > - [7/8]:
> > > >     - Use the new dmaengine_prep_slave_dma_vec().
> > > >     - Restrict to input buffers, since output buffers are not
> > > > yet
> > > >       supported by IIO buffers.
> > > >
> > > > - [8/8]:
> > > >     Use description lists for the documentation of the three
> > > > new
> > > > IOCTLs
> > > >     instead of abusing subsections.
> > > >
> > > > ---
> > > > Alexandru Ardelean (1):
> > > >     iio: buffer-dma: split iio_dma_buffer_fileio_free()
> > > > function
> > > >
> > > > Paul Cercueil (7):
> > > >     iio: buffer-dma: Get rid of outgoing queue
> > > >     dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> > > >     dmaengine: dma-axi-dmac: Implement
> > > > device_prep_slave_dma_vec
> > > >     iio: core: Add new DMABUF interface infrastructure
> > > >     iio: buffer-dma: Enable support for DMABUFs
> > > >     iio: buffer-dmaengine: Support new DMABUF based userspace
> > > > API
> > > >     Documentation: iio: Document high-speed DMABUF based API
> > > >
> > > >    Documentation/iio/dmabuf_api.rst              |  54 +++
> > > >    Documentation/iio/index.rst                   |   2 +
> > > >    drivers/dma/dma-axi-dmac.c                    |  40 ++
> > > >    drivers/iio/buffer/industrialio-buffer-dma.c  | 242
> > > > ++++++++---
> > > >    .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
> > > >    drivers/iio/industrialio-buffer.c             | 402
> > > > ++++++++++++++++++
> > > >    include/linux/dmaengine.h                     |  25 ++
> > > >    include/linux/iio/buffer-dma.h                |  33 +-
> > > >    include/linux/iio/buffer_impl.h               |  26 ++
> > > >    include/uapi/linux/iio/buffer.h               |  22 +
> > > >    10 files changed, 836 insertions(+), 62 deletions(-)
> > > >    create mode 100644 Documentation/iio/dmabuf_api.rst
> > > >
> >


2024-01-22 11:21:44

by Vinod Koul

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH v5 3/8] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

Hi Paul,


On 08-01-24, 13:20, Paul Cercueil wrote:
> Hi Vinod,
>
> On Thursday, December 21, 2023 at 20:44 +0530, Vinod Koul wrote:
> > On 19-12-23, 18:50, Paul Cercueil wrote:
> > > This function can be used to initiate a scatter-gather DMA
> > > transfer,
> > > where the address and size of each segment is located in one entry
> > > of
> > > the dma_vec array.
> > >
> > > The major difference with dmaengine_prep_slave_sg() is that it
> > > supports
> > > specifying the lengths of each DMA transfer; as trying to override
> > > the
> > > length of the transfer with dmaengine_prep_slave_sg() is a very
> > > tedious
> > > process. The introduction of a new API function is also justified
> > > by the
> > > fact that scatterlists are on their way out.
> > >
> > > Note that dmaengine_prep_interleaved_dma() is not helpful either in
> > > that
> > > case, as it assumes that the address of each segment will be higher
> > > than
> > > the one of the previous segment, which we just cannot guarantee in
> > > case
> > > of a scatter-gather transfer.
> > >
> > > Signed-off-by: Paul Cercueil <[email protected]>
> > >
> > > ---
> > > v3: New patch
> > >
> > > v5: Replace with function dmaengine_prep_slave_dma_vec(), and
> > > struct
> > >     'dma_vec'.
> > >     Note that at some point we will need to support cyclic transfers
> > >     using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> > >     parameter to the function?
> > > ---
> > >  include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
> > >  1 file changed, 25 insertions(+)
> > >
> > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > > index 3df70d6131c8..ee5931ddb42f 100644
> > > --- a/include/linux/dmaengine.h
> > > +++ b/include/linux/dmaengine.h
> > > @@ -160,6 +160,16 @@ struct dma_interleaved_template {
> > >  struct data_chunk sgl[];
> > >  };
> > >
> > > +/**
> > > + * struct dma_vec - DMA vector
> > > + * @addr: Bus address of the start of the vector
> > > + * @len: Length in bytes of the DMA vector
> > > + */
> > > +struct dma_vec {
> > > + dma_addr_t addr;
> > > + size_t len;
> > > +};
>
> I don't want to be pushy, but I'd like to know how to solve this now,
> otherwise I'll just send the same patches for my v6.
>
> > so you want to transfer multiple buffers, right? why not use
> > dmaengine_prep_slave_sg(). If there is reason for not using that one?
>
> The reason is that we want to have the possibility to transfer less
> than the total size of the scatterlist, and that's currently very hard
> to do - scatterlists were designed to not be tampered with.
>
Christian König then suggested to introduce a "dma_vec", which had been
> on his TODO list for a while now.

Yeah, for this, interleaved seems overkill. Let's go with this API. I would
suggest changing the name of the API, replacing "slave" with "peripheral",
though.

--
~Vinod

2024-01-25 13:58:49

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Hi Jonathan,

On Thursday, December 21, 2023 at 12:06 +0000, Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 18:50:06 +0100
> Paul Cercueil <[email protected]> wrote:
>
> > Add the necessary infrastructure to the IIO core to support a new
> > optional DMABUF based interface.
> >
> > With this new interface, DMABUF objects (externally created) can be
> > attached to a IIO buffer, and subsequently used for data transfer.
> >
> > A userspace application can then use this interface to share DMABUF
> > objects between several interfaces, allowing it to transfer data in
> > a
> > zero-copy fashion, for instance between IIO and the USB stack.
> >
> > The userspace application can also memory-map the DMABUF objects,
> > and
> > access the sample data directly. The advantage of doing this vs.
> > the
> > read() interface is that it avoids an extra copy of the data
> > between the
> > kernel and userspace. This is particularly useful for high-speed
> > devices which produce several megabytes or even gigabytes of data
> > per
> > second.
> >
> > As part of the interface, 3 new IOCTLs have been added:
> >
> > IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
> >  Attach the DMABUF object identified by the given file descriptor
> > to the
> >  buffer.
> >
> > IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
> >  Detach the DMABUF object identified by the given file descriptor
> > from
> >  the buffer. Note that closing the IIO buffer's file descriptor
> > will
> >  automatically detach all previously attached DMABUF objects.
> >
> > IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
> >  Request a data transfer to/from the given DMABUF object. Its file
> >  descriptor, as well as the transfer size and flags are provided in
> > the
> >  "iio_dmabuf" structure.
> >
> > These three IOCTLs have to be performed on the IIO buffer's file
> > descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
> >
>
> Fair enough - so they don't apply to the 'legacy' buffer which
> simplifies
> things but in one place you assume that logic is used (given error
> return
> values).
>
> > Signed-off-by: Paul Cercueil <[email protected]>
> >
> This is big and complex and I'm out of time for now, so I've made
> some
> comments but should revisit it.
> I'm also looking for review from those more familiar with the dmabuf side
> of things than I am!
>
> Jonathan
>
>
> >  
> > +static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool
> > nonblock)
> > +{
> > + int ret;
> > +
> > + ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
> > + if (ret) {
> > + if (ret != -EDEADLK)
> > + goto out;
> > + if (nonblock) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
> > + }
> > +
> > +out:
> > + return ret;
>
> I'm not a fan of gotos that do nothing.  Just return in appropriate
> places above.
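
Untested, but the goto-free version would just boil down to:

static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
{
	int ret;

	ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
	if (ret != -EDEADLK)
		return ret;

	if (nonblock)
		return -EBUSY;

	return dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
}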
>
> > +}
> >
> > +static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair
> > *ib, int *user_req)
> > +{
> > + struct dma_buf_attachment *attach;
> > + struct iio_dmabuf_priv *priv;
> > + struct dma_buf *dmabuf;
> > + int dmabuf_fd, ret = 0;
> > +
> > + if (copy_from_user(&dmabuf_fd, user_req,
> > sizeof(dmabuf_fd)))
> > + return -EFAULT;
> > +
> > + dmabuf = dma_buf_get(dmabuf_fd);
> > + if (IS_ERR(dmabuf))
> > + return PTR_ERR(dmabuf);
> > +
> > + attach = iio_buffer_find_attachment(ib->indio_dev,
> > dmabuf);
> > + if (IS_ERR(attach)) {
> > + ret = PTR_ERR(attach);
> > + goto out_dmabuf_put;
> > + }
> > +
> > + priv = attach->importer_priv;
> > + list_del_init(&priv->entry);
> > +
> > + /*
> > + * Unref twice to release the reference obtained with
> > + * iio_buffer_find_attachment() above, and the one
> > obtained in
> > + * iio_buffer_attach_dmabuf().
> > + */
> > + iio_buffer_dmabuf_put(attach);
> > + iio_buffer_dmabuf_put(attach);
> > +
> > +out_dmabuf_put:
> > + dma_buf_put(dmabuf);
> As below. Feels like a __free(dma_buf_put) bit of magic would be a
> nice to have.

I'm working on the patches right now, just one quick question.

Having a __free(dma_buf_put) requires that dma_buf_put is first
"registered" as a freeing function using DEFINE_FREE() in <linux/dma-
buf.h>, which has not been done yet.

That would mean carrying a dma-buf specific patch in your tree, are you
OK with that?
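
For reference, that dma-buf patch would be little more than this (sketch, not
an actual upstream commit):

	/* include/linux/dma-buf.h */
	#include <linux/cleanup.h>

	DEFINE_FREE(dma_buf_put, struct dma_buf *,
		    if (!IS_ERR_OR_NULL(_T)) dma_buf_put(_T))

which would then let the IIO code do:

	struct dma_buf *dmabuf __free(dma_buf_put) = dma_buf_get(fd);

	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);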

Cheers,
-Paul

> > +
> > + return ret;
> > +}
> > +
> > +static const char *
> > +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
> > +{
> > + return "iio";
> > +}
> > +
> > +static void iio_buffer_dma_fence_release(struct dma_fence *fence)
> > +{
> > + struct iio_dma_fence *iio_fence =
> > + container_of(fence, struct iio_dma_fence, base);
> > +
> > + kfree(iio_fence);
> > +}
> > +
> > +static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
> > + .get_driver_name =
> > iio_buffer_dma_fence_get_driver_name,
> > + .get_timeline_name =
> > iio_buffer_dma_fence_get_driver_name,
> > + .release = iio_buffer_dma_fence_release,
> > +};
> > +
> > +static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair
> > *ib,
> > +      struct iio_dmabuf __user
> > *iio_dmabuf_req,
> > +      bool nonblock)
> > +{
> > + struct iio_dev *indio_dev = ib->indio_dev;
> > + struct iio_buffer *buffer = ib->buffer;
> > + struct iio_dmabuf iio_dmabuf;
> > + struct dma_buf_attachment *attach;
> > + struct iio_dmabuf_priv *priv;
> > + enum dma_data_direction dir;
> > + struct iio_dma_fence *fence;
> > + struct dma_buf *dmabuf;
> > + struct sg_table *sgt;
> > + unsigned long timeout;
> > + bool dma_to_ram;
> > + bool cyclic;
> > + int ret;
> > +
> > + if (copy_from_user(&iio_dmabuf, iio_dmabuf_req,
> > sizeof(iio_dmabuf)))
> > + return -EFAULT;
> > +
> > + if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> > + return -EINVAL;
> > +
> > + cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
> > +
> > + /* Cyclic flag is only supported on output buffers */
> > + if (cyclic && buffer->direction !=
> > IIO_BUFFER_DIRECTION_OUT)
> > + return -EINVAL;
> > +
> > + dmabuf = dma_buf_get(iio_dmabuf.fd);
> > + if (IS_ERR(dmabuf))
> > + return PTR_ERR(dmabuf);
> > +
> > + if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
> > + ret = -EINVAL;
> > + goto err_dmabuf_put;
> > + }
> > +
> > + attach = iio_buffer_find_attachment(indio_dev, dmabuf);
> > + if (IS_ERR(attach)) {
> > + ret = PTR_ERR(attach);
> > + goto err_dmabuf_put;
>
> Might be worth some cleanup.h magic given this put happens in all
> exit paths.
>
> > + }
> > +
> > + priv = attach->importer_priv;
> > +
> > + dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
> > + dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > + sgt = dma_buf_map_attachment(attach, dir);
> > + if (IS_ERR(sgt)) {
> > + ret = PTR_ERR(sgt);
> > + dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
> > + goto err_attachment_put;
> > + }
> > +
> > + fence = kmalloc(sizeof(*fence), GFP_KERNEL);
> > + if (!fence) {
> > + ret = -ENOMEM;
> > + goto err_unmap_attachment;
> > + }
> > +
> > + fence->priv = priv;
> > + fence->sgt = sgt;
> > + fence->dir = dir;
> > + priv->fence = fence;
> > +
> > + dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
> > +        &priv->lock, priv->context, 0);
> > +
> > + ret = iio_dma_resv_lock(dmabuf, nonblock);
> > + if (ret)
> > + goto err_fence_put;
> > +
> > + timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
> > +
> > + /* Make sure we don't have writers */
> > + ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
> > +   true, timeout);
>
> I'd handle this and similar cases as a long rather than adding the
> odd-looking cast and making me think too much about it.
>
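
For illustration, the "handle it as a long" suggestion could be factored
out like this (untested sketch; the helper name is made up):

    static int iio_dma_resv_wait(struct dma_resv *resv,
                                 enum dma_resv_usage usage,
                                 unsigned long timeout)
    {
        long ret;

        /*
         * dma_resv_wait_timeout() returns the remaining jiffies,
         * 0 on timeout, or a negative error code.
         */
        ret = dma_resv_wait_timeout(resv, usage, true, timeout);
        if (ret == 0)
            return -EBUSY;
        if (ret < 0)
            return ret;

        return 0;
    }

so that the enqueue path only ever deals with an int error code, e.g.
ret = iio_dma_resv_wait(dmabuf->resv, DMA_RESV_USAGE_WRITE, timeout);
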
> > + if (ret == 0)
> > + ret = -EBUSY;
> > + if (ret < 0)
> > + goto err_resv_unlock;
> > +
> > + if (dma_to_ram) {
> > + /*
> > + * If we're writing to the DMABUF, make sure we don't have
> > + * readers
> > + */
> > + ret = (int) dma_resv_wait_timeout(dmabuf->resv,
> > +   DMA_RESV_USAGE_READ, true,
> > +   timeout);
> > + if (ret == 0)
> > + ret = -EBUSY;
> > + if (ret < 0)
> > + goto err_resv_unlock;
> > + }
> > +
> > + ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> > + if (ret)
> > + goto err_resv_unlock;
> > +
> > + dma_resv_add_fence(dmabuf->resv, &fence->base,
> > +    dma_resv_usage_rw(dma_to_ram));
> > + dma_resv_unlock(dmabuf->resv);
> > +
> > + ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
> > +      iio_dmabuf.bytes_used, cyclic);
> > + if (ret)
> > + iio_buffer_signal_dmabuf_done(attach, ret);
>
> I'd like a comment on why using the 'successful' path cleanup makes
> sense in this
> error case.  It's possible to figure it out, but reviewers are lazy
> and generally
> we like the cleanup to be obvious and local on error paths.
>
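
One possible shape for such a comment (wording assumed, not the author's):

    ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
                                         iio_dmabuf.bytes_used, cyclic);
    if (ret)
        /*
         * The fence has already been initialized and added to the
         * reservation object, so signal it with the error: the
         * completion path then performs the same unmap + unref
         * teardown as a successful DMA transfer would.
         */
        iio_buffer_signal_dmabuf_done(attach, ret);
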
> > +
> > + dma_buf_put(dmabuf);
> > +
> > + return ret;
> > +
> > +err_resv_unlock:
> > + dma_resv_unlock(dmabuf->resv);
> > +err_fence_put:
> > + dma_fence_put(&fence->base);
> > +err_unmap_attachment:
> > + dma_buf_unmap_attachment(attach, sgt, dir);
> > +err_attachment_put:
> > + iio_buffer_dmabuf_put(attach);
> > +err_dmabuf_put:
> > + dma_buf_put(dmabuf);
> > +
> > + return ret;
> > +}
> > +
> > +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret)
> > +{
> > + struct iio_dmabuf_priv *priv = attach->importer_priv;
> > + struct iio_dma_fence *fence = priv->fence;
> > + enum dma_data_direction dir = fence->dir;
> > + struct sg_table *sgt = fence->sgt;
> > +
> > + dma_fence_get(&fence->base);
>
> I don't know much about dma_fence, but is it valid to access its
> contents (sgt, etc.) before getting a reference?
> Ultimately dma_fence_put() can result in a kfree_rcu(), so that seems
> unlikely to be safe and definitely fails the 'obviously correct'
> test.  Given those are, I assume, trivial accesses, perhaps just do
> them down here after taking the reference?
>
>
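
A sketch of the reordering suggested above (untested; same names as the
quoted code), so that the fence fields are only read after the reference
is taken and before the final put:

    void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach,
                                       int ret)
    {
        struct iio_dmabuf_priv *priv = attach->importer_priv;
        struct iio_dma_fence *fence = priv->fence;
        struct sg_table *sgt;
        enum dma_data_direction dir;

        dma_fence_get(&fence->base);

        sgt = fence->sgt;
        dir = fence->dir;

        fence->base.error = ret;
        dma_fence_signal(&fence->base);
        dma_fence_put(&fence->base);

        dma_buf_unmap_attachment(attach, sgt, dir);
        iio_buffer_dmabuf_put(attach);
    }
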
> > + fence->base.error = ret;
> > + dma_fence_signal(&fence->base);
> > + dma_fence_put(&fence->base);
> > +
> > + dma_buf_unmap_attachment(attach, sgt, dir);
> > + iio_buffer_dmabuf_put(attach);
> > +}
> > +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
> > +
>
> > +static long iio_buffer_chrdev_ioctl(struct file *filp,
> > +     unsigned int cmd, unsigned long arg)
> > +{
> > + struct iio_dev_buffer_pair *ib = filp->private_data;
> > + void __user *_arg = (void __user *)arg;
> > +
> > + switch (cmd) {
> > + case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
> > + return iio_buffer_attach_dmabuf(ib, _arg);
> > + case IIO_BUFFER_DMABUF_DETACH_IOCTL:
> > + return iio_buffer_detach_dmabuf(ib, _arg);
> > + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> > + return iio_buffer_enqueue_dmabuf(ib, _arg,
> > + filp->f_flags & O_NONBLOCK);
> > + default:
> > + return IIO_IOCTL_UNHANDLED;
>
> Given you aren't using the ioctl handling on the legacy buffer, I
> think this should simply return an error code, not the magic value we
> use to indicate 'all fine, but it's not mine'.
> Probably -EINVAL or similar.  Note that the wrapper around the legacy
> buffer ioctls translates this value to -ENODEV rather than returning
> it from the ioctl.
>
>
>
> > + }
> > +}
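
A sketch of the suggested change (same handler as above, with the default
case returning a plain error code instead of the IIO-internal sentinel):

    static long iio_buffer_chrdev_ioctl(struct file *filp,
                                        unsigned int cmd, unsigned long arg)
    {
        struct iio_dev_buffer_pair *ib = filp->private_data;
        void __user *_arg = (void __user *)arg;

        switch (cmd) {
        case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
            return iio_buffer_attach_dmabuf(ib, _arg);
        case IIO_BUFFER_DMABUF_DETACH_IOCTL:
            return iio_buffer_detach_dmabuf(ib, _arg);
        case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
            return iio_buffer_enqueue_dmabuf(ib, _arg,
                                             filp->f_flags & O_NONBLOCK);
        default:
            return -EINVAL;
        }
    }
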
> > +
> >  static const struct file_operations iio_buffer_chrdev_fileops = {
> >   .owner = THIS_MODULE,
> >   .llseek = noop_llseek,
> >   .read = iio_buffer_read,
> >   .write = iio_buffer_write,
> > + .unlocked_ioctl = iio_buffer_chrdev_ioctl,
> > + .compat_ioctl = compat_ptr_ioctl,
> >   .poll = iio_buffer_poll,
> >   .release = iio_buffer_chrdev_release,
> >  };
>


2024-01-27 16:51:19

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure


> > > + iio_buffer_dmabuf_put(attach);
> > > +
> > > +out_dmabuf_put:
> > > + dma_buf_put(dmabuf);
> > As below. Feels like a __free(dma_buf_put) bit of magic would be a
> > nice to have.
>
> I'm working on the patches right now, just one quick question.
>
> Having a __free(dma_buf_put) requires that dma_buf_put is first
> "registered" as a freeing function using DEFINE_FREE() in <linux/dma-
> buf.h>, which has not been done yet.
>
> That would mean carrying a dma-buf specific patch in your tree, are you
> OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing
so. Alternative is to circle back to this later after this code is upstream.

>
> Cheers,
> -Paul

>


2024-01-29 12:55:24

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
>>>> + iio_buffer_dmabuf_put(attach);
>>>> +
>>>> +out_dmabuf_put:
>>>> + dma_buf_put(dmabuf);
>>> As below. Feels like a __free(dma_buf_put) bit of magic would be a
>>> nice to have.
>> I'm working on the patches right now, just one quick question.
>>
>> Having a __free(dma_buf_put) requires that dma_buf_put is first
>> "registered" as a freeing function using DEFINE_FREE() in <linux/dma-
>> buf.h>, which has not been done yet.
>>
>> That would mean carrying a dma-buf specific patch in your tree, are you
>> OK with that?
> Needs an ACK from appropriate maintainer, but otherwise I'm fine doing
> so. Alternative is to circle back to this later after this code is upstream.

Separate patches for that please, the autocleanup feature is so new that
I'm not 100% convinced that everything works out smoothly from the start.

Regards,
Christian.

>
>> Cheers,
>> -Paul


2024-01-29 13:10:18

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Hi Christian,

Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > + iio_buffer_dmabuf_put(attach);
> > > > > +
> > > > > +out_dmabuf_put:
> > > > > + dma_buf_put(dmabuf);
> > > > As below. Feels like a __free(dma_buf_put) bit of magic would
> > > > be a
> > > > nice to have.
> > > I'm working on the patches right now, just one quick question.
> > >
> > > Having a __free(dma_buf_put) requires that dma_buf_put is first
> > > "registered" as a freeing function using DEFINE_FREE() in
> > > <linux/dma-
> > > buf.h>, which has not been done yet.
> > >
> > > That would mean carrying a dma-buf specific patch in your tree,
> > > are you
> > > OK with that?
> > Needs an ACK from appropriate maintainer, but otherwise I'm fine
> > doing
> > so.  Alternative is to circle back to this later after this code is
> > upstream.
>
> Separate patches for that please, the autocleanup feature is so new
> that
> I'm not 100% convinced that everything works out smoothly from the
> start.

Separate patches is a given, did you mean outside this patchset?
Because I can send a separate patchset that introduces scope-based
management for dma_fence and dma_buf, but then it won't have users.

Cheers,
-Paul

2024-01-29 13:25:21

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Am 29.01.24 um 14:06 schrieb Paul Cercueil:
> Hi Christian,
>
> Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
>> Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
>>>>>> + iio_buffer_dmabuf_put(attach);
>>>>>> +
>>>>>> +out_dmabuf_put:
>>>>>> + dma_buf_put(dmabuf);
>>>>> As below. Feels like a __free(dma_buf_put) bit of magic would
>>>>> be a
>>>>> nice to have.
>>>> I'm working on the patches right now, just one quick question.
>>>>
>>>> Having a __free(dma_buf_put) requires that dma_buf_put is first
>>>> "registered" as a freeing function using DEFINE_FREE() in
>>>> <linux/dma-
>>>> buf.h>, which has not been done yet.
>>>>
>>>> That would mean carrying a dma-buf specific patch in your tree,
>>>> are you
>>>> OK with that?
>>> Needs an ACK from appropriate maintainer, but otherwise I'm fine
>>> doing
>>> so.  Alternative is to circle back to this later after this code is
>>> upstream.
>> Separate patches for that please, the autocleanup feature is so new
>> that
>> I'm not 100% convinced that everything works out smoothly from the
>> start.
> Separate patches is a given, did you mean outside this patchset?
> Because I can send a separate patchset that introduces scope-based
> management for dma_fence and dma_buf, but then it won't have users.

Outside of the patchset, this is essentially brand new stuff.

IIRC we have quite a number of dma_fence selftests and sw_sync which is
basically code inside the drivers/dma-buf directory only there for
testing DMA-buf functionality.

Convert those over as well and I'm more than happy to upstream this change.

Thanks,
Christian.

>
> Cheers,
> -Paul


2024-01-29 13:33:30

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit :
> Am 29.01.24 um 14:06 schrieb Paul Cercueil:
> > Hi Christian,
> >
> > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > > > + iio_buffer_dmabuf_put(attach);
> > > > > > > +
> > > > > > > +out_dmabuf_put:
> > > > > > > + dma_buf_put(dmabuf);
> > > > > > As below. Feels like a __free(dma_buf_put) bit of magic
> > > > > > would
> > > > > > be a
> > > > > > nice to have.
> > > > > I'm working on the patches right now, just one quick
> > > > > question.
> > > > >
> > > > > Having a __free(dma_buf_put) requires that dma_buf_put is
> > > > > first
> > > > > "registered" as a freeing function using DEFINE_FREE() in
> > > > > <linux/dma-
> > > > > buf.h>, which has not been done yet.
> > > > >
> > > > > That would mean carrying a dma-buf specific patch in your
> > > > > tree,
> > > > > are you
> > > > > OK with that?
> > > > Needs an ACK from appropriate maintainer, but otherwise I'm
> > > > fine
> > > > doing
> > > > so.  Alternative is to circle back to this later after this
> > > > code is
> > > > upstream.
> > > Separate patches for that please, the autocleanup feature is so
> > > new
> > > that
> > > I'm not 100% convinced that everything works out smoothly from
> > > the
> > > start.
> > Separate patches is a given, did you mean outside this patchset?
> > Because I can send a separate patchset that introduces scope-based
> > management for dma_fence and dma_buf, but then it won't have users.
>
> Outside of the patchset, this is essentially brand new stuff.
>
> IIRC we have quite a number of dma_fence selftests and sw_sync which
> is
> basically code inside the drivers/dma-buf directory only there for
> testing DMA-buf functionality.
>
> Convert those over as well and I'm more than happy to upstream this
> change.

Well there is very little to convert there; you can use scope-based
management when the unref is done in all exit points of the functional
block, and the only place I could find that does that in drivers/dma-
buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.
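
To illustrate the constraint (the function and names below are made up,
and assume a DEFINE_FREE(dma_buf_put, ...) registration like the one
discussed earlier in the thread):

    /* Fits: the reference is taken and dropped within a single scope. */
    static int example_ok(int fd)
    {
        struct dma_buf *buf __free(dma_buf_put) = dma_buf_get(fd);

        if (IS_ERR(buf))
            return PTR_ERR(buf);

        return 0; /* the put runs automatically on both return paths */
    }

    /*
     * Does not fit: the reference outlives the function, e.g. it is
     * dropped later from a completion callback, so scope-based cleanup
     * cannot express it.
     */
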

Cheers,
-Paul

2024-01-29 14:15:29

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

Le lundi 29 janvier 2024 à 14:32 +0100, Paul Cercueil a écrit :
> Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit :
> > Am 29.01.24 um 14:06 schrieb Paul Cercueil:
> > > Hi Christian,
> > >
> > > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> > > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > > > > + iio_buffer_dmabuf_put(attach);
> > > > > > > > +
> > > > > > > > +out_dmabuf_put:
> > > > > > > > + dma_buf_put(dmabuf);
> > > > > > > As below. Feels like a __free(dma_buf_put) bit of magic
> > > > > > > would
> > > > > > > be a
> > > > > > > nice to have.
> > > > > > I'm working on the patches right now, just one quick
> > > > > > question.
> > > > > >
> > > > > > Having a __free(dma_buf_put) requires that dma_buf_put is
> > > > > > first
> > > > > > "registered" as a freeing function using DEFINE_FREE() in
> > > > > > <linux/dma-
> > > > > > buf.h>, which has not been done yet.
> > > > > >
> > > > > > That would mean carrying a dma-buf specific patch in your
> > > > > > tree,
> > > > > > are you
> > > > > > OK with that?
> > > > > Needs an ACK from appropriate maintainer, but otherwise I'm
> > > > > fine
> > > > > doing
> > > > > so.  Alternative is to circle back to this later after this
> > > > > code is
> > > > > upstream.
> > > > Separate patches for that please, the autocleanup feature is so
> > > > new
> > > > that
> > > > I'm not 100% convinced that everything works out smoothly from
> > > > the
> > > > start.
> > > Separate patches is a given, did you mean outside this patchset?
> > > Because I can send a separate patchset that introduces scope-
> > > based
> > > management for dma_fence and dma_buf, but then it won't have
> > > users.
> >
> > Outside of the patchset, this is essentially brand new stuff.
> >
> > IIRC we have quite a number of dma_fence selftests and sw_sync
> > which
> > is
> > basically code inside the drivers/dma-buf directory only there for
> > testing DMA-buf functionality.
> >
> > Convert those over as well and I'm more than happy to upstream this
> > change.
>
> Well there is very little to convert there; you can use scope-based
> management when the unref is done in all exit points of the
> functional
> block, and the only place I could find that does that in drivers/dma-
> buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.

Actually - not even that, since it doesn't call dma_fence_get() and
dma_fence_put() on the same fence.

So I cannot use it anywhere in drivers/dma-buf/.

-Paul