2011-06-09 12:39:43

by Raju, Sundaram

Subject: [RFC] dmaengine: add new api for preparing simple slave transfer

SDMA and EDMA are TI SoC specific DMA controllers. Their drivers have been maintained in the respective SoC folders till now.
arch/arm/plat-omap/dma.c
arch/arm/mach-davinci/dma.c

I have gone through the existing offload engine (DMA) drivers in drivers/dma which do slave transfers. I would like to move SDMA and EDMA also to dmaengine framework.

I believe that even though the dmaengine framework addresses and supports most of the required use cases of a client driver to a DMA controller, some extensions are required in it to make it still more generic.

Current framework contains two APIs to prepare for slave transfers:
struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
	struct dma_chan *chan, struct scatterlist *sgl,
	unsigned int sg_len, enum dma_data_direction direction,
	unsigned long flags);

struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
	struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
	size_t period_len, enum dma_data_direction direction);

and one control API.
int (*device_control)(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
	unsigned long arg);

A simple single buffer transfer (i.e. non sg transfer) can be done only as a trivial case of the device_prep_slave_sg API. The client driver is expected to prepare a sg list and provide it to the dmaengine API for that single buffer also.

In a slave transfer, the client has to define all the buffer related attributes and the peripheral related attributes.

The above 2 APIs in the dmaengine framework expect all the peripheral/slave related attributes to be filled in the dma_slave_config data structure.
struct dma_slave_config {
	enum dma_data_direction direction;
	dma_addr_t src_addr;
	dma_addr_t dst_addr;
	enum dma_slave_buswidth src_addr_width;
	enum dma_slave_buswidth dst_addr_width;
	u32 src_maxburst;
	u32 dst_maxburst;
};

This data structure is passed to the offload engine via the dma_chan data structure in its private pointer.

Now coming to the buffer related attributes, sg list is a nice way to represent a disjoint buffer; all the offload engines in drivers/dma create a descriptor for each contiguous chunk in the sg list buffer and pass it to the controller.

But many times a client may want to transfer from a single buffer to the peripheral, and most DMA controllers have the capability to transfer data in chunks/frames directly at a stretch. All the attributes of a buffer which are required for the transfer can be programmed in a single descriptor and submitted to the DMA controller.

So I think the slave transfer API must also have a provision to pass the buffer configuration. The buffer attributes have to be passed directly as an argument in the prepare API, unlike dma_slave_config as they will be different for each buffer that is going to be transferred.

It is a stretch, and impractical, to describe a highly segmented buffer (like the one described below) with a sg list. A sg list is itself a representation of a disjoint buffer collection in terms of smaller buffers, and each of those smaller buffers could then have a different buffer configuration (like the one described below); we are not going to go down that road now.

Hence it makes sense to pass these buffer attributes for only a single buffer transfer and not a sg list. This can be done by
OPTION #1
1. Adding a pointer to the dma_buffer_config data structure in the device_prep_slave_sg API
2. Ensuring that it will be ignored if a sg list is passed. Only when a single buffer is passed (in the sg list) will this buffer configuration be used.
3. Any client that wants to do a sg transfer can simply ignore this buffer configuration and pass NULL.
The main disadvantage of this option is that all the existing drivers need to be updated since the API signature is changed.

Now it might even be better to have a separate API for non sg transfers.
This is OPTION #2
Advantages of this option are
1. No change required in the existing drivers that use device_prep_slave_sg API.
2. If any offload engine wants to prepare a simple buffer transfer differently and not as a trivial case of a sg list, this will be useful. I know this can be done using 2 different implementations inside the device_prep_slave_sg itself, but I think it's cleaner to have different APIs.

I have provided here the most generic buffer configuration I can think of, and also a new API definition, in case it makes sense to have one.

This buffer configuration might not be completely generic, and hence I ask all of you to please provide comments to improve it.

Generic buffer description:
A generic buffer can be split into a number of frames, each of which contains a number of chunks. The frames need not be contiguous, nor need the chunks inside a frame be.

-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
|                            ........                            |
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
-------------------------------------------------------------------

Note: ICG = Inter Chunk Gap.

struct dma_buffer_config {
	u32 chunk_size;      /* number of bytes in a chunk */
	u32 frame_size;      /* number of chunks in a frame */
	/* u32 n_frames;        number of frames in the buffer */
	u32 inter_chunk_gap; /* number of bytes between end of a chunk
			      * and the start of the next chunk */
	u32 inter_frame_gap; /* number of bytes between end of a frame
			      * and the start of the next frame */
	bool sync_rate;      /* 0 - a sync event is required from the
			      *     peripheral to transfer a chunk
			      * 1 - a sync event is required from the
			      *     peripheral to transfer a frame */
};

The patch to add a new API for single buffer transfers alone:
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
@@ -491,6 +492,10 @@ struct dma_device {
 			struct dma_chan *chan, struct scatterlist *sgl,
 			unsigned int sg_len, enum dma_data_direction direction,
 			unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_slave)(
+		struct dma_chan *chan, dma_addr_t buf_addr,
+		unsigned int buf_len, struct dma_buffer_config *buf_prop,
+		enum dma_data_direction direction, unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
 		struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
 		size_t period_len, enum dma_data_direction direction);

The number of frames (n_frames) can always be determined using all the other values in the dma_buffer_config along with the buffer length (buf_len) passed in the API:
n_frames = buf_len / (chunk_size * frame_size);
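
For illustration, below is a minimal sketch of how a client might
fill this structure and call the proposed API. All the values, and
the chan, buf_dma_addr and flags variables, are made-up assumptions
for the example, not part of the proposal:

	/*
	 * 4096-byte buffer, 64-byte chunks, 16 chunks per frame;
	 * the DMAC derives n_frames = 4096 / (64 * 16) = 4.
	 */
	struct dma_buffer_config buf_cfg = {
		.chunk_size      = 64,   /* bytes per chunk */
		.frame_size      = 16,   /* chunks per frame */
		.inter_chunk_gap = 0,    /* chunks are contiguous here */
		.inter_frame_gap = 512,  /* skip 512 bytes between frames */
		.sync_rate       = 0,    /* one sync event per chunk */
	};
	struct dma_async_tx_descriptor *desc;

	desc = chan->device->device_prep_slave(chan, buf_dma_addr, 4096,
			&buf_cfg, DMA_TO_DEVICE, flags);
	if (desc)
		desc->tx_submit(desc);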

Regards,
Sundaram


2011-06-09 12:47:45

by Russell King - ARM Linux

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

Can you please re-post with sensible wrapping at or before column 72.
I'm not manually reformatting your entire message just so I can find
the relevant bits to reply to.

Thanks.

2011-06-09 16:02:16

by Raju, Sundaram

Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

Here it is, with proper line wrapping;

SDMA and EDMA are TI SoC specific DMA controllers. Their drivers have
been maintained in the respective SoC folders till now.
arch/arm/plat-omap/dma.c
arch/arm/mach-davinci/dma.c

I have gone through the existing offload engine (DMA) drivers in
drivers/dma which do slave transfers.
I would like to move SDMA and EDMA also to dmaengine framework.

I believe that even though the dmaengine framework addresses and
supports most of the required use cases of a client driver to a DMA
controller, some extensions are required in it to make it still more
generic.

Current framework contains two APIs to prepare for slave transfers:

struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
	struct dma_chan *chan, struct scatterlist *sgl,
	unsigned int sg_len, enum dma_data_direction direction,
	unsigned long flags);

struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
	struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
	size_t period_len, enum dma_data_direction direction);

and one control API.
int (*device_control)(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
	unsigned long arg);

A simple single buffer transfer (i.e. non sg transfer) can be done only
as a trivial case of the device_prep_slave_sg API. The client driver is
expected to prepare a sg list and provide it to the dmaengine API for that
single buffer also.

In a slave transfer, the client has to define all the buffer related
attributes and the peripheral related attributes.

The above 2 APIs in the dmaengine framework expect all the
peripheral/slave related attributes to be filled in the
dma_slave_config data structure.
struct dma_slave_config {
	enum dma_data_direction direction;
	dma_addr_t src_addr;
	dma_addr_t dst_addr;
	enum dma_slave_buswidth src_addr_width;
	enum dma_slave_buswidth dst_addr_width;
	u32 src_maxburst;
	u32 dst_maxburst;
};

This data structure is passed to the offload engine via the dma_chan
data structure in its private pointer.

Now coming to the buffer related attributes, sg list is a nice way to
represent a disjoint buffer; all the offload engines in drivers/dma
create a descriptor for each contiguous chunk in the sg list
buffer and pass it to the controller.

But many times a client may want to transfer from a single buffer to
the peripheral and most of the DMA controllers have the capability to
transfer data in chunks/frames directly at a stretch.
All the attributes of a buffer, which are required for the transfer
can be programmed in a single descriptor and submitted to the
DMA controller.

So I think the slave transfer API must also have a provision to pass
the buffer configuration. The buffer attributes have to be passed
directly as an argument in the prepare API, unlike dma_slave_config
as they will be different for each buffer that is going to be
transferred.

It is a stretch, and impractical, to describe a highly segmented
buffer (like the one described below) with a sg list. A sg list is
itself a representation of a disjoint buffer collection in terms of
smaller buffers, and each of those smaller buffers could then have a
different buffer configuration (like the one described below); we
are not going to go down that road now.

Hence it makes sense to pass these buffer attributes for only a single
buffer transfer and not a sg list.
This can be done by OPTION #1
1. Adding a pointer to the dma_buffer_config data structure in the
device_prep_slave_sg API.
2. Ensuring that it will be ignored if a sg list is passed.
Only when a single buffer is passed (in the sg list) will this buffer
configuration be used.
3. Any client that wants to do a sg transfer can simply ignore this
buffer configuration and pass NULL.
The main disadvantage of this option is that all the existing drivers
need to be updated since the API signature is changed.

It might even be better to have a separate API for non sg transfers.
This is OPTION #2
Advantages of this option are
1. No change required in the existing drivers that use
device_prep_slave_sg API.
2. If any offload engine wants to prepare a simple buffer transfer
differently and not as a trivial case of a sg list,
this will be useful.
I know this can be done using 2 different implementations inside the
device_prep_slave_sg itself, but I think it's cleaner to have
different APIs. I have provided here the most generic buffer
configuration I can think of, and also a new API definition, in case
it makes sense to have one. This buffer configuration might not be
completely generic, and hence I ask all of you to please provide
comments to improve it.

Generic buffer description:
A generic buffer can be split into a number of frames, each of which
contains a number of chunks. The frames need not be contiguous,
nor need the chunks inside a frame be.

-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
|                            ........                            |
-------------------------------------------------------------------
|                         Inter Frame Gap                        |
-------------------------------------------------------------------
| Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
-------------------------------------------------------------------

Note: ICG = Inter Chunk Gap.

struct dma_buffer_config {
	u32 chunk_size;      /* number of bytes in a chunk */
	u32 frame_size;      /* number of chunks in a frame */
	/* u32 n_frames;        number of frames in the buffer */
	u32 inter_chunk_gap; /* number of bytes between end of a chunk
			      * and the start of the next chunk */
	u32 inter_frame_gap; /* number of bytes between end of a frame
			      * and the start of the next frame */
	bool sync_rate;      /* 0 - a sync event is required from the
			      *     peripheral to transfer a chunk
			      * 1 - a sync event is required from the
			      *     peripheral to transfer a frame */
};

The patch to add a new API for single buffer transfers alone:
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
@@ -491,6 +492,10 @@ struct dma_device {
 			struct dma_chan *chan, struct scatterlist *sgl,
 			unsigned int sg_len, enum dma_data_direction direction,
 			unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_slave)(
+		struct dma_chan *chan, dma_addr_t buf_addr,
+		unsigned int buf_len, void *buf_prop,
+		enum dma_data_direction direction, unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
 		struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
 		size_t period_len, enum dma_data_direction direction);

The number of frames (n_frames) can always be determined using all
the other values in the dma_buffer_config along with the
buffer length (buf_len) passed in the API:
n_frames = buf_len / (chunk_size * frame_size);

Regards,
Sundaram

-----Original Message-----
From: Russell King - ARM Linux [mailto:[email protected]]
Sent: Thursday, June 09, 2011 6:17 PM
To: Raju, Sundaram
Cc: [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

Can you please re-post with sensible wrapping at or before column 72.
I'm not manually reformatting your entire message just so I can find
the relevant bits to reply to.

Thanks.

2011-06-09 16:33:30

by Russell King - ARM Linux

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Thu, Jun 09, 2011 at 09:31:56PM +0530, Raju, Sundaram wrote:
> Here it is, with proper line wrapping;

Thanks. This is much easier to reply to.

> I believe that even though the dmaengine framework addresses and
> supports most of the required use cases of a client driver to a DMA
> controller, some extensions are required in it to make it still more
> generic.
>
> Current framework contains two APIs to prepare for slave transfers:
>
> struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_data_direction direction,
> unsigned long flags);
>
> struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
> struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> size_t period_len, enum dma_data_direction direction);
>
> and one control API.
> int (*device_control)(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> unsigned long arg);
>
> A simple single buffer transfer (i.e. non sg transfer) can be done only
> as a trivial case of the device_prep_slave_sg API. The client driver is
> expected to prepare a sg list and provide it to the dmaengine API for that
> single buffer also.

We can avoid preparing a sg list in every driver which wants this by
having dmaengine.h provide a helper for this case:

static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
	struct dma_chan *chan, void *buf, size_t len,
	enum dma_data_direction dir, unsigned long flags)
{
	struct scatterlist sg;

	sg_init_one(&sg, buf, len);

	return chan->device->device_prep_slave_sg(chan, &sg, 1, dir, flags);
}

I think also providing dmaengine_prep_slave_sg() and
dmaengine_prep_dma_cyclic() as wrappers to hide the chan->device->blah
bit would also be beneficial (and helps keep that detail out of the
users of DMA engine.)
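
Such wrappers might look roughly like the sketch below (naming as
suggested above; the bodies just forward to the existing hooks):

static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
	struct dma_chan *chan, struct scatterlist *sgl,
	unsigned int sg_len, enum dma_data_direction dir,
	unsigned long flags)
{
	return chan->device->device_prep_slave_sg(chan, sgl, sg_len,
						  dir, flags);
}

static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
	struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
	size_t period_len, enum dma_data_direction dir)
{
	return chan->device->device_prep_dma_cyclic(chan, buf_addr,
						    buf_len, period_len,
						    dir);
}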

> In a slave transfer, the client has to define all the buffer related
> attributes and the peripheral related attributes.

I'd argue that it is incorrect to have drivers specify the buffer
related attributes - that makes the API unnecessarily complicated
to use, requiring two calls (one to configure the channel, and one
to prepare the transfer) each time it needs to be used.

Not only that but the memory side of the transfer should be no business
of the driver - the driver itself should only specify the attributes
for the device being driven. The attributes for the memory side of the
transfer should be a property of the DMA engine itself.

I would like to see in the long term the dma_slave_config structure
lose its 'direction' argument, and the rest of the parameters used to
define the device side parameters only.

This will allow the channel to be configured once when its requested,
and then the prepare call can configure the direction as required.

> Now coming to the buffer related attributes, sg list is a nice way to
> represent a disjoint buffer; all the offload engines in drivers/dma
> create a descriptor for each contiguous chunk in the sg list
> buffer and pass it to the controller.

The sg list is the standard Linux way to describe scattered buffers.

> But many times a client may want to transfer from a single buffer to
> the peripheral and most of the DMA controllers have the capability to
> transfer data in chunks/frames directly at a stretch.

So far, I've only seen DMA controllers which operate on a linked list of
source, destination, length, attributes, and next entry pointer.

> All the attributes of a buffer, which are required for the transfer
> can be programmed in a single descriptor and submitted to the
> DMA controller.

I'm not sure that this is useful - in order to make use of the data, the
data would have to be copied in between the descriptors - and doesn't that
rather negate the point of DMA if you have to memcpy() the data around?

Isn't it far more efficient to have DMA place the data exactly where it's
required in memory right from the start without any memcpy() operations?

Can you explain where and how you would use something like this:

> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> |                            ........                            |
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
> -------------------------------------------------------------------

Can you also explain what a chunk contains, what a frame contains,
how they're different and how they're processed? Where does the
data to be transferred to the device live?

Also, bear in mind that in Linux, we try hard to avoid allocating
buffers larger than one page at a time - as soon as we get to multiple
contiguous page buffers, allocations can start failing because of
memory fragmentation. This is why we much prefer scatterlists over
single contiguous buffers for DMA.

2011-06-09 18:58:30

by Jassi Brar

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Thu, Jun 9, 2011 at 6:09 PM, Raju, Sundaram <[email protected]> wrote:

> Generic buffer description:
> A generic buffer can be split into a number of frames, each of which contains a number of chunks. The frames need not be contiguous, nor need the chunks inside a frame be.
>
>        -------------------------------------------------------------------
>        | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
>        -------------------------------------------------------------------
>        |                         Inter Frame Gap                        |
>        -------------------------------------------------------------------
>        | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
>        -------------------------------------------------------------------
>        |                         Inter Frame Gap                        |
>        -------------------------------------------------------------------
>        |                            ........                            |
>        -------------------------------------------------------------------
>        |                         Inter Frame Gap                        |
>        -------------------------------------------------------------------
>        | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
>        -------------------------------------------------------------------

IIUC the above figure, the work done by the DMA controller remains
the same, whether this is passed as a transfer of the new type or as
a normal sg-list - unless the DMAC driver attempts to reorder the
transfers or the DMAC h/w natively supports some form of sg-list.
For DMACs that have no special support, a different representation
wouldn't make a difference.
And if the DMAC does support that kind of fancy scatter-gather, it
should be possible for the dma api driver to analyze the submitted
'normal' sg-list and program the transfers at one go.
Besides, it should be possible to have a 'template' sequence of
requests prepared already because usually, for the above mentioned
scenario, the parameters don't change across items in a list.

2011-06-10 06:44:25

by Vinod Koul

Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

On Thu, 2011-06-09 at 21:31 +0530, Raju, Sundaram wrote:
> Here it is, with proper line wrapping;
Please cc the respective MAINTAINERS, added Dan...

>
> SDMA and EDMA are TI SoC specific DMA controllers. Their drivers have
> been maintained in the respective SoC folders till now.
> arch/arm/plat-omap/dma.c
> arch/arm/mach-davinci/dma.c
>
> I have gone through the existing offload engine (DMA) drivers in
> drivers/dma which do slave transfers.
> I would like to move SDMA and EDMA also to dmaengine framework.
>
> I believe that even though the dmaengine framework addresses and
> supports most of the required use cases of a client driver to a DMA
> controller, some extensions are required in it to make it still more
> generic.
>
> Current framework contains two APIs to prepare for slave transfers:
>
> struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_data_direction direction,
> unsigned long flags);
>
> struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
> struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> size_t period_len, enum dma_data_direction direction);
>
> and one control API.
> int (*device_control)(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> unsigned long arg);
>
> A simple single buffer transfer (i.e. non sg transfer) can be done only
> as a trivial case of the device_prep_slave_sg API. The client driver is
> expected to prepare a sg list and provide it to the dmaengine API for that
> single buffer also.
>
> In a slave transfer, the client has to define all the buffer related
> attributes and the peripheral related attributes.
>
> The above 2 APIs in the dmaengine framework expect all the
> peripheral/slave related attributes to be filled in the
> dma_slave_config data structure.
> struct dma_slave_config {
> enum dma_data_direction direction;
> dma_addr_t src_addr;
> dma_addr_t dst_addr;
> enum dma_slave_buswidth src_addr_width;
> enum dma_slave_buswidth dst_addr_width;
> u32 src_maxburst;
> u32 dst_maxburst;
> };
>
> This data structure is passed to the offload engine via the dma_chan
> data structure in its private pointer.
No, this is passed thru control API you described above. Please read
Documentation/dmaengine.txt
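
(For reference, a sketch of that path: the client fills in a
dma_slave_config and hands it to the driver through the control
hook with the DMA_SLAVE_CONFIG command.)

	struct dma_slave_config cfg = { /* filled in by the client */ };

	chan->device->device_control(chan, DMA_SLAVE_CONFIG,
				     (unsigned long)&cfg);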

> Now coming to the buffer related attributes, sg list is a nice way to
> represent a disjoint buffer; all the offload engines in drivers/dma
> create a descriptor for each contiguous chunk in the sg list
> buffer and pass it to the controller.
>
> But many times a client may want to transfer from a single buffer to
> the peripheral and most of the DMA controllers have the capability to
> transfer data in chunks/frames directly at a stretch.
> All the attributes of a buffer, which are required for the transfer
> can be programmed in a single descriptor and submitted to the
> DMA controller.
>
> So I think the slave transfer API must also have a provision to pass
> the buffer configuration. The buffer attributes have to be passed
> directly as an argument in the prepare API, unlike dma_slave_config
> as they will be different for each buffer that is going to be
> transferred.
Can you articulate what attributes you are talking about. The
dma_slave_config parameters don't represent buffer attributes. They
represent the dma attributes on how you want to transfer. These things
like bus widths, burst lengths are usually constant for the slave
transfers, not sure why they should change per buffer?

>
> It is a stretch, and impractical, to describe a highly segmented
> buffer (like the one described below) with a sg list. A sg list is
> itself a representation of a disjoint buffer collection in terms of
> smaller buffers, and each of those smaller buffers could then have a
> different buffer configuration (like the one described below); we
> are not going to go down that road now.
well that's the linux idea, you use sg-list to describe this
segmentation...

>
> Hence it makes sense to pass these buffer attributes for only a single
> buffer transfer and not a sg list.
> This can be done by OPTION #1
> 1. Adding a pointer to the dma_buffer_config data structure in the
> device_prep_slave_sg API.
> 2. Ensuring that it will be ignored if a sg list is passed.
> Only when a single buffer is passed (in the sg list) will this buffer
> configuration be used.
> 3. Any client that wants to do a sg transfer can simply ignore this
> buffer configuration and pass NULL.
> The main disadvantage of this option is that all the existing drivers
> need to be updated since the API signature is changed.
>
> It might even be better to have a separate API for non sg transfers.
> This is OPTION #2
> Advantages of this option are
> 1. No change required in the existing drivers that use
> device_prep_slave_sg API.
> 2. If any offload engine wants to prepare a simple buffer transfer
> differently and not as a trivial case of a sg list,
> this will be useful.
> I know this can be done using 2 different implementations inside the
> device_prep_slave_sg itself, but I think it's cleaner to have
> different APIs. I have provided here the most generic buffer
> configuration I can think of, and also a new API definition, in case
> it makes sense to have one. This buffer configuration might not be
> completely generic, and hence I ask all of you to please provide
> comments to improve it.
>
> Generic buffer description:
> A generic buffer can be split into a number of frames, each of which
> contains a number of chunks. The frames need not be contiguous,
> nor need the chunks inside a frame be.
>
> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> |                            ........                            |
> -------------------------------------------------------------------
> |                         Inter Frame Gap                        |
> -------------------------------------------------------------------
> | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
> -------------------------------------------------------------------
>
> Note: ICG = Inter Chunk Gap.
>
> struct dma_buffer_config {
> u32 chunk_size; /* number of bytes in a chunk */
> u32 frame_size; /* number of chunks in a frame */
> /* u32 n_frames; number of frames in the buffer */
> u32 inter_chunk_gap; /* number of bytes between end of a chunk
> and the start of the next chunk */
> u32 inter_frame_gap; /* number of bytes between end of a frame
> and the start of the next frame */
> bool sync_rate; /* 0 - a sync event is required from the
> peripheral to transfer a chunk
> 1 - a sync event is required from the
> peripheral to transfer a frame */
> };
>
> The patch to add a new API for single buffer transfers alone:
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> @@ -491,6 +492,10 @@ struct dma_device {
> struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_data_direction direction,
> unsigned long flags);
> + struct dma_async_tx_descriptor *(*device_prep_slave)(
> + struct dma_chan *chan, dma_addr_t buf_addr,
> + unsigned int buf_len, void *buf_prop,
> + enum dma_data_direction direction, unsigned long flags);
> struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
> struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> size_t period_len, enum dma_data_direction direction);
>
> The number of frames (n_frames) can always be determined using all
> other values in the dma_buffer_config along with the
> buffer length (buf_len) passed in the API.
> n_frames = buf_len / (chunk_size * frame_size);
>
> Regards,
> Sundaram
>
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:[email protected]]
> Sent: Thursday, June 09, 2011 6:17 PM
> To: Raju, Sundaram
> Cc: [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer
>
> Can you please re-post with sensible wrapping at or before column 72.
> I'm not manually reformatting your entire message just so I can find
> the relevant bits to reply to.
>
> Thanks.
>


--
~Vinod

2011-06-10 06:48:42

by Vinod Koul

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Thu, 2011-06-09 at 17:32 +0100, Russell King - ARM Linux wrote:
> On Thu, Jun 09, 2011 at 09:31:56PM +0530, Raju, Sundaram wrote:
> > Here it is, with proper line wrapping;
>
> Thanks. This is much easier to reply to.
>
> > I believe that even though the dmaengine framework addresses and
> > supports most of the required use cases of a client driver to a DMA
> > controller, some extensions are required in it to make it still more
> > generic.
> >
> > Current framework contains two APIs to prepare for slave transfers:
> >
> > struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
> > struct dma_chan *chan, struct scatterlist *sgl,
> > unsigned int sg_len, enum dma_data_direction direction,
> > unsigned long flags);
> >
> > struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
> > struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> > size_t period_len, enum dma_data_direction direction);
> >
> > and one control API.
> > int (*device_control)(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> > unsigned long arg);
> >
> > A simple single buffer transfer (i.e. non sg transfer) can be done only
> > as a trivial case of the device_prep_slave_sg API. The client driver is
> > expected to prepare a sg list and provide it to the dmaengine API for that
> > single buffer also.
>
> We can avoid preparing a sg list in every driver which wants this by
> having dmaengine.h provide a helper for this case:
>
> static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
> struct dma_chan *chan, void *buf, size_t len,
> enum dma_data_direction dir, unsigned long flags)
> {
> struct scatterlist sg;
>
> sg_init_one(&sg, buf, len);
>
> return chan->device->device_prep_slave_sg(chan, &sg, 1, dir, flags);
> }
That sounds good...

>
> I think also providing dmaengine_prep_slave_sg() and
> dmaengine_prep_dma_cyclic() as wrappers to hide the chan->device->blah
> bit would also be beneficial (and helps keep that detail out of the
> users of DMA engine.)
>
> > In a slave transfer, the client has to define all the buffer related
> > attributes and the peripheral related attributes.
>
> I'd argue that it is incorrect to have drivers specify the buffer
> related attributes - that makes the API unnecessarily complicated
> to use, requiring two calls (one to configure the channel, and one
> to prepare the transfer) each time it needs to be used.
>
> Not only that but the memory side of the transfer should be no business
> of the driver - the driver itself should only specify the attributes
> for the device being driven. The attributes for the memory side of the
> transfer should be a property of the DMA engine itself.
>
> I would like to see in the long term the dma_slave_config structure
> lose its 'direction' argument, and the rest of the parameters used to
> define the device side parameters only.
I am not sure why direction flag is required and can be done away with.
Both sg and cyclic API have a direction parameter and that should be
used. A channel can be used in any direction client wishes to.
>
> This will allow the channel to be configured once when its requested,
> and then the prepare call can configure the direction as required.
>
> > Now coming to the buffer related attributes, sg list is a nice way to
> > represent a disjoint buffer; all the offload engines in drivers/dma
> > create a descriptor for each contiguous chunk in the sg list
> > buffer and pass it to the controller.
>
> The sg list is the standard Linux way to describe scattered buffers.
>
> > But many times a client may want to transfer from a single buffer to
> > the peripheral and most of the DMA controllers have the capability to
> > transfer data in chunks/frames directly at a stretch.
>
> So far, I've only seen DMA controllers which operate on a linked list of
> source, destination, length, attributes, and next entry pointer.
>
> > All the attributes of a buffer, which are required for the transfer
> > can be programmed in a single descriptor and submitted to the
> > DMA controller.
>
> I'm not sure that this is useful - in order to make use of the data, the
> data would have to be copied in between the descriptors - and doesn't that
> rather negate the point of DMA if you have to memcpy() the data around?
>
> Isn't it far more efficient to have DMA place the data exactly where it's
> required in memory right from the start without any memcpy() operations?
>
> Can you explain where and how you would use something like this:
>
> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > |                            ........                            |
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
> > -------------------------------------------------------------------
>
> Can you also explain what a chunk contains, what a frame contains,
> how they're different and how they're processed? Where does the
> data to be transferred to the device live?
>
> Also, bear in mind that in Linux, we try hard to avoid allocating
> buffers larger than one page at a time - as soon as we get to multiple
> contiguous page buffers, allocations can start failing because of
> memory fragmentation. This is why we much prefer scatterlists over
> single contiguous buffers for DMA.


--
~Vinod

2011-06-10 10:22:14

by Raju, Sundaram

Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

I think I should have tried to explain my case with a specific example.
I tried to generalize it and it has confused and misled everyone.
I have tried again now. :)

<snip>
> > > A simple single buffer transfer (i.e. non sg transfer) can be done only
> > > as a trivial case of the device_prep_slave_sg API. The client driver is
> > > expected to prepare a sg list and provide it to the dmaengine API for that
> > > single buffer also.
> >
> > We can avoid preparing a sg list in every driver which wants this by
> > having dmaengine.h provide a helper for this case:
> >
> > static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
> > struct dma_chan *chan, void *buf, size_t len,
> > enum dma_data_direction dir, unsigned long flags)
> > {
> > struct scatterlist sg;
> >
> > sg_init_one(&sg, buf, len);
> >
> > return chan->device->device_prep_slave_sg(chan, &sg, 1, dir, flags);
> > }
> That sounds good...

Yes, this should be sufficient if the only aim is to avoid preparing a sg list
in every driver which wants to transfer a single buffer.

But my main aim was to add more buffer related details to the API.
I have tried to explain a use case below, where this will be useful.

<snip>
> >
> > I think also providing dmaengine_prep_slave_sg() and
> > dmaengine_prep_dma_cyclic() as wrappers to hide the chan->device->blah
> > bit would also be beneficial (and helps keep that detail out of the
> > users of DMA engine.)
> >

Yes, this would really be helpful if someone is just looking at the
client drivers and is not familiar with the dmaengine's
internal data structures.

> > > In a slave transfer, the client has to define all the buffer related
> > > attributes and the peripheral related attributes.
> >
> > I'd argue that it is incorrect to have drivers specify the buffer
> > related attributes - that makes the API unnecessarily complicated
> > to use, requiring two calls (one to configure the channel, and one
> > to prepare the transfer) each time it needs to be used.

I am not able to understand why 2 calls would be required.
The client configures the slave using dma_slave_config only once.
If the direction flag is removed, this configuration doesn't
have to be modified at all.

<quote>
> So I think the slave transfer API must also have a provision to pass
> the buffer configuration. The buffer attributes have to be passed
> directly as an argument in the prepare API, unlike dma_slave_config
> as they will be different for each buffer that is going to be
> transferred.
</quote>

The API is called only once in the current framework to prepare
the descriptor.
After adding any extra arguments in the prepare API, client will
still have to call it only once.

> >
> > Not only that but the memory side of the transfer should be no business
> > of the driver - the driver itself should only specify the attributes
> > for the device being driven. The attributes for the memory side of the
> > transfer should be a property of the DMA engine itself.
> >

Yes Russell, you are correct. Attributes of the memory side should
be a property of the DMA engine itself.

What is meant here is that the client has told the DMAC how to
write to/read from the slave peripheral by defining all the slave
properties in the dma_slave_config.

It is assumed that the buffer side has to be read/written to
byte after byte continuously by the DMAC.
I wanted to say the client must have provisions to pass the details
on how it wants the DMAC to read/write to the
buffer, if the capability is available in the DMAC.

DMACs have the capability to auto increment the
source/destination address accordingly after transferring
a byte of data and they automatically read/write to the
next byte location. In some DMACs you can program an offset also.
They read x bytes, skip y bytes and read the next x bytes.
This detail will be passed to the client driver by the application.
Then it is for the slave driver to communicate this to the
DMA engine, right?

> > I would like to see in the long term the dma_slave_config structure
> > lose its 'direction' argument, and the rest of the parameters used to
> > define the device side parameters only.
> I am not sure why direction flag is required and can be done away with.
> Both sg and cyclic API have a direction parameter and that should be
> used. A channel can be used in any direction client wishes to.

I also agree. The direction flag in the dma_slave_config is redundant.
As Russell had already mentioned in another thread, this redundancy
forces DMAC drivers to check if the direction flag in the
dma_slave_config is the same as that passed to the prepare API.

Also client drivers have to keep modifying the dma_slave_config
every time before they make a direction change in transfer.

> >
> > > But many times a client may want to transfer from a single buffer to
> > > the peripheral and most of the DMA controllers have the capability to
> > > transfer data in chunks/frames directly at a stretch.
> >
> > So far, I've only seen DMA controllers which operate on a linked list of
> > source, destination, length, attributes, and next entry pointer.
> >
> > > All the attributes of a buffer, which are required for the transfer
> > > can be programmed in a single descriptor and submitted to the
> > > DMA controller.
> >
> > I'm not sure that this is useful - in order to make use of the data, the
> > data would have to be copied in between the descriptors - and doesn't that
> > rather negate the point of DMA if you have to memcpy() the data around?
> >
> > Isn't it far more efficient to have DMA place the data exactly where it's
> > required in memory right from the start without any memcpy() operations?
> >
> > Can you explain where and how you would use something like this:
> >

Consider a simple video use case of de-interlacing a video buffer.
Odd lines have to be transferred first, and then the even lines are
transferred separately. This can be done by programming the
inter frame gap as the line size of the video buffer in the DMAC.
This will require you to have only 2 descriptors, one for the
odd lines and another for the even lines. This results in only
2 descriptors being written to DMAC registers.

But if we have to construct a sglist for this, we end up having to
write the descriptors to the DMAC as many times as the
number of lines in the video buffer.

To put it in other words, the current DMA engine supports 1D
transfers alone. Though a sg list may be 2D, even then the DMACs
are programmed for multiple 1D transfers, one after another, in
the sg case. If the DMAC supports multi-dimensional transfers,
that feature cannot be utilized through the current DMA engine
APIs.
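
To make this concrete, here is a rough sketch of how the odd field
in the de-interlacing case above might be prepared with the proposed
API. line_bytes, buf, buf_len, chan and flags are assumed to exist,
and all values are illustrative:

	/*
	 * Each "frame" is one video line; inter_frame_gap skips the
	 * interleaved line that belongs to the other field.
	 */
	struct dma_buffer_config field = {
		.chunk_size      = line_bytes, /* one line per chunk */
		.frame_size      = 1,          /* one chunk per frame */
		.inter_chunk_gap = 0,
		.inter_frame_gap = line_bytes, /* skip the other field */
		.sync_rate       = 1,          /* sync per frame */
	};
	struct dma_async_tx_descriptor *odd;

	odd = chan->device->device_prep_slave(chan, buf, buf_len / 2,
			&field, DMA_FROM_DEVICE, flags);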

> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 0
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame 1
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > |                            ........                            |
> > -------------------------------------------------------------------
> > |                         Inter Frame Gap                        |
> > -------------------------------------------------------------------
> > | Chunk 0 |ICG| Chunk 1 |ICG| ... |ICG| Chunk n |       Frame m
> > -------------------------------------------------------------------
>
> Can you also explain what a chunk contains, what a frame contains,
> how they're different and how they're processed? Where does the
> data to be transferred to the device live?
>
> >
> > Note: ICG = Inter Chunk Gap.
> >
> > struct dma_buffer_config {
> > u32 chunk_size; /* number of bytes in a chunk */
> > u32 frame_size; /* number of chunks in a frame */
> > /* u32 n_frames; number of frames in the buffer */
> > u32 inter_chunk_gap; /* number of bytes between end of a chunk
> > and the start of the next chunk */
> > u32 inter_frame_gap; /* number of bytes between end of a frame
> > and the start of the next frame */
> > bool sync_rate; /* 0 - a sync event is required from the
> > peripheral to transfer a chunk
> > 1 - a sync event is required from the
> > peripheral to transfer a frame */
> > };
> >

I have tried to represent a 3 dimensional buffer here.
Each chunk is a 1D buffer with 'x' bytes.
Each frame is a 2D buffer with 'n' chunks.
The overall buffer is 3D with 'm' frames.

Actually we can deduce the chunk_size from the
dma_slave_config itself. It is either the src_addr_width or
dst_addr_width, based on the direction, because at a stretch the
DMAC cannot transfer more than the slave register width.

All these values in the dma_buffer_config proposed above
can be programmed in the DMAC through a single descriptor
if the DMAC is 3D transfer capable.

If the client has to do this type of transfer using a sg list,
then it will end up having a sglist of m*n entries, and that
many descriptors will have to be programmed on the DMAC.

>
> Also, bear in mind that in Linux, we try hard to avoid allocating
> buffers larger than one page at a time - as soon as we get to multiple
> contiguous page buffers, allocations can start failing because of
> memory fragmentation. This is why we much prefer scatterlists over
> single contiguous buffers for DMA.

Regards,
Sundaram

2011-06-10 10:43:44

by Russell King - ARM Linux

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Fri, Jun 10, 2011 at 03:51:41PM +0530, Raju, Sundaram wrote:
> Consider a simple video use case of de-interlacing a video buffer.
> Odd lines have to be transferred first, and then the even lines are
> transferred separately. This can be done by programming the
> inter frame gap as the line size of the video buffer in the DMAC.
> This will require you to have only 2 descriptors, one for the
> odd lines and another for the even lines. This results in only
> 2 descriptors being written to DMAC registers.

How would this be handled with DMACs which can't 'skip' bytes in the
buffer? You would have to generate a list of LLIs separately
describing each 'line' of video and chain them together.

How do you handle the situation where a driver uses your new proposed
API, but it doesn't support that in hardware?

> Actually we can deduce the chunk_size from the
> dma_slave_config itself. It is either the src_addr_width or
> dst_addr_width, based on the direction, because at a stretch the
> DMAC cannot transfer more than the slave register width.

I think you're misinterpreting those fields. (dst|src)_addr_width tells
the DMA controller the width of each transaction - whether to issue a
byte, half-word, word or double-word read or write to the peripheral.
It doesn't say how many of those to issue, it just says what the
peripheral access size is to be.

In other words, they describe the width of the FIFO register.

2011-06-10 11:14:17

by Raju, Sundaram

Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

Vinod,

> -----Original Message-----
> From: Koul, Vinod [mailto:[email protected]]
> Sent: Friday, June 10, 2011 11:39 AM
> To: Raju, Sundaram; Dan
> Cc: Russell King - ARM Linux; davinci-linux-open-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer
>

<snip>
> >
> > The above 2 APIs in the dmaengine framework expect all the
> > peripheral/slave related attributes to be filled in the
> > dma_slave_config data structure.
> > struct dma_slave_config {
> > enum dma_data_direction direction;
> > dma_addr_t src_addr;
> > dma_addr_t dst_addr;
> > enum dma_slave_buswidth src_addr_width;
> > enum dma_slave_buswidth dst_addr_width;
> > u32 src_maxburst;
> > u32 dst_maxburst;
> > };
> >
> > This data structure is passed to the offload engine via the dma_chan
> > data structure in its private pointer.
> No, this is passed thru control API you described above. Please read
> Documentation/dmaengine.txt

Yes, Vinod, it's my mistake. I wanted to say control API only,
but just mixed it up with how the dma_slave_config is attached
to each dma_chan so that the offload drivers can use it while
preparing the descriptors.

>
> > Now coming to the buffer related attributes, sg list is a nice way to
> > represent a disjoint buffer; all the offload engines in drivers/dma
> > create a descriptor for each contiguous chunk in the sg list
> > buffer and pass it to the controller.
> >
> > But many times a client may want to transfer from a single buffer to
> > the peripheral and most of the DMA controllers have the capability to
> > transfer data in chunks/frames directly at a stretch.
> > All the attributes of a buffer, which are required for the transfer
> > can be programmed in a single descriptor and submitted to the
> > DMA controller.
> >
> > So I think the slave transfer API must also have a provision to pass
> > the buffer configuration. The buffer attributes have to be passed
> > directly as an argument in the prepare API, unlike dma_slave_config
> > as they will be different for each buffer that is going to be
> > transferred.
> Can you articulate what attributes you are talking about. The
> dma_slave_config parameters don't represent buffer attributes. They
> represent the dma attributes on how you want to transfer. These things
> like bus widths, burst lengths are usually constant for the slave
> transfers, not sure why they should change per buffer?
>

I have tried to explain the attributes in the previous mail
I posted in this thread.

Yes, buffer attributes should not be represented in
the dma_slave_config. It is for slave related data.
That is why I had mentioned that the buffer configuration should be
a separate structure and passed in the prepare API.
See quoted below:

<quote>
> struct dma_buffer_config {
> u32 chunk_size; /* number of bytes in a chunk */
> u32 frame_size; /* number of chunks in a frame */
> /* u32 n_frames; number of frames in the buffer */
> u32 inter_chunk_gap; /* number of bytes between end of a chunk
> and the start of the next chunk */
> u32 inter_frame_gap; /* number of bytes between end of a frame
> and the start of the next frame */
> bool sync_rate; /* 0 - a sync event is required from the
> peripheral to transfer a chunk
> 1 - a sync event is required from the
> peripheral to transfer a frame */
> };
>
> The patch to add a new API for single buffer transfers alone:
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> @@ -491,6 +492,10 @@ struct dma_device {
> struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_data_direction direction,
> unsigned long flags);
> + struct dma_async_tx_descriptor *(*device_prep_slave)(
> + struct dma_chan *chan, dma_addr_t buf_addr,
> + unsigned int buf_len, void *buf_prop,
> + enum dma_data_direction direction, unsigned long flags);
> struct dma_async_tx_descriptor *(*device_prep_dma_cyclic)(
> struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> size_t period_len, enum dma_data_direction direction);
>
</quote>

Regards,
Sundaram

2011-06-10 11:49:12

by Raju, Sundaram

Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

Russell,

> -----Original Message-----
> From: Russell King - ARM Linux [mailto:[email protected]]
> Sent: Friday, June 10, 2011 4:13 PM
> To: Raju, Sundaram
> Cc: Koul, Vinod; Dan; [email protected]; linux-
> [email protected]; [email protected]; linux-arm-
> [email protected]
> Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer
>
> On Fri, Jun 10, 2011 at 03:51:41PM +0530, Raju, Sundaram wrote:
> > Consider a simple video use case of de-interlacing a video buffer.
> > Odd lines have to be transferred first, and then the even lines are
> > transferred separately. This can be done by programming the
> > inter frame gap as the line size of the video buffer in the DMAC.
> > This will require you to have only 2 descriptors, one for the
> > odd lines and another for the even lines. This results in only
> > 2 descriptors being written to DMAC registers.
>
> How would this be handled with DMACs which can't 'skip' bytes in the
> buffer? You would have to generate a list of LLIs separately
> describing each 'line' of video and chain them together.

Correct.

>
> How do you handle the situation where a driver uses your new proposed
> API, but it doesn't support that in hardware.
>

It should be handled the same way a sg buffer is handled
if the LLI chaining feature is not available in the hardware.

It has to be submitted as a queue of separate descriptors to
the DMAC, where each descriptor will be processed after the
completion of the previous descriptor.

> > Actually we can deduce the chunk_size from the
> > dma_slave_config itself. It is either the src_addr_width or
> > dst_addr_width, based on the direction, because at a stretch the
> > DMAC cannot transfer more than the slave register width.
>
> I think you're misinterpreting those fields. (dst|src)_addr_width tells
> the DMA controller the width of each transaction - whether to issue a
> byte, half-word, word or double-word read or write to the peripheral.
> It doesn't say how many of those to issue, it just says what the
> peripheral access size is to be.
>
> In other words, they describe the width of the FIFO register.

Yes correct, I was just giving an example for considering or
understanding a buffer in 3D and how each dimension should be.

chunk_size = (src|dst)_addr_width, for a special case,
i.e., if the DMAC is programmed to transfer the entire 1D buffer
per sync event received from the peripheral.

In a slave transfer, the peripheral is going to give a sync event to
the DMAC when the FIFO register is full|empty.

Now, DMACs capable of 3D transfer do transfer the whole 1D buffer
per sync received, or even the whole 2D buffer per sync received
(based on the sync rate programmed in the DMAC).

So the DMAC has to be programmed for a 1D size (i.e. chunk size)
equal to that of the width of the FIFO register if the sync rate
programmed in DMAC is "per chunk".

Regards,
Sundaram

2011-06-10 13:34:08

by Russell King - ARM Linux

Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Fri, Jun 10, 2011 at 05:18:46PM +0530, Raju, Sundaram wrote:
> Russell,
>
> > How do you handle the situation where a driver uses your new proposed
> > API, but it doesn't support that in hardware?
>
> It should be handled the same way a sg buffer is handled
> if the LLI chaining feature is not available in the hardware.

No, if the LLI chaining feature is available in hardware and your special
'skip' feature is not, then you have to generate a set of LLIs for the
hardware to walk through to 'emulate' the skip feature.

If you have no LLI support, then you need to program each transfer manually
into the DMA controller.

> > > Actually we can deduce the chunk_size from the
> > > dma_slave_config itself. It is either the src_addr_width or
> > > dst_addr_width based on the direction. Because at a stretch
> > > DMAC cannot transfer more than the slave register width.
> >
> > I think you're misinterpreting those fields. (dst|src)_addr_width tells
> > the DMA controller the width of each transaction - whether to issue a
> > byte, half-word, word or double-word read or write to the peripheral.
> > It doesn't say how many of those to issue, it just says what the
> > peripheral access size is to be.
> >
> > In other words, they describe the width of the FIFO register.
>
> Yes, correct. I was just giving an example of how to view a buffer
> in 3D and what each dimension should be.
>
> chunk_size = (src|dst)_addr_width only in a special case, i.e. when
> the DMAC is programmed to transfer the entire 1D buffer per sync
> event received from the peripheral.

Please, don't generate special cases. Design proper APIs from the
outset rather than abusing what's already there. So no, don't abuse
the address width stuff.

In any case, the address width stuff must still be used to describe the
peripheral's FIFO register.

> In a slave transfer, the peripheral gives a sync event to the
> DMAC when the FIFO register is full or empty.

'sync event' - peripherals give 'request' signals to the DMAC asking
it to transfer some more data, and the DMAC gives an acknowledge
signal back so that the peripheral knows that the DMAC is giving it
data. In the generic case, they typically have no further signalling.

When the DMAC sees an active request, it will attempt to transfer up
to the minimum of (burst size, number of transfers remaining) to the
peripheral in one go.

> DMACs capable of 3D transfer can move the whole 1D buffer per sync
> received, or even the whole 2D buffer per sync received
> (based on the sync rate programmed in the DMAC).

Ok, what I'm envisioning is that your term "chunk size" means "register
width", and you view that as one dimension. We already describe this.

A frame is a collection of chunks. That's already described - the number
of chunks in a buffer is the buffer size divided by the chunk size.
Buffers must be a multiple of the chunk size.

Then we have a collection of frames. These can be non-contiguous.
That's effectively described by our scatterlist.
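
Roughly, in code (a hedged mapping only; the field values and the
'fifo_phys' name are illustrative, not taken from the thread):

    struct dma_slave_config cfg = {
            .direction      = DMA_TO_DEVICE,
            .dst_addr       = fifo_phys,    /* peripheral FIFO address */
            .dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES, /* one "chunk" */
            .dst_maxburst   = 16,           /* chunks moved per request */
    };
    /* a "frame" = one contiguous run of chunks = one scatterlist entry;
     * the collection of (possibly non-contiguous) frames = the sg list */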

> So the DMAC has to be programmed with a 1D size (i.e. chunk size)
> equal to the width of the FIFO register if the sync rate
> programmed in the DMAC is "per chunk".

This appears to be a circular definition, and makes little sense to me.

The overall conclusion which I'm coming to is that we already support
what you're asking for, but the problem is that we're using different
(and I'd argue standard) terminology to describe what we have.

The only issue which I see that we don't cover is the case where you want
to describe a single buffer which is organised as N bytes to be transferred,
M following bytes to be skipped, N bytes to be transferred, M bytes to be
skipped. I doubt there are many controllers which can be programmed with
both 'N' and 'M' parameters directly.

Let me finish with a summary of how memory-to-peripheral transfers are
expected to operate:
1. The DMA controller reads data from system RAM using whatever parameters
are most suitable for it to use up to the current buffer size.

2. When the DMA request goes active, it transfers data that it read
previously to the device in dest_addr_width chunks of data. It
transfers the minimum of the remaining data or the burst size.

3. The peripheral, receiving data and filling its FIFO, drops the DMA
request.

4. The DMA controller then reads more data from system RAM in the same
way as in (1).

5. If at any point the DMA controller reaches the end of the buffer and
has a link to the next buffer, it immediately moves to the next buffer
and starts fetching data from it.
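
A minimal client-side sketch of driving such a transfer with the
existing API (a sketch only: error handling is omitted, and 'fifo_phys',
'sgl' and 'sg_len' are assumed to be set up and dma_map_sg()'d by the
caller):

    struct dma_slave_config cfg = {
            .direction      = DMA_TO_DEVICE,
            .dst_addr       = fifo_phys,
            .dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
            .dst_maxburst   = 8,
    };
    struct dma_async_tx_descriptor *desc;

    dmaengine_slave_config(chan, &cfg);
    desc = chan->device->device_prep_slave_sg(chan, sgl, sg_len,
                                              DMA_TO_DEVICE,
                                              DMA_PREP_INTERRUPT);
    desc->tx_submit(desc);              /* queue the descriptor */
    dma_async_issue_pending(chan);      /* kick the engine */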

2011-06-10 17:22:45

by Vinod Koul

[permalink] [raw]
Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

On Fri, 2011-06-10 at 16:43 +0530, Raju, Sundaram wrote:
> Vinod,

...
> >
> > > Now coming to the buffer related attributes, sg list is a nice way to
> > > represent a disjoint buffer; all the offload engines in drivers/dma
> > > create a descriptor for each of the contiguous chunk in the sg list
> > > buffer and pass it to the controller.
> > >
> > > But many a times a client may want to transfer from a single buffer to
> > > the peripheral and most of the DMA controllers have the capability to
> > > transfer data in chunks/frames directly at a stretch.
> > > All the attributes of a buffer, which are required for the transfer
> > > can be programmed in single descriptor and submitted to the
> > > DMA controller.
> > >
> > > So I think the slave transfer API must also have a provision to pass
> > > the buffer configuration. The buffer attributes have to be passed
> > > directly as an argument in the prepare API, unlike dma_slave_config
> > > as they will be different for each buffer that is going to be
> > > transferred.
> > Can you articulate what attributes you are talking about? The
> > dma_slave_config parameters don't represent buffer attributes; they
> > represent the dma attributes of how you want to transfer. Things
> > like bus widths and burst lengths are usually constant for slave
> > transfers, so I am not sure why they should change per buffer.
> >
>
> I have tried to explain the attributes in the previous mail
> I posted in this thread.
>
> Yes, buffer attributes should not be represented in
> the dma_slave_config. It is for slave related data.
> That is why I had mentioned that buffer configuration should be
> a separate structure and passed in the prepare API.
> See quoted below:
>
> <quote>
> > struct dma_buffer_config {
> > u32 chunk_size; /* number of bytes in a chunk */
> > u32 frame_size; /* number of chunks in a frame */
> > /* u32 n_frames; number of frames in the buffer */
> > u32 inter_chunk_gap; /* number of bytes between end of a chunk
> > and the start of the next chunk */
> > u32 inter_frame_gap; /* number of bytes between end of a frame
> > and the start of the next frame */
> > bool sync_rate; /* 0 - a sync event is required from the
> > peripheral to transfer a chunk
> > 1 - a sync event is required from the
> > peripheral to transfer a frame */
> > };
IIUC, you have a buffer with gaps in between (given by the above
params). Is your DMA h/w capable of parsing this buffer and directly
doing a transfer based on the above parameters (without any work for
SW), or are you doing this in your dma driver and then programming a
list of buffers? In the former case (although I haven't seen such a dma
controller yet), can you share the datasheet? It would make very little
sense to change the current API for your extra-special case. This
special case needs to be handled differently rather than making a rule
out of it!

And in the latter case you are really going down the wrong path, as
Russell pointed out, trying to abuse the APIs.


--
~Vinod

2011-06-13 14:13:20

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

On Fri, Jun 10, 2011 at 3:33 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Fri, Jun 10, 2011 at 05:18:46PM +0530, Raju, Sundaram wrote:
>> Now DMACs capable of 3D transfer, do transfer of the whole 1D
>> buffer per sync received or even whole 2D buffer per sync received
>> (based on the sync rate programmed in the DMAC).
>
> The only issue which I see that we don't cover is the case where you want
> to describe a single buffer which is organised as N bytes to be transferred,
> M following bytes to be skipped, N bytes to be transferred, M bytes to be
> skipped. I doubt there are many controllers which can be programmed with
> both 'N' and 'M' parameters directly.

Sundaram, is this how your controller works?
I mean, can the hardware skip over sequences like this?

When we added the config interface to DMAengine I originally included
a "custom config" call, but Dan wanted me to keep it out until we
had some specific use case for it. FSLDMA recently started
to use it.

Notice how dmaengine_slave_config() is implemented:

static inline int dmaengine_slave_config(struct dma_chan *chan,
                                         struct dma_slave_config *config)
{
        return dmaengine_device_control(chan, DMA_SLAVE_CONFIG,
                                        (unsigned long)config);
}

So what is passed to the driver is just an unsigned long.

This is actually modeled to be ioctl()-like so you can pass in a
custom config to the same callback on the device driver,
just use some other enumerator than DMA_SLAVE_CONFIG,
say like FSLDMA already does with FSLDMA_EXTERNAL_START.

Just put some enumerator in enum dma_ctrl_cmd in
dmaengine.h, such as SDMA_TEXAS_STRIDE_CONFIG, and call it
like this:

/* However that config struct needs to look, basically */
static struct sdma_ti_stride_cfg stride_cfg = {
        .take = N,      /* bytes to transfer */
        .skip = M,      /* bytes to skip */
};

ret = chan->device->device_control(chan, SDMA_TEXAS_STRIDE_CONFIG,
                                   (unsigned long)&stride_cfg);

Or something like this.
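
For completeness, a hedged sketch of the driver side of that call;
sdma_set_slave_config() and sdma_set_stride() are hypothetical helpers,
not existing functions:

    static int sdma_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
                            unsigned long arg)
    {
            switch (cmd) {
            case DMA_SLAVE_CONFIG:
                    return sdma_set_slave_config(chan,
                                    (struct dma_slave_config *)arg);
            case SDMA_TEXAS_STRIDE_CONFIG:
                    return sdma_set_stride(chan,
                                    (struct sdma_ti_stride_cfg *)arg);
            default:
                    return -ENXIO;
            }
    }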

Thanks,
Linus Walleij

2011-06-14 05:39:17

by Raju, Sundaram

[permalink] [raw]
Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

Linus,

Thanks for the pointers.

> -----Original Message-----
> From: Linus Walleij [mailto:[email protected]]
> Sent: Monday, June 13, 2011 7:43 PM
> To: Raju, Sundaram
> Cc: Russell King - ARM Linux; Koul, Vinod; Dan; davinci-linux-open-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer
>
> On Fri, Jun 10, 2011 at 3:33 PM, Russell King - ARM Linux
> <[email protected]> wrote:
> > On Fri, Jun 10, 2011 at 05:18:46PM +0530, Raju, Sundaram wrote:
> >> Now DMACs capable of 3D transfer, do transfer of the whole 1D
> >> buffer per sync received or even whole 2D buffer per sync received
> >> (based on the sync rate programmed in the DMAC).
> >
> > The only issue which I see that we don't cover is the case where you want
> > to describe a single buffer which is organised as N bytes to be transferred,
> > M following bytes to be skipped, N bytes to be transferred, M bytes to be
> skipped. I doubt there are many controllers which can be programmed with
> > both 'N' and 'M' parameters directly.
>
> Sundaram, is this how your controller works?
> I mean, can the hardware skip over sequences like this?
>
> When we added the config interface to DMAengine I originally included
> a "custom config" call, but Dan wanted me to keep it out until we
> had some specific use case for it. FSLDMA recently started
> to use it.
>
> Notice how dmaengine_slave_config() is implemented:
>
> static inline int dmaengine_slave_config(struct dma_chan *chan,
>                                          struct dma_slave_config *config)
> {
>         return dmaengine_device_control(chan, DMA_SLAVE_CONFIG,
>                                         (unsigned long)config);
> }
>
> So what is passed to the driver is just an unsigned long.
>
> This is actually modeled to be ioctl()-like so you can pass in a
> custom config to the same callback on the device driver,
> just use some other enumerator than DMA_SLAVE_CONFIG,
> say like FSLDMA already does with FSLDMA_EXTERNAL_START.
>
> Just put some enumerator in enum dma_ctrl_cmd in
> dmaengine.h, such as SDMA_TEXAS_STRIDE_CONFIG, and call it
> like this:
>
> /* However that config struct needs to look, basically */
> static struct sdma_ti_stride_cfg stride_cfg = {
>         .take = N,      /* bytes to transfer */
>         .skip = M,      /* bytes to skip */
> };
>
> ret = chan->device->device_control(chan, SDMA_TEXAS_STRIDE_CONFIG,
>                                    (unsigned long)&stride_cfg);
>
> Or something like this.

Yes, the hardware can skip over sequences like that.
I also thought about your suggestion at first, before submitting the RFC.
But I didn't pursue it because this ioctl() call has to be made before
every single prepare call.

I may have to end up using this if we decide not to change the
signature of the prepare APIs.

I actually intend to use this for all DMAC-related ioctl()s:
configuring the DMAC, programming various modes, etc., specific to the
DMAC. I suppose this is the only way to do it.
Let me know if there is any other way to do it.

Thanks,
Sundaram

2011-06-14 05:59:34

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC] dmaengine: add new api for preparing simple slave transfer

2011/6/14 Raju, Sundaram <[email protected]>:

> Yes, the hardware can skip over sequences like that.
> I also thought about your suggestion at first, before submitting the RFC.
> But I didn't pursue it because this ioctl() call has to be made before
> every single prepare call.

It's not an ioctl(); ioctl()s would be expensive since they involve
userspace-to-kernelspace context switches. It is just "ioctl()-like".
It is a simple, fast function call.

> I may have to end up using this if we decide not to change the
> signature of the prepare APIs.

Is it really so bad? It is a custom configuration after all. Even if
there were many DMACs out there supporting it, we'd probably
model it like this, just pass something like DMA_STRIDE_CONFIG
instead of something named after a specific slave controller.

This way, DMACs that don't support striding could NACK a
transfer for a device driver requesting it, and the driver could then
figure out what to do.

If we can get some indication as to whether more DMACs can
do this kind of stuff, we'd maybe like to introduce
DMA_STRIDE_CONFIG already now.
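
A sketch of that negotiation from the client side (DMA_STRIDE_CONFIG,
struct dma_stride_config and build_sg_fallback() are all hypothetical
here):

    struct dma_stride_config stride = {
            .take = N,      /* bytes to transfer */
            .skip = M,      /* bytes to skip */
    };

    ret = dmaengine_device_control(chan, DMA_STRIDE_CONFIG,
                                   (unsigned long)&stride);
    if (ret < 0)
            /* the DMAC NACKed striding; emulate with a plain sg list */
            desc = build_sg_fallback(chan, buf);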

Thanks,
Linus Walleij

2011-06-14 06:42:34

by Raju, Sundaram

[permalink] [raw]
Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

Russell,

Thanks for all the quick pointers and the summary of how
memory-to-peripheral transfers are expected to operate.

> Ok, what I'm envisioning is that your term "chunk size" means "register
> width", and you view that as one dimension. We already describe this.
>
> A frame is a collection of chunks. That's already described - the number
> of chunks in a buffer is the buffer size divided by the chunk size.
> Buffers must be a multiple of the chunk size.
>
> Then we have a collection of frames. These can be non-contiguous.
> That's effectively described by our scatterlist.
>
<snip>
>
> The overall conclusion which I'm coming to is that we already support
> what you're asking for, but the problem is that we're using different
> (and I'd argue standard) terminology to describe what we have.
>
> The only issue which I see that we don't cover is the case where you want
> to describe a single buffer which is organised as N bytes to be transferred,
> M following bytes to be skipped, N bytes to be transferred, M bytes to be
> skipped. I doubt there are many controllers which can be programmed with
> both 'N' and 'M' parameters directly.
>

Yes, this is what I wanted to communicate and discuss on the list.
Thanks for describing it in the standard terminology and pointing me in
the right direction.

Also, please note that if the gap between the buffers in a scatterlist
is uniform, and the DMAC can skip it automatically once that gap size
is programmed, then that feature too is not available in the current
framework.

I understand it can be done with the existing scatterlist, but just
writing a value into a DMAC register, where available, is much better
than preparing a scatterlist and passing it to the API.

>
> Please, don't generate special cases. Design proper APIs from the
> outset rather than abusing what's already there. So no, don't abuse
> the address width stuff.
>
> In any case, the address width stuff must still be used to describe the
> peripheral's FIFO register.
>

I did not intend to abuse the existing address width; it might look
like that because of how I described it here.
I agree that dma_slave_config is for storing peripheral-related
properties. I was pointing out that the chunk_size variable in the
dma_buffer_config I proposed will in most cases be equal to the FIFO
register width, to describe what I actually meant by chunk size.

> IIUC, you have a buffer with gaps in between (given by the above
> params). Is your DMA h/w capable of parsing this buffer and directly
> doing a transfer based on the above parameters (without any work for
> SW), or are you doing this in your dma driver and then programming a
> list of buffers? In the former case (although I haven't seen such a dma
> controller yet), can you share the datasheet? It would make very little
> sense to change the current API for your extra-special case. This
> special case needs to be handled differently rather than making a rule
> out of it!
>

Yes, Vinod. This is directly handled in the DMAC h/w.

This capability is present in two TI DMACs:
EDMA (Enhanced DMA) in all TI DaVinci SoCs and
SDMA (System DMA) in all TI OMAP SoCs. The drivers of these
controllers are present in the respective DaVinci and OMAP trees
under the SoC folders.

<quote>
> SDMA and EDMA are TI SoC specific DMA controllers. Their drivers have
> been maintained in the respective SoC folders till now.
> arch/arm/plat-omap/dma.c
> arch/arm/mach-davinci/dma.c
</quote>

The manual of the EDMA controller in TI DaVinci SoCs is available at
http://www.ti.com/litv/pdf/sprue23d
Section 2.2 on page 23 explains how transfers are made based
on the gaps programmed. It also explains how the 3D buffer is
internally split in EDMA based on the gaps programmed.

The complete reference manual of TI OMAP SoCs is available at
http://www.ti.com/litv/pdf/spruf98s
Chapter 9 in this document describes the SDMA controller.
Section 9.4.3 on page 981 explains the various address modes,
how the address is incremented by the DMAC, and the gaps between
buffers and frames and how the DMAC handles them.

Linus,
>
> Is it really so bad? It is a custom configuration after all. Even if
> there were many DMACs out there supporting it, we'd probably
> model it like this, just pass something like DMA_STRIDE_CONFIG
> instead of something named after a specific slave controller.
>
> This way, DMACs that don't support striding could NACK a
> transfer for a device driver requesting it, and the driver could then
> figure out what to do.
>
> If we can get some indication as to whether more DMACs can
> do this kind of stuff, we'd maybe like to introduce
> DMA_STRIDE_CONFIG already now.
>

I wanted to discuss this feature on the list and get it
added to the current dmaengine framework. If the current APIs
can handle this feature, then that is very good
and I will simply follow that.

If what you suggest is the right way to go, then I am okay with it.
IMHO the framework should always handle the complex case,
with individual implementations implementing a subset of the
capability; hence the changes I posted to the list.

Regards,
Sundaram

2011-06-16 06:21:29

by Vinod Koul

[permalink] [raw]
Subject: RE: [RFC] dmaengine: add new api for preparing simple slave transfer

On Tue, 2011-06-14 at 12:12 +0530, Raju, Sundaram wrote:
> <snip>
> >
> > The overall conclusion which I'm coming to is that we already support
> > what you're asking for, but the problem is that we're using different
> > (and I'd argue standard) terminology to describe what we have.
> >
> > The only issue which I see that we don't cover is the case where you want
> > to describe a single buffer which is organised as N bytes to be transferred,
> > M following bytes to be skipped, N bytes to be transferred, M bytes to be
> > skipped. I doubt there are many controllers which can be programmed with
> > both 'N' and 'M' parameters directly.
> >
>
> Yes, this is what I wanted to communicate and discuss on the list.
> Thanks for describing it in the standard terminology and pointing me in
> the right direction.
>
> Also, please note that if the gap between the buffers in a scatterlist
> is uniform, and the DMAC can skip it automatically once that gap size
> is programmed, then that feature too is not available in the current
> framework.
>
> I understand it can be done with the existing scatterlist, but just
> writing a value into a DMAC register, where available, is much better
> than preparing a scatterlist and passing it to the API.
>
> >
> > Please, don't generate special cases. Design proper APIs from the
> > outset rather than abusing what's already there. So no, don't abuse
> > the address width stuff.
> >
> > In any case, the address width stuff must still be used to describe the
> > peripheral's FIFO register.
> >
>
> I did not intend to abuse the existing address width; it might look
> like that because of how I described it here.
> I agree that dma_slave_config is for storing peripheral-related
> properties. I was pointing out that the chunk_size variable in the
> dma_buffer_config I proposed will in most cases be equal to the FIFO
> register width, to describe what I actually meant by chunk size.
>
> > IIUC, you have a buffer with gaps in between (given by the above
> > params). Is your DMA h/w capable of parsing this buffer and directly
> > doing a transfer based on the above parameters (without any work for
> > SW), or are you doing this in your dma driver and then programming a
> > list of buffers? In the former case (although I haven't seen such a dma
> > controller yet), can you share the datasheet? It would make very little
> > sense to change the current API for your extra-special case. This
> > special case needs to be handled differently rather than making a rule
> > out of it!
> >
>
> Yes, Vinod. This is directly handled in the DMAC h/w.
>
> This capability is present in two TI DMACs:
> EDMA (Enhanced DMA) in all TI DaVinci SoCs and
> SDMA (System DMA) in all TI OMAP SoCs. The drivers of these
> controllers are present in the respective DaVinci and OMAP trees
> under the SoC folders.
>
> <quote>
> > SDMA and EDMA are TI SoC specific DMA controllers. Their drivers have
> > been maintained in the respective SoC folders till now.
> > arch/arm/plat-omap/dma.c
> > arch/arm/mach-davinci/dma.c
> </quote>
>
> The manual of the EDMA controller in TI DaVinci SoCs is available at
> http://www.ti.com/litv/pdf/sprue23d
> Section 2.2 on page 23 explains how transfers are made based
> on the gaps programmed. It also explains how the 3D buffer is
> internally split in EDMA based on the gaps programmed.
>
> The complete reference manual of TI OMAP SoCs is available at
> http://www.ti.com/litv/pdf/spruf98s
> Chapter 9 in this document describes the SDMA controller.
> Section 9.4.3 on page 981 explains the various address modes,
> how the address is incremented by the DMAC, and the gaps between
> buffers and frames and how the DMAC handles them.
>
> Linus,
> >
> > Is it really so bad? It is a custom configuration after all. Even if
> > there were many DMACs out there supporting it, we'd probably
> > model it like this, just pass something like DMA_STRIDE_CONFIG
> > instead of something named after a specific slave controller.
> >
> > This way, DMACs that don't support striding could NACK a
> > transfer for a device driver requesting it, and the driver could then
> > figure out what to do.
> >
> > If we can get some indication as to whether more DMACs can
> > do this kind of stuff, we'd maybe like to introduce
> > DMA_STRIDE_CONFIG already now.
> >
>
> I wanted to discuss this feature on the list and get it
> added to the current dmaengine framework. If the current APIs
> can handle this feature, then that is very good
> and I will simply follow that.
>
> If what you suggest is the right way to go, then I am okay with it.
> IMHO the framework should always handle the complex case,
> with individual implementations implementing a subset of the
> capability; hence the changes I posted to the list.
Okay, now things are clearer on what you are trying to do...

1) The changes you need are to describe your buffer and convey that to
the dmac, so please don't talk about "slave" here, as that is specific
to the dmac config.

2) I think this can be achieved in two ways:
a) you use the current standard sg_list mechanism; the dmac driver
parses the list and programs the dma offsets into the dmac.
Pros: you can use existing APIs, no changes to the i/f.
If the dmac has this capability, it is programmed accordingly.
Cons: you need to create the sg-list in the client driver.

b) create a new api to describe these offset values, something like:
prep_buffer_offset(struct offset_description *buffer,.....)
I would not like to change the current API for this and would rather
have a new API; that would be better than overriding the current one.

Overall I would prefer you to use approach 2-a above
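
For illustration, a minimal sketch of approach 2-a from the client side,
assuming a "transfer N ('take') bytes, skip M ('skip') bytes" layout;
build_strided_sgl() is a hypothetical helper, and the caller must still
dma_map_sg() the list before the prepare call and free it afterwards:

    #include <linux/slab.h>
    #include <linux/scatterlist.h>

    static struct scatterlist *
    build_strided_sgl(void *buf, size_t take, size_t skip,
                      unsigned int nents)
    {
            struct scatterlist *sgl;
            unsigned int i;

            sgl = kcalloc(nents, sizeof(*sgl), GFP_KERNEL);
            if (!sgl)
                    return NULL;

            sg_init_table(sgl, nents);
            /* one entry per 'take'-byte run; the gap is simply not mapped */
            for (i = 0; i < nents; i++)
                    sg_set_buf(&sgl[i], buf + i * (take + skip), take);

            return sgl;
    }

The dmac driver then walks this list as usual in device_prep_slave_sg();
a controller that can stride in h/w is free to notice the uniform gap
and collapse the list into a single descriptor internally.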

--
~Vinod