2015-11-11 10:38:23

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH 0/4] dmaengine: omap-dma: transfer start and short memcpy improvement

Hi,

The first two patch is trivial fix.
The third (remove the tasklet use for starting the transfer):
I had been wondering about this for a while and now I was able to spend some
time to look at this in more detail.
In 'normal' operation I have never seen the tasklet to start more then one
transfer even when pushing I/O based workloads on the board. However when I run
the DMAtest module I can see that the tasklet executes about 15 transfers.
When the tasklet use has been removed, everything worked as well as before but
the throughput of MMC/SD and memcpy increased slightly:

dd if=/dev/mmcblk0 of=/dev/null bs=64k count=24000
~16.5 MB/s -> ~16.6 MB/s

echo 6553 > /sys/module/dmatest/parameters/test_buf_size
echo 2000 > /sys/module/dmatest/parameters/timeout
echo 5000 > /sys/module/dmatest/parameters/iterations
~585 KB/s -> ~638 KB/s

It worth mentioning that _with_ the tasklet starting the transfers I managed to
hit a situation once when for some reason the memcpy tests were started to time
out. This happend with memcpy, iozone, dd and grep -R blabla /usr/ running at
the same time.

The last patch is to correct the behaviour of omap-dma when short memcpy is used
by a client and the client is not using completion, but polling for the end of
the transfer.

Regards,
Peter
---
Peter Ujfalusi (4):
dmaengine: omap-dma: Correct status reporting for memcpy
dmaengine: omap-dma: Clean up the prep_slave_sg sg list walk code
dmaengine: omap-dma: Remove tasklet to start the transfers
dmaengine: omap-dma: Handle cases when the channel is polled for
completion

drivers/dma/omap-dma.c | 78 +++++++++-----------------------------------------
1 file changed, 14 insertions(+), 64 deletions(-)

--
2.6.2


2015-11-11 10:39:08

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH 1/4] dmaengine: omap-dma: Correct status reporting for memcpy

During mem copy both src and dst position moves at the same pace. Check the
dst position for progress reporting.

Signed-off-by: Peter Ujfalusi <[email protected]>
Tested-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Jyri Sarha <[email protected]>
---
drivers/dma/omap-dma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 1dfc71c90123..8ed39ce24d46 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -719,7 +719,7 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan,

if (d->dir == DMA_MEM_TO_DEV)
pos = omap_dma_get_src_pos(c);
- else if (d->dir == DMA_DEV_TO_MEM)
+ else if (d->dir == DMA_DEV_TO_MEM || d->dir == DMA_MEM_TO_MEM)
pos = omap_dma_get_dst_pos(c);
else
pos = 0;
--
2.6.2

2015-11-11 10:38:31

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH 2/4] dmaengine: omap-dma: Clean up the prep_slave_sg sg list walk code

The for_each_sg() macro's last parameter is inteded to be used as counter.
We can use 'i' instead of 'j' within the loop for indexes.

Signed-off-by: Peter Ujfalusi <[email protected]>
---
drivers/dma/omap-dma.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 8ed39ce24d46..4afc2c18d451 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -768,7 +768,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
struct scatterlist *sgent;
struct omap_desc *d;
dma_addr_t dev_addr;
- unsigned i, j = 0, es, en, frame_bytes;
+ unsigned i, es, en, frame_bytes;
u32 burst;

if (dir == DMA_DEV_TO_MEM) {
@@ -845,13 +845,12 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
en = burst;
frame_bytes = es_bytes[es] * en;
for_each_sg(sgl, sgent, sglen, i) {
- d->sg[j].addr = sg_dma_address(sgent);
- d->sg[j].en = en;
- d->sg[j].fn = sg_dma_len(sgent) / frame_bytes;
- j++;
+ d->sg[i].addr = sg_dma_address(sgent);
+ d->sg[i].en = en;
+ d->sg[i].fn = sg_dma_len(sgent) / frame_bytes;
}

- d->sglen = j;
+ d->sglen = sglen;

return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
}
--
2.6.2

2015-11-11 10:38:24

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH 3/4] dmaengine: omap-dma: Remove tasklet to start the transfers

The use of tasklet to actually start the DMA transfer slightly decreases the
DMA throughput since it adds small scheduling delay when the transfer is
started. In normal use, even with high I/O load the tasklet would start
one transaction at a time, however running the DMAtest for memcpy on all
available channels will cause the tasklet to start about 15 transfers.
The performance numbers on OMAP4 PandaBoard-es (test_buf_size = 6553):
With the tasklet:
dmatest: dma0chan30-copy: summary 5000 tests, 0 failures 186 iops 593 KB/s (0)
dmatest: dma0chan8-copy0: summary 5000 tests, 0 failures 184 iops 584 KB/s (0)
dmatest: dma0chan13-copy: summary 5000 tests, 0 failures 184 iops 585 KB/s (0)
dmatest: dma0chan12-copy: summary 5000 tests, 0 failures 184 iops 585 KB/s (0)
dmatest: dma0chan7-copy0: summary 5000 tests, 0 failures 183 iops 581 KB/s (0)

With this patch (no tasklet):
dmatest: dma0chan4-copy0: summary 5000 tests, 0 failures 199 iops 644 KB/s (0)
dmatest: dma0chan5-copy0: summary 5000 tests, 0 failures 199 iops 645 KB/s (0)
dmatest: dma0chan6-copy0: summary 5000 tests, 0 failures 199 iops 637 KB/s (0)
dmatest: dma0chan24-copy: summary 5000 tests, 0 failures 199 iops 638 KB/s (0)
dmatest: dma0chan16-copy: summary 5000 tests, 0 failures 199 iops 638 KB/s (0)

Signed-off-by: Peter Ujfalusi <[email protected]>
---
drivers/dma/omap-dma.c | 59 ++------------------------------------------------
1 file changed, 2 insertions(+), 57 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 4afc2c18d451..4e4642f561f5 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -28,8 +28,6 @@
struct omap_dmadev {
struct dma_device ddev;
spinlock_t lock;
- struct tasklet_struct task;
- struct list_head pending;
void __iomem *base;
const struct omap_dma_reg *reg_map;
struct omap_system_dma_plat_info *plat;
@@ -42,7 +40,6 @@ struct omap_dmadev {

struct omap_chan {
struct virt_dma_chan vc;
- struct list_head node;
void __iomem *channel_base;
const struct omap_dma_reg *reg_map;
uint32_t ccr;
@@ -454,33 +451,6 @@ static void omap_dma_callback(int ch, u16 status, void *data)
spin_unlock_irqrestore(&c->vc.lock, flags);
}

-/*
- * This callback schedules all pending channels. We could be more
- * clever here by postponing allocation of the real DMA channels to
- * this point, and freeing them when our virtual channel becomes idle.
- *
- * We would then need to deal with 'all channels in-use'
- */
-static void omap_dma_sched(unsigned long data)
-{
- struct omap_dmadev *d = (struct omap_dmadev *)data;
- LIST_HEAD(head);
-
- spin_lock_irq(&d->lock);
- list_splice_tail_init(&d->pending, &head);
- spin_unlock_irq(&d->lock);
-
- while (!list_empty(&head)) {
- struct omap_chan *c = list_first_entry(&head,
- struct omap_chan, node);
-
- spin_lock_irq(&c->vc.lock);
- list_del_init(&c->node);
- omap_dma_start_desc(c);
- spin_unlock_irq(&c->vc.lock);
- }
-}
-
static irqreturn_t omap_dma_irq(int irq, void *devid)
{
struct omap_dmadev *od = devid;
@@ -739,22 +709,8 @@ static void omap_dma_issue_pending(struct dma_chan *chan)
unsigned long flags;

spin_lock_irqsave(&c->vc.lock, flags);
- if (vchan_issue_pending(&c->vc) && !c->desc) {
- /*
- * c->cyclic is used only by audio and in this case the DMA need
- * to be started without delay.
- */
- if (!c->cyclic) {
- struct omap_dmadev *d = to_omap_dma_dev(chan->device);
- spin_lock(&d->lock);
- if (list_empty(&c->node))
- list_add_tail(&c->node, &d->pending);
- spin_unlock(&d->lock);
- tasklet_schedule(&d->task);
- } else {
- omap_dma_start_desc(c);
- }
- }
+ if (vchan_issue_pending(&c->vc) && !c->desc)
+ omap_dma_start_desc(c);
spin_unlock_irqrestore(&c->vc.lock, flags);
}

@@ -1017,17 +973,11 @@ static int omap_dma_slave_config(struct dma_chan *chan, struct dma_slave_config
static int omap_dma_terminate_all(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);
- struct omap_dmadev *d = to_omap_dma_dev(c->vc.chan.device);
unsigned long flags;
LIST_HEAD(head);

spin_lock_irqsave(&c->vc.lock, flags);

- /* Prevent this channel being scheduled */
- spin_lock(&d->lock);
- list_del_init(&c->node);
- spin_unlock(&d->lock);
-
/*
* Stop DMA activity: we assume the callback will not be called
* after omap_dma_stop() returns (even if it does, it will see
@@ -1101,14 +1051,12 @@ static int omap_dma_chan_init(struct omap_dmadev *od)
c->reg_map = od->reg_map;
c->vc.desc_free = omap_dma_desc_free;
vchan_init(&c->vc, &od->ddev);
- INIT_LIST_HEAD(&c->node);

return 0;
}

static void omap_dma_free(struct omap_dmadev *od)
{
- tasklet_kill(&od->task);
while (!list_empty(&od->ddev.channels)) {
struct omap_chan *c = list_first_entry(&od->ddev.channels,
struct omap_chan, vc.chan.device_node);
@@ -1164,12 +1112,9 @@ static int omap_dma_probe(struct platform_device *pdev)
od->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
od->ddev.dev = &pdev->dev;
INIT_LIST_HEAD(&od->ddev.channels);
- INIT_LIST_HEAD(&od->pending);
spin_lock_init(&od->lock);
spin_lock_init(&od->irq_lock);

- tasklet_init(&od->task, omap_dma_sched, (unsigned long)od);
-
od->dma_requests = OMAP_SDMA_REQUESTS;
if (pdev->dev.of_node && of_property_read_u32(pdev->dev.of_node,
"dma-requests",
--
2.6.2

2015-11-11 10:38:33

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH 4/4] dmaengine: omap-dma: Handle cases when the channel is polled for completion

When a DMA client driver decides that it is not providing callback for
completion of a transfer (and/or does not set the DMA_PREP_INTERRUPT) but
it will poll the status of the transfer (in case of short memcpy for
example) we will not get interrupt for the completion of the transfer and
will not mark the transaction as done.
Check the channel enable bit in the CCR when the status is queried and if
the channel is no longer active, we call the omap_dma_callback() to handle
the transfer completion.

Signed-off-by: Peter Ujfalusi <[email protected]>
---
drivers/dma/omap-dma.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 4e4642f561f5..f86827ac0c8a 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -673,8 +673,14 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan,
struct omap_chan *c = to_omap_dma_chan(chan);
struct virt_dma_desc *vd;
enum dma_status ret;
+ uint32_t ccr;
unsigned long flags;

+ ccr = omap_dma_chan_read(c, CCR);
+ /* The channel is no longer active, handle the completion right away */
+ if (!(ccr & CCR_ENABLE))
+ omap_dma_callback(c->dma_ch, 0, c);
+
ret = dma_cookie_status(chan, cookie, txstate);
if (ret == DMA_COMPLETE || !txstate)
return ret;
--
2.6.2

2015-12-05 08:04:00

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 0/4] dmaengine: omap-dma: transfer start and short memcpy improvement

On Wed, Nov 11, 2015 at 12:37:54PM +0200, Peter Ujfalusi wrote:
> Hi,
>
> The first two patch is trivial fix.
> The third (remove the tasklet use for starting the transfer):
> I had been wondering about this for a while and now I was able to spend some
> time to look at this in more detail.
> In 'normal' operation I have never seen the tasklet to start more then one
> transfer even when pushing I/O based workloads on the board. However when I run
> the DMAtest module I can see that the tasklet executes about 15 transfers.
> When the tasklet use has been removed, everything worked as well as before but
> the throughput of MMC/SD and memcpy increased slightly:
>
> dd if=/dev/mmcblk0 of=/dev/null bs=64k count=24000
> ~16.5 MB/s -> ~16.6 MB/s
>
> echo 6553 > /sys/module/dmatest/parameters/test_buf_size
> echo 2000 > /sys/module/dmatest/parameters/timeout
> echo 5000 > /sys/module/dmatest/parameters/iterations
> ~585 KB/s -> ~638 KB/s
>
> It worth mentioning that _with_ the tasklet starting the transfers I managed to
> hit a situation once when for some reason the memcpy tests were started to time
> out. This happend with memcpy, iozone, dd and grep -R blabla /usr/ running at
> the same time.
>
> The last patch is to correct the behaviour of omap-dma when short memcpy is used
> by a client and the client is not using completion, but polling for the end of
> the transfer.

Applied, thanks

--
~Vinod