Hello!
This series contain free running cyclic transfer implementation for pl330.
It affect ALL chips using pl330 (not only rockchip) and
allow to run cyclic transfers without CPU intervention. As a result
it fix sound clicks (observed and not yet observed) because
sound clicks must be heard under heavy system load due to the way
how cyclic transfers implemented now for pl330.
My previous series[1] doesn't get enough attention (no one except me
tested it). And it don't get upstream:
> 8-03-2016, 6:03, Vinod Koul <[email protected]> *:
>Overall this series looks okay, but can someone test this. I would not like
pl330 to be broken again
Now I was asked about the series[1] again by guys from Rockchip,
so I send rebased against 4.10.10 version. Hope, someone might test it
and confirm that patches work fine.
Regards,
Alexander.
Alexander Kochetkov (2):
dmaengine: pl330: make cyclic transfer free runnable
dmaengine: pl330: don't emit code for one iteration loop
drivers/dma/pl330.c | 200 +++++++++++++++++++++++++--------------------------
1 file changed, 98 insertions(+), 102 deletions(-)
--
1.7.9.5
The patch solve the I2S click problem on rk3188. Actually
all the devices using pl330 should have I2S click problem
due to pl330 cyclic transfer implementation.
Current implementation depends on soft irq. If pl330 unable
to submit next transfer in time some samples could be lost.
The lost samples heard as sound click. In order to check
lost samples, I've installed I2S interrupt handler to signal
overflow/underflow conditions. Sometimes I saw overflow
or underflow events and heard clicks.
The patch setup cyclic transfer such a way, that transfer can
run infinitely without CPU intervention. As a result,
lost samples and clicks gone.
Signed-off-by: Alexander Kochetkov <[email protected]>
---
drivers/dma/pl330.c | 192 +++++++++++++++++++++++++--------------------------
1 file changed, 94 insertions(+), 98 deletions(-)
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 7539f73..56a2377 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -446,9 +446,6 @@ struct dma_pl330_chan {
int burst_len; /* the number of burst */
dma_addr_t fifo_addr;
- /* for cyclic capability */
- bool cyclic;
-
/* for runtime pm tracking */
bool active;
};
@@ -532,6 +529,10 @@ struct dma_pl330_desc {
unsigned peri:5;
/* Hook to attach to DMAC's list of reqs with due callback */
struct list_head rqd;
+
+ /* For cyclic capability */
+ bool cyclic;
+ size_t num_periods;
};
struct _xfer_spec {
@@ -1333,16 +1334,19 @@ static inline int _setup_loops(struct pl330_dmac *pl330,
}
static inline int _setup_xfer(struct pl330_dmac *pl330,
- unsigned dry_run, u8 buf[],
+ unsigned dry_run, u8 buf[], u32 period,
const struct _xfer_spec *pxs)
{
struct pl330_xfer *x = &pxs->desc->px;
+ struct pl330_reqcfg *rqcfg = &pxs->desc->rqcfg;
int off = 0;
/* DMAMOV SAR, x->src_addr */
- off += _emit_MOV(dry_run, &buf[off], SAR, x->src_addr);
+ off += _emit_MOV(dry_run, &buf[off], SAR,
+ x->src_addr + rqcfg->src_inc * period * x->bytes);
/* DMAMOV DAR, x->dst_addr */
- off += _emit_MOV(dry_run, &buf[off], DAR, x->dst_addr);
+ off += _emit_MOV(dry_run, &buf[off], DAR,
+ x->dst_addr + rqcfg->dst_inc * period * x->bytes);
/* Setup Loop(s) */
off += _setup_loops(pl330, dry_run, &buf[off], pxs);
@@ -1362,23 +1366,41 @@ static int _setup_req(struct pl330_dmac *pl330, unsigned dry_run,
struct pl330_xfer *x;
u8 *buf = req->mc_cpu;
int off = 0;
+ int period;
+ int again_off;
PL330_DBGMC_START(req->mc_bus);
/* DMAMOV CCR, ccr */
off += _emit_MOV(dry_run, &buf[off], CCR, pxs->ccr);
+ again_off = off;
x = &pxs->desc->px;
/* Error if xfer length is not aligned at burst size */
if (x->bytes % (BRST_SIZE(pxs->ccr) * BRST_LEN(pxs->ccr)))
return -EINVAL;
- off += _setup_xfer(pl330, dry_run, &buf[off], pxs);
+ for (period = 0; period < pxs->desc->num_periods; period++) {
+ off += _setup_xfer(pl330, dry_run, &buf[off], period, pxs);
- /* DMASEV peripheral/event */
- off += _emit_SEV(dry_run, &buf[off], thrd->ev);
- /* DMAEND */
- off += _emit_END(dry_run, &buf[off]);
+ /* DMASEV peripheral/event */
+ off += _emit_SEV(dry_run, &buf[off], thrd->ev);
+ }
+
+ if (!pxs->desc->cyclic) {
+ /* DMAEND */
+ off += _emit_END(dry_run, &buf[off]);
+ } else {
+ struct _arg_LPEND lpend;
+ /* LP */
+ off += _emit_LP(dry_run, &buf[off], 0, 255);
+ /* LPEND */
+ lpend.cond = ALWAYS;
+ lpend.forever = false;
+ lpend.loop = 0;
+ lpend.bjump = off - again_off;
+ off += _emit_LPEND(dry_run, &buf[off], &lpend);
+ }
return off;
}
@@ -1640,12 +1662,13 @@ static int pl330_update(struct pl330_dmac *pl330)
/* Detach the req */
descdone = thrd->req[active].desc;
- thrd->req[active].desc = NULL;
-
- thrd->req_running = -1;
- /* Get going again ASAP */
- _start(thrd);
+ if (!descdone->cyclic) {
+ thrd->req[active].desc = NULL;
+ thrd->req_running = -1;
+ /* Get going again ASAP */
+ _start(thrd);
+ }
/* For now, just make a list of callbacks to be done */
list_add_tail(&descdone->rqd, &pl330->req_done);
@@ -2013,12 +2036,28 @@ static void pl330_tasklet(unsigned long data)
spin_lock_irqsave(&pch->lock, flags);
/* Pick up ripe tomatoes */
- list_for_each_entry_safe(desc, _dt, &pch->work_list, node)
+ list_for_each_entry_safe(desc, _dt, &pch->work_list, node) {
if (desc->status == DONE) {
- if (!pch->cyclic)
+ if (!desc->cyclic) {
dma_cookie_complete(&desc->txd);
- list_move_tail(&desc->node, &pch->completed_list);
+ list_move_tail(&desc->node,
+ &pch->completed_list);
+ } else {
+ struct dmaengine_desc_callback cb;
+
+ desc->status = BUSY;
+ dmaengine_desc_get_callback(&desc->txd, &cb);
+
+ if (dmaengine_desc_callback_valid(&cb)) {
+ spin_unlock_irqrestore(&pch->lock,
+ flags);
+ dmaengine_desc_callback_invoke(&cb,
+ NULL);
+ spin_lock_irqsave(&pch->lock, flags);
+ }
+ }
}
+ }
/* Try to submit a req imm. next to the last completed cookie */
fill_queue(pch);
@@ -2044,20 +2083,8 @@ static void pl330_tasklet(unsigned long data)
dmaengine_desc_get_callback(&desc->txd, &cb);
- if (pch->cyclic) {
- desc->status = PREP;
- list_move_tail(&desc->node, &pch->work_list);
- if (power_down) {
- pch->active = true;
- spin_lock(&pch->thread->dmac->lock);
- _start(pch->thread);
- spin_unlock(&pch->thread->dmac->lock);
- power_down = false;
- }
- } else {
- desc->status = FREE;
- list_move_tail(&desc->node, &pch->dmac->desc_pool);
- }
+ desc->status = FREE;
+ list_move_tail(&desc->node, &pch->dmac->desc_pool);
dma_descriptor_unmap(&desc->txd);
@@ -2117,7 +2144,6 @@ static int pl330_alloc_chan_resources(struct dma_chan *chan)
spin_lock_irqsave(&pl330->lock, flags);
dma_cookie_init(chan);
- pch->cyclic = false;
pch->thread = pl330_request_channel(pl330);
if (!pch->thread) {
@@ -2241,8 +2267,7 @@ static void pl330_free_chan_resources(struct dma_chan *chan)
pl330_release_channel(pch->thread);
pch->thread = NULL;
- if (pch->cyclic)
- list_splice_tail_init(&pch->work_list, &pch->dmac->desc_pool);
+ list_splice_tail_init(&pch->work_list, &pch->dmac->desc_pool);
spin_unlock_irqrestore(&pl330->lock, flags);
pm_runtime_mark_last_busy(pch->dmac->ddma.dev);
@@ -2304,7 +2329,7 @@ static int pl330_get_current_xferred_count(struct dma_pl330_chan *pch,
/* Check in pending list */
list_for_each_entry(desc, &pch->work_list, node) {
- if (desc->status == DONE)
+ if (desc->status == DONE && !desc->cyclic)
transferred = desc->bytes_requested;
else if (running && desc == running)
transferred =
@@ -2386,12 +2411,8 @@ static dma_cookie_t pl330_tx_submit(struct dma_async_tx_descriptor *tx)
/* Assign cookies to all nodes */
while (!list_empty(&last->node)) {
desc = list_entry(last->node.next, struct dma_pl330_desc, node);
- if (pch->cyclic) {
- desc->txd.callback = last->txd.callback;
- desc->txd.callback_param = last->txd.callback_param;
- }
- desc->last = false;
+ desc->last = false;
dma_cookie_assign(&desc->txd);
list_move_tail(&desc->node, &pch->submitted_list);
@@ -2491,6 +2512,9 @@ static struct dma_pl330_desc *pl330_get_desc(struct dma_pl330_chan *pch)
desc->peri = peri_id ? pch->chan.chan_id : 0;
desc->rqcfg.pcfg = &pch->dmac->pcfg;
+ desc->cyclic = false;
+ desc->num_periods = 1;
+
dma_async_tx_descriptor_init(&desc->txd, &pch->chan);
return desc;
@@ -2560,10 +2584,8 @@ static struct dma_async_tx_descriptor *pl330_prep_dma_cyclic(
size_t period_len, enum dma_transfer_direction direction,
unsigned long flags)
{
- struct dma_pl330_desc *desc = NULL, *first = NULL;
+ struct dma_pl330_desc *desc = NULL;
struct dma_pl330_chan *pch = to_pchan(chan);
- struct pl330_dmac *pl330 = pch->dmac;
- unsigned int i;
dma_addr_t dst;
dma_addr_t src;
@@ -2576,65 +2598,39 @@ static struct dma_async_tx_descriptor *pl330_prep_dma_cyclic(
return NULL;
}
- for (i = 0; i < len / period_len; i++) {
- desc = pl330_get_desc(pch);
- if (!desc) {
- dev_err(pch->dmac->ddma.dev, "%s:%d Unable to fetch desc\n",
- __func__, __LINE__);
-
- if (!first)
- return NULL;
-
- spin_lock_irqsave(&pl330->pool_lock, flags);
-
- while (!list_empty(&first->node)) {
- desc = list_entry(first->node.next,
- struct dma_pl330_desc, node);
- list_move_tail(&desc->node, &pl330->desc_pool);
- }
-
- list_move_tail(&first->node, &pl330->desc_pool);
-
- spin_unlock_irqrestore(&pl330->pool_lock, flags);
+ desc = pl330_get_desc(pch);
+ if (!desc) {
+ dev_err(pch->dmac->ddma.dev, "%s:%d Unable to fetch desc\n",
+ __func__, __LINE__);
return NULL;
- }
-
- switch (direction) {
- case DMA_MEM_TO_DEV:
- desc->rqcfg.src_inc = 1;
- desc->rqcfg.dst_inc = 0;
- src = dma_addr;
- dst = pch->fifo_addr;
- break;
- case DMA_DEV_TO_MEM:
- desc->rqcfg.src_inc = 0;
- desc->rqcfg.dst_inc = 1;
- src = pch->fifo_addr;
- dst = dma_addr;
- break;
- default:
- break;
- }
-
- desc->rqtype = direction;
- desc->rqcfg.brst_size = pch->burst_sz;
- desc->rqcfg.brst_len = 1;
- desc->bytes_requested = period_len;
- fill_px(&desc->px, dst, src, period_len);
-
- if (!first)
- first = desc;
- else
- list_add_tail(&desc->node, &first->node);
+ }
- dma_addr += period_len;
+ switch (direction) {
+ case DMA_MEM_TO_DEV:
+ desc->rqcfg.src_inc = 1;
+ desc->rqcfg.dst_inc = 0;
+ src = dma_addr;
+ dst = pch->fifo_addr;
+ break;
+ case DMA_DEV_TO_MEM:
+ desc->rqcfg.src_inc = 0;
+ desc->rqcfg.dst_inc = 1;
+ src = pch->fifo_addr;
+ dst = dma_addr;
+ break;
+ default:
+ break;
}
- if (!desc)
- return NULL;
+ desc->rqtype = direction;
+ desc->rqcfg.brst_size = pch->burst_sz;
+ desc->rqcfg.brst_len = 1;
+ desc->bytes_requested = len;
+ fill_px(&desc->px, dst, src, period_len);
- pch->cyclic = true;
+ desc->cyclic = true;
+ desc->num_periods = len / period_len;
desc->txd.flags = flags;
return &desc->txd;
--
1.7.9.5
The patch remove one iteration outer loop in the _loop().
Removing loop saves 4 bytes of MicroCode buffer. This
savings make sense for free-running cyclic transfer
implementation.
DMALP_0 0
...
DMALPENDA_0 bjmpto_9
Signed-off-by: Alexander Kochetkov <[email protected]>
---
drivers/dma/pl330.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 56a2377..6126dde 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1268,7 +1268,7 @@ static inline int _loop(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
lpend.bjump = 0;
szlpend = _emit_LPEND(1, buf, &lpend);
- if (lcnt0) {
+ if (lcnt0 > 1) {
szlp *= 2;
szlpend *= 2;
}
@@ -1284,7 +1284,7 @@ static inline int _loop(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
off = 0;
- if (lcnt0) {
+ if (lcnt0 > 1) {
off += _emit_LP(dry_run, &buf[off], 0, lcnt0);
ljmp0 = off;
}
@@ -1300,7 +1300,7 @@ static inline int _loop(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
lpend.bjump = off - ljmp1;
off += _emit_LPEND(dry_run, &buf[off], &lpend);
- if (lcnt0) {
+ if (lcnt0 > 1) {
lpend.cond = ALWAYS;
lpend.forever = false;
lpend.loop = 0;
@@ -1309,7 +1309,7 @@ static inline int _loop(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
}
*bursts = lcnt1 * cyc;
- if (lcnt0)
+ if (lcnt0 > 1)
*bursts *= lcnt0;
return off;
--
1.7.9.5
On Fri, Apr 14, 2017 at 05:35:29PM +0300, Alexander Kochetkov wrote:
> The patch solve the I2S click problem on rk3188. Actually
> all the devices using pl330 should have I2S click problem
> due to pl330 cyclic transfer implementation.
>
> Current implementation depends on soft irq. If pl330 unable
> to submit next transfer in time some samples could be lost.
> The lost samples heard as sound click. In order to check
> lost samples, I've installed I2S interrupt handler to signal
> overflow/underflow conditions. Sometimes I saw overflow
> or underflow events and heard clicks.
>
> The patch setup cyclic transfer such a way, that transfer can
> run infinitely without CPU intervention. As a result,
> lost samples and clicks gone.
>
> Signed-off-by: Alexander Kochetkov <[email protected]>
> ---
> drivers/dma/pl330.c | 192 +++++++++++++++++++++++++--------------------------
> 1 file changed, 94 insertions(+), 98 deletions(-)
>
I didn't look at the code but testing fails.
HW: Odroid U3 (Exynos4412), pl330 used for dma for i2s, max98090 audio
codec
SW: 4.11.0-rc6-next-20170413
Only first aplay workes fine. After it, the next sound stalls:
[ 170.059271] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 170.059441] 1-...: (1 GPs behind) idle=11e/140000000000000/0 softirq=2335/2335 fqs=1049
[ 170.067474] (detected by 3, t=2102 jiffies, g=459, c=458, q=194)
[ 170.073555] Sending NMI from CPU 3 to CPUs 1:
[ 180.083093] rcu_preempt kthread starved for 999 jiffies! g459 c458 f0x0 RCU_GP_DOING_FQS(4) ->state=0x0
[ 180.086848] rcu_preempt S 0 8 2 0x00000000
[ 180.092344] [<c073e150>] (__schedule) from [<c073e4dc>] (schedule+0x4c/0xac)
[ 180.099353] [<c073e4dc>] (schedule) from [<c0742634>] (schedule_timeout+0x14c/0x23c)
[ 180.107081] [<c0742634>] (schedule_timeout) from [<c01702f4>] (rcu_gp_kthread+0x5dc/0x94c)
[ 180.115330] [<c01702f4>] (rcu_gp_kthread) from [<c01381f0>] (kthread+0x124/0x154)
[ 180.122793] [<c01381f0>] (kthread) from [<c0108438>] (ret_from_fork+0x14/0x3c)
Let me know if you need more data. I wonder why you haven't experience
this?
Best regards,
Krzysztof
Hi Alexander,
I have tested this series on rk3399 evb board and works well.
Tested-by: Sugar Zhang <[email protected]>
?? 4/14/2017 22:35, Alexander Kochetkov д??:
> Hello!
>
> This series contain free running cyclic transfer implementation for pl330.
> It affect ALL chips using pl330 (not only rockchip) and
> allow to run cyclic transfers without CPU intervention. As a result
> it fix sound clicks (observed and not yet observed) because
> sound clicks must be heard under heavy system load due to the way
> how cyclic transfers implemented now for pl330.
>
> My previous series[1] doesn't get enough attention (no one except me
> tested it). And it don't get upstream:
>
>> 8-03-2016, 6:03, Vinod Koul <[email protected]> *:
>> Overall this series looks okay, but can someone test this. I would not like
> pl330 to be broken again
>
> Now I was asked about the series[1] again by guys from Rockchip,
> so I send rebased against 4.10.10 version. Hope, someone might test it
> and confirm that patches work fine.
>
> Regards,
> Alexander.
>
> Alexander Kochetkov (2):
> dmaengine: pl330: make cyclic transfer free runnable
> dmaengine: pl330: don't emit code for one iteration loop
>
> drivers/dma/pl330.c | 200 +++++++++++++++++++++++++--------------------------
> 1 file changed, 98 insertions(+), 102 deletions(-)
>
Hello!
Just to let you know, that I got following test report from Stephen (thanks a lot!).
The patch won’t work due to full cyclic transfer doesn’t fit into mcbufsz (256 bytes long).
His application requested driver to do cyclic transfer with large number of cycles.
pl330 microcode has restriction on how far PC can change and this limit is 256 bytes,
so increasing mcbufsz will not solve problem. Don’t want to make jump bridges or
something like on the fly CPU modified microcode.
> 14 апр. 2017 г., в 23:24, Stephen Barber <[email protected]> написал(а):
>
> Hi Alexander,
>
> Thanks for your patches!
>
> I gave them a try on kevin (Samsung Chromebook Plus) with our
> chromeos-4.4 kernel branch, plus some cherry-picks on top to get
> things to apply cleanly. Here's what my tree looks like:
>
> 344daf13fa4e (HEAD -> apply-pl330) dmaengine: pl330: don't emit code
> for one iteration loop
> 062596b83fec dmaengine: pl330: make cyclic transfer free runnable
> 0e75f2647b8e dmaengine: pl330: fix double lock
> 7d22b54b79f2 dmaengine: pl330: remove unused ‘regs’
> 8769d7115cec dmaengine: pl330: do not generate unaligned access
> 65ad077f685b dmaengine: pl330: convert callback to helper function
> 368e7aa6dffd dmaengine: add support to provide error result from a DMA
> transation
> 8f8afe84472f dmaengine: Add helper function to prep for error reporting
> 2acc1e704232 dmaengine: pl330: explicitly freeup irq
> ed36cde14cf0 (m/master, cros/chromeos-4.4) UPSTREAM: audit: add tty
> field to LOGIN event
>
> Unfortunately when I start playing audio, things don't seem to work :(
> Rolling back HEAD to "dmaengine: pl330: fix double lock" works though.
>
> The only thing I get from dmesg relevant to pl330 is this:
> [ 59.203375] dma-pl330 ff6d0000.dma-controller:
> pl330_submit_req:1498 Try increasing mcbufsz (12810/256)
> [ 59.203395] dma-pl330 ff6d0000.dma-controller: fill_queue:2023 Bad Desc(2)
> [ 63.837390] dma-pl330 ff6d0000.dma-controller:
> pl330_submit_req:1498 Try increasing mcbufsz (12810/256)
> [ 63.837410] dma-pl330 ff6d0000.dma-controller: fill_queue:2023 Bad Desc(2)
>
> Thanks,
> Steve
>
> 14 апр. 2017 г., в 21:04, Krzysztof Kozlowski <[email protected]> написал(а):
>
> Let me know if you need more data. I wonder why you haven't experience
> this?
I don’t have idea why this might happen on Odroid and due to the fact the patch don’t
work for Stephen and I don’t have another idea how to implement that, it is better to
leave the problem along.
Krzysztof, thanks a lot for help. Really.
Regards,
Alexander.