2012-07-11 09:22:45

by Qiang Liu

[permalink] [raw]
Subject: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

Modify the release process of dma descriptor for avoiding exception when
enable config NET_DMA, release dma descriptor from 1st to last second, the
last descriptor which is reserved in current descriptor register may not be
completed, race condition will be raised if free current descriptor.

A race condition which is raised when use both of talitos and dmaengine to
offload xor is because napi scheduler (NET_DMA is enabled) will sync all
pending requests in dma channels, it affects the process of raid operations.
The descriptor is freed which is submitted just now, but async_tx must check
whether this depend tx descriptor is acked, there are poison contents in the
invalid address, then BUG_ON() is thrown, so this descriptor will be freed
in the next time.

Cc: Dan Williams <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Li Yang <[email protected]>
Signed-off-by: Qiang Liu <[email protected]>
---
drivers/dma/fsldma.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4f2f212..0ba3e40 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
static void dma_do_tasklet(unsigned long data)
{
struct fsldma_chan *chan = (struct fsldma_chan *)data;
- struct fsl_desc_sw *desc, *_desc;
+ struct fsl_desc_sw *desc, *_desc, *prev = NULL;
LIST_HEAD(ld_cleanup);
unsigned long flags;
+ dma_addr_t curr_phys = get_cdar(chan);

chan_dbg(chan, "tasklet entry\n");

spin_lock_irqsave(&chan->desc_lock, flags);

+ /* find the descriptor which is already completed */
+ list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
+ if (prev && desc->async_tx.phys == curr_phys)
+ break;
+ prev = desc;
+ }
+
/* update the cookie if we have some descriptors to cleanup */
if (!list_empty(&chan->ld_running)) {
dma_cookie_t cookie;
@@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data)
* move the descriptors to a temporary list so we can drop the lock
* during the entire cleanup operation
*/
- list_splice_tail_init(&chan->ld_running, &ld_cleanup);
+ list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);

/* the hardware is now idle and ready for more */
chan->idle = true;

/*
- * Start any pending transactions automatically
+ * Start any pending transactions automatically if current descriptor
+ * list is completed
*
* In the ideal case, we keep the DMA controller busy while we go
* ahead and free the descriptors below.
--
1.7.5.1


2012-07-11 16:31:03

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
> Modify the release process of dma descriptor for avoiding exception when
> enable config NET_DMA, release dma descriptor from 1st to last second, the
> last descriptor which is reserved in current descriptor register may not be
> completed, race condition will be raised if free current descriptor.
>
> A race condition which is raised when use both of talitos and dmaengine to
> offload xor is because napi scheduler (NET_DMA is enabled) will sync all
> pending requests in dma channels, it affects the process of raid operations.
> The descriptor is freed which is submitted just now, but async_tx must check
> whether this depend tx descriptor is acked, there are poison contents in the
> invalid address, then BUG_ON() is thrown, so this descriptor will be freed
> in the next time.
>

This patch seems to be covering up a bug in the driver, rather than
actually fixing it.

When it was written, it was expected that dma_do_tasklet() would run
only when the controller was idle.

> Cc: Dan Williams <[email protected]>
> Cc: Vinod Koul <[email protected]>
> Cc: Li Yang <[email protected]>
> Signed-off-by: Qiang Liu <[email protected]>
> ---
> drivers/dma/fsldma.c | 15 ++++++++++++---
> 1 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 4f2f212..0ba3e40 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
> static void dma_do_tasklet(unsigned long data)
> {
> struct fsldma_chan *chan = (struct fsldma_chan *)data;
> - struct fsl_desc_sw *desc, *_desc;
> + struct fsl_desc_sw *desc, *_desc, *prev = NULL;
> LIST_HEAD(ld_cleanup);
> unsigned long flags;
> + dma_addr_t curr_phys = get_cdar(chan);
>
> chan_dbg(chan, "tasklet entry\n");
>
> spin_lock_irqsave(&chan->desc_lock, flags);
>
> + /* find the descriptor which is already completed */
> + list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> + if (prev && desc->async_tx.phys == curr_phys)
> + break;
> + prev = desc;
> + }
> +

If the DMA controller was still busy processing transactions, you should
have gotten the printout "irq: controller not idle!" from
fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
If you did not get this printout, how was dma_do_tasklet() entered with
the controller still busy? I don't understand how it can happen.

If you test without your spin_lock_bh() and spin_unlock_bh() conversion
patch, do you still hit the error?

What happens if a user submits exactly one DMA transaction, and then
leaves the system idle? The callback for the last descriptor in the
chain will never get run, right? That's a bug.

> /* update the cookie if we have some descriptors to cleanup */
> if (!list_empty(&chan->ld_running)) {
> dma_cookie_t cookie;
> @@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data)
> * move the descriptors to a temporary list so we can drop the lock
> * during the entire cleanup operation
> */
> - list_splice_tail_init(&chan->ld_running, &ld_cleanup);
> + list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);
>
> /* the hardware is now idle and ready for more */
> chan->idle = true;
>
> /*
> - * Start any pending transactions automatically
> + * Start any pending transactions automatically if current descriptor
> + * list is completed
> *
> * In the ideal case, we keep the DMA controller busy while we go
> * ahead and free the descriptors below.
> --
> 1.7.5.1
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev

2012-07-12 07:12:17

by Liu Qiang-B32616

[permalink] [raw]
Subject: RE: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

> -----Original Message-----
> From: Ira W. Snyder [mailto:[email protected]]
> Sent: Thursday, July 12, 2012 12:31 AM
> To: Liu Qiang-B32616
> Cc: [email protected]; [email protected]; Vinod
> Koul; [email protected]; Dan Williams; [email protected]
> Subject: Re: [PATCH v2 3/4] fsl-dma: change the release process of dma
> descriptor
>
> On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
> > Modify the release process of dma descriptor for avoiding exception
> when
> > enable config NET_DMA, release dma descriptor from 1st to last second,
> the
> > last descriptor which is reserved in current descriptor register may
> not be
> > completed, race condition will be raised if free current descriptor.
> >
> > A race condition which is raised when use both of talitos and dmaengine
> to
> > offload xor is because napi scheduler (NET_DMA is enabled) will sync
> all
> > pending requests in dma channels, it affects the process of raid
> operations.
> > The descriptor is freed which is submitted just now, but async_tx must
> check
> > whether this depend tx descriptor is acked, there are poison contents
> in the
> > invalid address, then BUG_ON() is thrown, so this descriptor will be
> freed
> > in the next time.
> >
>
> This patch seems to be covering up a bug in the driver, rather than
> actually fixing it.
Yes, it's fine for fsl-dma itself, but it cannot work under complex condition.
For example, we enable NET_DMA and SEC xor offload, if NAPI task is waken up to
synchronize pending requests when raid5 dma copy was submitted, the process order
of raid5 tx descriptor is not as our expected. Unfortunately, sometime we have
to check this dependent tx descriptor which has was already released.

>
> When it was written, it was expected that dma_do_tasklet() would run
> only when the controller was idle.
>
> > Cc: Dan Williams <[email protected]>
> > Cc: Vinod Koul <[email protected]>
> > Cc: Li Yang <[email protected]>
> > Signed-off-by: Qiang Liu <[email protected]>
> > ---
> > drivers/dma/fsldma.c | 15 ++++++++++++---
> > 1 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 4f2f212..0ba3e40 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq,
> void *data)
> > static void dma_do_tasklet(unsigned long data)
> > {
> > struct fsldma_chan *chan = (struct fsldma_chan *)data;
> > - struct fsl_desc_sw *desc, *_desc;
> > + struct fsl_desc_sw *desc, *_desc, *prev = NULL;
> > LIST_HEAD(ld_cleanup);
> > unsigned long flags;
> > + dma_addr_t curr_phys = get_cdar(chan);
> >
> > chan_dbg(chan, "tasklet entry\n");
> >
> > spin_lock_irqsave(&chan->desc_lock, flags);
> >
> > + /* find the descriptor which is already completed */
> > + list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> > + if (prev && desc->async_tx.phys == curr_phys)
> > + break;
> > + prev = desc;
> > + }
> > +
>
> If the DMA controller was still busy processing transactions, you should
> have gotten the printout "irq: controller not idle!" from
> fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
> If you did not get this printout, how was dma_do_tasklet() entered with
> the controller still busy? I don't understand how it can happen.
Hi ira, this issue should be not related to dma status. The last descriptor
is left as usb null descriptor, actually, this descriptor is used as usb null
descriptor, at any case, I believe it has been already completed, but I
will released it in next chain, it doesn't affect the upper api to use the
data, and make sure async_tx api won't raise an exception
(BUG_ON(async_tx_test_ack(depend_tx)), this depend_tx is the desc->async_tx).

>
> If you test without your spin_lock_bh() and spin_unlock_bh() conversion
> patch, do you still hit the error?
The error still happened. spin_lock_bh() and spin_unlock_bh() are modified
after this patch.

>
> What happens if a user submits exactly one DMA transaction, and then
> leaves the system idle? The callback for the last descriptor in the
> chain will never get run, right? That's a bug.
It won't be happened if use fsl-dma, because the right order is
xor-copy-xor->callback, The callback which you concerned is implemented
in talitos driver, callback should be null in fsl-dma.

>
> > /* update the cookie if we have some descriptors to cleanup */
> > if (!list_empty(&chan->ld_running)) {
> > dma_cookie_t cookie;
> > @@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data)
> > * move the descriptors to a temporary list so we can drop the lock
> > * during the entire cleanup operation
> > */
> > - list_splice_tail_init(&chan->ld_running, &ld_cleanup);
> > + list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);
> >
> > /* the hardware is now idle and ready for more */
> > chan->idle = true;
> >
> > /*
> > - * Start any pending transactions automatically
> > + * Start any pending transactions automatically if current
> descriptor
> > + * list is completed
> > *
> > * In the ideal case, we keep the DMA controller busy while we go
> > * ahead and free the descriptors below.
> > --
> > 1.7.5.1
> >
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > [email protected]
> > https://lists.ozlabs.org/listinfo/linuxppc-dev

2012-07-12 08:50:34

by Liu Qiang-B32616

[permalink] [raw]
Subject: RE: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-
> [email protected]] On Behalf Of Liu Qiang-
> B32616
> Sent: Thursday, July 12, 2012 3:12 PM
> To: Ira W. Snyder
> Cc: [email protected]; Vinod Koul; [email protected];
> Dan Williams; [email protected]; [email protected]
> Subject: RE: [PATCH v2 3/4] fsl-dma: change the release process of dma
> descriptor
>
> > -----Original Message-----
> > From: Ira W. Snyder [mailto:[email protected]]
> > Sent: Thursday, July 12, 2012 12:31 AM
> > To: Liu Qiang-B32616
> > Cc: [email protected].kernel.org; [email protected]; Vinod
> > Koul; [email protected]; Dan Williams; [email protected]
> > Subject: Re: [PATCH v2 3/4] fsl-dma: change the release process of dma
> > descriptor
> >
> > On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
> > > Modify the release process of dma descriptor for avoiding exception
> > when
> > > enable config NET_DMA, release dma descriptor from 1st to last
> > > second,
> > the
> > > last descriptor which is reserved in current descriptor register may
> > not be
> > > completed, race condition will be raised if free current descriptor.
> > >
> > > A race condition which is raised when use both of talitos and
> > > dmaengine
> > to
> > > offload xor is because napi scheduler (NET_DMA is enabled) will sync
> > all
> > > pending requests in dma channels, it affects the process of raid
> > operations.
> > > The descriptor is freed which is submitted just now, but async_tx
> > > must
> > check
> > > whether this depend tx descriptor is acked, there are poison
> > > contents
> > in the
> > > invalid address, then BUG_ON() is thrown, so this descriptor will be
> > freed
> > > in the next time.
> > >
> >
> > This patch seems to be covering up a bug in the driver, rather than
> > actually fixing it.
> Yes, it's fine for fsl-dma itself, but it cannot work under complex
> condition.
> For example, we enable NET_DMA and SEC xor offload, if NAPI task is waken
> up to synchronize pending requests when raid5 dma copy was submitted, the
> process order of raid5 tx descriptor is not as our expected.
> Unfortunately, sometime we have to check this dependent tx descriptor
> which has was already released.
>
> >
> > When it was written, it was expected that dma_do_tasklet() would run
> > only when the controller was idle.
> >
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Vinod Koul <[email protected]>
> > > Cc: Li Yang <[email protected]>
> > > Signed-off-by: Qiang Liu <[email protected]>
> > > ---
> > > drivers/dma/fsldma.c | 15 ++++++++++++---
> > > 1 files changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > > 4f2f212..0ba3e40 100644
> > > --- a/drivers/dma/fsldma.c
> > > +++ b/drivers/dma/fsldma.c
> > > @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq,
> > void *data)
> > > static void dma_do_tasklet(unsigned long data) {
> > > struct fsldma_chan *chan = (struct fsldma_chan *)data;
> > > - struct fsl_desc_sw *desc, *_desc;
> > > + struct fsl_desc_sw *desc, *_desc, *prev = NULL;
> > > LIST_HEAD(ld_cleanup);
> > > unsigned long flags;
> > > + dma_addr_t curr_phys = get_cdar(chan);
> > >
> > > chan_dbg(chan, "tasklet entry\n");
> > >
> > > spin_lock_irqsave(&chan->desc_lock, flags);
> > >
> > > + /* find the descriptor which is already completed */
> > > + list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> > > + if (prev && desc->async_tx.phys == curr_phys)
> > > + break;
> > > + prev = desc;
> > > + }
> > > +
> >
> > If the DMA controller was still busy processing transactions, you
> > should have gotten the printout "irq: controller not idle!" from
> > fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
> > If you did not get this printout, how was dma_do_tasklet() entered
> > with the controller still busy? I don't understand how it can happen.
> Hi ira, this issue should be not related to dma status. The last
> descriptor is left as usb null descriptor, actually, this descriptor is
> used as usb null descriptor, at any case, I believe it has been already
> completed, but I will released it in next chain, it doesn't affect the
> upper api to use the data, and make sure async_tx api won't raise an
> exception (BUG_ON(async_tx_test_ack(depend_tx)), this depend_tx is the
> desc->async_tx).
I consider your concerns carefully, I think my implement is not a good/right
way.
First, there is possibility that happen to the callback is ignored.
Second, I missed some better ways which can resolve this issue, actually
async_tx api provide our methods to avoid this.

So, I agree with your concern. Thanks.
I will resolve your concerns in next version. Thanks again.

>
> >
> > If you test without your spin_lock_bh() and spin_unlock_bh()
> > conversion patch, do you still hit the error?
> The error still happened. spin_lock_bh() and spin_unlock_bh() are
> modified after this patch.
>
> >
> > What happens if a user submits exactly one DMA transaction, and then
> > leaves the system idle? The callback for the last descriptor in the
> > chain will never get run, right? That's a bug.
> It won't be happened if use fsl-dma, because the right order is
> xor-copy-xor->callback, The callback which you concerned is implemented
> in talitos driver, callback should be null in fsl-dma.
>
> >
> > > /* update the cookie if we have some descriptors to cleanup */
> > > if (!list_empty(&chan->ld_running)) {
> > > dma_cookie_t cookie;
> > > @@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data)
> > > * move the descriptors to a temporary list so we can drop the lock
> > > * during the entire cleanup operation
> > > */
> > > - list_splice_tail_init(&chan->ld_running, &ld_cleanup);
> > > + list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);
> > >
> > > /* the hardware is now idle and ready for more */
> > > chan->idle = true;
> > >
> > > /*
> > > - * Start any pending transactions automatically
> > > + * Start any pending transactions automatically if current
> > descriptor
> > > + * list is completed
> > > *
> > > * In the ideal case, we keep the DMA controller busy while we go
> > > * ahead and free the descriptors below.
> > > --
> > > 1.7.5.1
> > >
> > >
> > > _______________________________________________
> > > Linuxppc-dev mailing list
> > > [email protected]
> > > https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev