2012-08-09 08:46:33

by Qiang Liu

Subject: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

Hi all,

The following 8 patches enable fsl-dma and talitos to offload RAID
operations, improving RAID performance and balancing CPU load.

These patches touch the talitos, fsl-dma and carma modules (carma uses
some features of fsl-dma).

Write performance improves by 25-30%, as measured with iozone.
Using spin_lock_bh() instead of spin_lock_irqsave() improves write
performance by about 2%.
CPU load is reduced by 8%.

"fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
[1], and this patchseries, along with FSL_DMA && NET_DMA set seems
to be holding water, so this series gets my:"

Tested-by: Kim Phillips <[email protected]>

[1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 \
/dev/ram[0123]

Changes in v7:
- add the test result provided by Kim Phillips;
- correct one coding style issue in patch 5/8;
- add comments from Arnd Bergmann to patch 6/8;

Changes in v6:
- swap the order of original patches 3/6 and 4/6;
- merge Ira's patch to reduce the size of the original patch;
- merge Ira's carma patch as 8/8;
- update documentation and descriptions according to Ira's advice;

Changes in v5:
- add a detailed description to patch 3/6 of how completed descriptors
are processed (the process follows the fsl-dma Reference Manual),
illustrating the potential risk and how to reproduce it;
- drop patch 7/7 from v4 according to Timur's comments;

Changes in v4:
- fix an error in talitos when the dest addr is the same as a src addr:
dest should be freed only once if src and dest are the same addr;
- correct coding style in fsl-dma according to Ira's comments;
- fix a race condition in fsl-dma fsl_tx_status(): remove the interface
used to free descriptors in queue ld_completed, since this is already
done in fsldma_cleanup_descriptor(); in v3, one place was missing
spin_lock protection;
- split the original patch 3/4 into 2 patches, 3/7 and 4/7, according to
Li Yang's comments;
- fix a warning about an uninitialized cookie;
- add a memory copy self test in fsl-dma;
- add a more detailed description of using spin_lock_bh() instead of
spin_lock_irqsave(), according to Timur's comments.

Changes in v3:
- change the release process of fsl-dma descriptors to resolve the
potential race condition;
- add test results for using spin_lock_bh instead of spin_lock_irqsave;
- update the benchmark results according to the latest patch.

Changes in v2:
- rebase onto the cryptodev tree;
- split patch 3/4 into 3 independent patches;
- remove patch 4/4, since the fix is not for the cryptodev tree;

Qiang Liu (8):
Talitos: Support for async_tx XOR offload
fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
fsl-dma: move functions to avoid forward declarations
fsl-dma: change release process of dma descriptor for supporting async_tx
fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
fsl-dma: fix a warning of unitialized cookie
carma: remove unnecessary DMA_INTERRUPT capability

drivers/crypto/Kconfig | 9 +
drivers/crypto/talitos.c | 413 ++++++++++++++++++++++++++
drivers/crypto/talitos.h | 53 ++++
drivers/dma/fsldma.c | 488 +++++++++++++++++--------------
drivers/dma/fsldma.h | 17 +-
drivers/misc/carma/carma-fpga-program.c | 1 -
drivers/misc/carma/carma-fpga.c | 2 +-
7 files changed, 761 insertions(+), 222 deletions(-)


2012-08-09 17:03:11

by Ira W. Snyder

Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

On Thu, Aug 09, 2012 at 04:19:35PM +0800, [email protected] wrote:
> Hi all,
>
> The following 8 patches enable fsl-dma and talitos to offload RAID
> operations, improving RAID performance and balancing CPU load.
>
> These patches touch the talitos, fsl-dma and carma modules (carma uses
> some features of fsl-dma).
>
> Write performance improves by 25-30%, as measured with iozone.
> Using spin_lock_bh() instead of spin_lock_irqsave() improves write
> performance by about 2%.
> CPU load is reduced by 8%.
>
> "fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
> [1], and this patchseries, along with FSL_DMA && NET_DMA set seems
> to be holding water, so this series gets my:"
>
> Tested-by: Kim Phillips <[email protected]>
>

The fsldma parts of the series all look great to me.

Thanks,
Ira

2012-08-14 09:04:16

by Liu Qiang-B32616

Subject: RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

Hi Vinod,

Would you apply patches 2/8 through 7/8 of this series to your tree?
The links are below:
http://patchwork.ozlabs.org/patch/176023/
http://patchwork.ozlabs.org/patch/176024/
http://patchwork.ozlabs.org/patch/176025/
http://patchwork.ozlabs.org/patch/176026/
http://patchwork.ozlabs.org/patch/176027/
http://patchwork.ozlabs.org/patch/176028/

After that, Herbert will merge patch 1/8, http://patchwork.ozlabs.org/patch/176022/
and Greg will apply patch 8/8, http://patchwork.ozlabs.org/patch/176029/

Thanks.


> -----Original Message-----
> From: Ira W. Snyder [mailto:[email protected]]
> Sent: Friday, August 10, 2012 1:03 AM
> To: Liu Qiang-B32616
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
>
> On Thu, Aug 09, 2012 at 04:19:35PM +0800, [email protected] wrote:
> > Hi all,
> >
> > The following 8 patches enable fsl-dma and talitos to offload RAID
> > operations, improving RAID performance and balancing CPU load.
> >
> > These patches touch the talitos, fsl-dma and carma modules (carma uses
> > some features of fsl-dma).
> >
> > Write performance improves by 25-30%, as measured with iozone.
> > Using spin_lock_bh() instead of spin_lock_irqsave() improves write
> > performance by about 2%.
> > CPU load is reduced by 8%.
> >
> > "fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
> > [1], and this patchseries, along with FSL_DMA && NET_DMA set seems to
> > be holding water, so this series gets my:"
> >
> > Tested-by: Kim Phillips <[email protected]>
> >
>
> The fsldma parts of the series all look great to me.
>
> Thanks,
> Ira

2012-08-14 20:01:47

by Dan Williams

Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

On Tue, Aug 14, 2012 at 2:04 AM, Liu Qiang-B32616 <[email protected]> wrote:
> Hi Vinod,
>
> Would you apply patches 2/8 through 7/8 of this series to your tree?
> The links are below:
> http://patchwork.ozlabs.org/patch/176023/
> http://patchwork.ozlabs.org/patch/176024/
> http://patchwork.ozlabs.org/patch/176025/
> http://patchwork.ozlabs.org/patch/176026/
> http://patchwork.ozlabs.org/patch/176027/
> http://patchwork.ozlabs.org/patch/176028/
>

Hi, sorry for the recent silence; I've been transitioning and am just
now catching up. I'll take a look, and then it's fine for these to go
through Vinod's tree.

--
Dan

2012-08-15 07:59:58

by Liu Qiang-B32616

Subject: RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Dan Williams
> Sent: Wednesday, August 15, 2012 4:02 AM
> To: Liu Qiang-B32616
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linuxppc-
> [email protected]; [email protected]; linux-
> [email protected]; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
>
> On Tue, Aug 14, 2012 at 2:04 AM, Liu Qiang-B32616 <[email protected]>
> wrote:
> > Hi Vinod,
> >
> > Would you apply patches 2/8 through 7/8 of this series to your tree?
> > The links are below:
> > http://patchwork.ozlabs.org/patch/176023/
> > http://patchwork.ozlabs.org/patch/176024/
> > http://patchwork.ozlabs.org/patch/176025/
> > http://patchwork.ozlabs.org/patch/176026/
> > http://patchwork.ozlabs.org/patch/176027/
> > http://patchwork.ozlabs.org/patch/176028/
> >
>
> Hi, sorry for the recent silence; I've been transitioning and am just
> now catching up. I'll take a look, and then it's fine for these to go
> through Vinod's tree.

Hello Dan,

Please review; this issue has persisted for many years. I hope we can
fix it this time. Thanks.

>
> --
> Dan

2012-08-29 11:16:00

by Liu Qiang-B32616

Subject: RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

Hi Dan,

Ping?
Can you apply these patches? Thanks.


- Qiang

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Dan Williams
> Sent: Wednesday, August 15, 2012 4:02 AM
> To: Liu Qiang-B32616
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linuxppc-
> [email protected]; [email protected]; linux-
> [email protected]; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
>
> On Tue, Aug 14, 2012 at 2:04 AM, Liu Qiang-B32616 <[email protected]>
> wrote:
> > Hi Vinod,
> >
> > Would you apply patches 2/8 through 7/8 of this series to your tree?
> > The links are below:
> > http://patchwork.ozlabs.org/patch/176023/
> > http://patchwork.ozlabs.org/patch/176024/
> > http://patchwork.ozlabs.org/patch/176025/
> > http://patchwork.ozlabs.org/patch/176026/
> > http://patchwork.ozlabs.org/patch/176027/
> > http://patchwork.ozlabs.org/patch/176028/
> >
>
> Hi, sorry for the recent silence; I've been transitioning and am just
> now catching up. I'll take a look, and then it's fine for these to go
> through Vinod's tree.
>
> --
> Dan

2012-08-29 14:53:18

by Dan Williams

Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

On Wed, 2012-08-29 at 11:15 +0000, Liu Qiang-B32616 wrote:
> Hi Dan,
>
> Ping?
> Can you apply these patches? Thanks.
>

I'm working my way through them.

The first thing I notice is that xor_chan->desc_lock is taken
inconsistently: spin_lock_irqsave() in talitos_process_pending()
and spin_lock_bh() everywhere else. Have you run these patches with
lockdep?

--
Dan

2012-08-30 06:20:05

by Liu Qiang-B32616

Subject: RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

> -----Original Message-----
> From: Dan Williams [mailto:[email protected]]
> Sent: Wednesday, August 29, 2012 10:53 PM
> To: Liu Qiang-B32616
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
>
> On Wed, 2012-08-29 at 11:15 +0000, Liu Qiang-B32616 wrote:
> > Hi Dan,
> >
> > Ping?
> > Can you apply these patches? Thanks.
> >
>
> I'm working my way through them.
>
> The first thing I notice is that xor_chan->desc_lock is taken
> inconsistently: spin_lock_irqsave() in talitos_process_pending()
> and spin_lock_bh() everywhere else. Have you run these patches with
> lockdep?
Thanks for your reply.
I enabled LOCKDEP as you suggested, and no "inconsistent lock state" warning was displayed.
I don't know whether that is sufficient.

I'm confused about the DMA_INTERRUPT attribute. My understanding is that this interface is only used to trigger an interrupt (to make sure all previous operations have finished before switching to other channels), but fsl-dma triggers this interrupt via a "Programmed Error". I'm wondering whether other hardware, e.g. xscale-iop, behaves the same as fsl-dma, or whether its interrupt is a normal interrupt rather than an error.
If other hardware also triggers the interrupt through an abnormal error, maybe my patch 2/8 should be reverted, because it violates the rules of this attribute.

BTW, could you please reply to the individual patches if you have any comments? Thanks.

>
> --
> Dan
>
>