This patch series optimizes CRC32C calculation using the PCLMULQDQ
instruction in the crc32c-intel module. It speeds up the original
implementation by 1.6x for 1K buffers and by 3x for buffers of 4K or
more. The tcrypt module is also extended with a speed test for
crc32c calculations.
Tim
Signed-off-by: Tim Chen <[email protected]>
---
Tim Chen (3):
  Rename crc32c-intel.c to crc32c-intel_glue.c
  Optimize CRC32C calculation with PCLMULQDQ instruction
  Added speed test in tcrypt for crc32c

 arch/x86/crypto/Makefile                           |    1 +
 .../crypto/{crc32c-intel.c => crc32c-intel_glue.c} |   75 ++++
 arch/x86/crypto/crc32c-pcl-intel-asm.S             |  460 ++++++++++++++++++++
 crypto/tcrypt.c                                    |    4 +
 4 files changed, 540 insertions(+), 0 deletions(-)
 rename arch/x86/crypto/{crc32c-intel.c => crc32c-intel_glue.c} (70%)
 create mode 100644 arch/x86/crypto/crc32c-pcl-intel-asm.S
--
1.7.7.6
From: Tim Chen <[email protected]>
Date: Tue, 25 Sep 2012 14:50:08 -0700
> This patch series optimized CRC32C calculations with PCLMULQDQ
> instruction for crc32c-intel module. It speeds up the original
> implementation by 1.6x for 1K buffer and by 3x for buffer 4k or
> more. The tcrypt module was enhanced for doing speed test
> on crc32c calculations.
>
> Signed-off-by: Tim Chen <[email protected]>
Great work.
I intend to do something nearly identical on sparc64 since we have
similar instructions in the form of "xmulx" and "xmulxhi" which return
the lower and upper 64 bits (respectively) of an XOR multiply.
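
For readers unfamiliar with the term, an XOR (carry-less) multiply is a
shift-and-add multiply in which the partial products are combined with
XOR instead of addition. The following is only a portable C sketch of
that operation, not the code from this series and not the sparc64
instructions themselves; the names clmul128 and clmul64 are hypothetical
and chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical result type: the 128-bit product split into two halves. */
struct clmul128 {
	uint64_t lo;	/* lower 64 bits of the product */
	uint64_t hi;	/* upper 64 bits of the product */
};

/*
 * Hypothetical helper: 64x64 -> 128-bit carry-less multiply.
 * For every set bit i of b, a shifted left by i is XORed into the
 * 128-bit accumulator. No carries propagate between bit positions.
 */
static struct clmul128 clmul64(uint64_t a, uint64_t b)
{
	struct clmul128 r = { 0, 0 };
	int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			/* A shift by 64 is undefined in C, so skip i == 0. */
			if (i > 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

As a sanity check, clmul64(3, 3) gives lo = 5 and hi = 0, since
(x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2, whereas an
ordinary multiply would give 9.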
On Tue, Sep 25, 2012 at 02:50:08PM -0700, Tim Chen wrote:
> This patch series optimized CRC32C calculations with PCLMULQDQ
> instruction for crc32c-intel module. It speeds up the original
> implementation by 1.6x for 1K buffer and by 3x for buffer 4k or
> more. The tcrypt module was enhanced for doing speed test
> on crc32c calculations.
>
> Tim
>
> Signed-off-by: Tim Chen <[email protected]>
All applied. Thanks Tim!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt