2013-03-26 20:58:41

by Tim Chen

Subject: [PATCH v2 00/10] Optimize SHA256 and SHA512 for Intel x86_64 with SSSE3, AVX or AVX2 instructions

Herbert,

The following patch series provides optimized SHA256 and SHA512 routines
using the SSSE3, AVX or AVX2 instructions on x86_64 for Intel CPUs.
Depending on CPU capabilities, a speedup of 40% to 70% or more can be achieved
over the generic SHA256 and SHA512 routines.

Tim

Version 2:

1. Check AVX2 feature directly in glue code
2. Add CONFIG_AS_AVX2 check for AVX2 code
3. Use ENTRY/ENDPROC macros for assembly routines
4. Fix SSSE3 feature check in glue code

Thanks to Peter Anvin, Jussi Kivilinna and Jim Kukunas for their reviews and comments.

Tim Chen (10):
Expose SHA256 generic routine to be callable externally.
Optimized sha256 x86_64 assembly routine using Supplemental SSE3
instructions.
Optimized sha256 x86_64 assembly routine with AVX instructions.
Optimized sha256 x86_64 routine using AVX2's RORX instructions
Create module providing optimized SHA256 routines using SSSE3, AVX or
AVX2 instructions.
Expose generic sha512 routine to be callable from other modules
Optimized SHA512 x86_64 assembly routine using Supplemental SSE3
instructions.
Optimized SHA512 x86_64 assembly routine using AVX instructions.
Optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction.
Create module providing optimized SHA512 routines using SSSE3, AVX or
AVX2 instructions.

arch/x86/crypto/Makefile | 4 +
arch/x86/crypto/sha256-avx-asm.S | 496 +++++++++++++++++++++++
arch/x86/crypto/sha256-avx2-asm.S | 772 ++++++++++++++++++++++++++++++++++++
arch/x86/crypto/sha256-ssse3-asm.S | 506 +++++++++++++++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 275 +++++++++++++
arch/x86/crypto/sha512-avx-asm.S | 423 ++++++++++++++++++++
arch/x86/crypto/sha512-avx2-asm.S | 743 ++++++++++++++++++++++++++++++++++
arch/x86/crypto/sha512-ssse3-asm.S | 421 ++++++++++++++++++++
arch/x86/crypto/sha512_ssse3_glue.c | 282 +++++++++++++
crypto/Kconfig | 22 +
crypto/sha256_generic.c | 11 +-
crypto/sha512_generic.c | 13 +-
include/crypto/sha.h | 5 +
13 files changed, 3962 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/crypto/sha256-avx-asm.S
create mode 100644 arch/x86/crypto/sha256-avx2-asm.S
create mode 100644 arch/x86/crypto/sha256-ssse3-asm.S
create mode 100644 arch/x86/crypto/sha256_ssse3_glue.c
create mode 100644 arch/x86/crypto/sha512-avx-asm.S
create mode 100644 arch/x86/crypto/sha512-avx2-asm.S
create mode 100644 arch/x86/crypto/sha512-ssse3-asm.S
create mode 100644 arch/x86/crypto/sha512_ssse3_glue.c

--
1.7.11.7
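
The cover letter says the routine used depends on CPU capabilities, and the
Version 2 notes say the feature checks live in the glue code. A minimal
user-space sketch of that kind of selection follows. It is not the kernel glue
code: struct cpu_caps, select_transform(), the stub routines and the
AVX2 > AVX > SSSE3 preference order are all assumptions made for illustration.
Only the transform signature follows the prototypes shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same argument shape as the transform prototypes in the patch. */
typedef void (*sha256_transform_fn)(const char *data, uint32_t *digest,
				    uint64_t rounds);

/* Hypothetical stand-in for the CPU feature bits the glue code checks. */
struct cpu_caps {
	bool ssse3;
	bool avx;
	bool avx2;
};

enum variant { VARIANT_NONE, VARIANT_SSSE3, VARIANT_AVX, VARIANT_AVX2 };

/* Records which stub ran last, so the selection can be observed. */
static enum variant last_variant = VARIANT_NONE;

/* Hypothetical stubs standing in for the assembly routines. */
static void transform_ssse3(const char *data, uint32_t *digest, uint64_t rounds)
{
	(void)data; (void)digest; (void)rounds;
	last_variant = VARIANT_SSSE3;
}

static void transform_avx(const char *data, uint32_t *digest, uint64_t rounds)
{
	(void)data; (void)digest; (void)rounds;
	last_variant = VARIANT_AVX;
}

static void transform_rorx(const char *data, uint32_t *digest, uint64_t rounds)
{
	(void)data; (void)digest; (void)rounds;
	last_variant = VARIANT_AVX2;
}

/*
 * Pick the most capable routine the CPU supports (assumed order).
 * Returns NULL when none applies, so the caller can fall back to the
 * generic implementation.
 */
static sha256_transform_fn select_transform(const struct cpu_caps *caps)
{
	assert(caps != NULL);
	if (caps->avx2)
		return transform_rorx;
	if (caps->avx)
		return transform_avx;
	if (caps->ssse3)
		return transform_ssse3;
	return NULL;
}
```

A caller would store the result of select_transform() in a function pointer
once at initialisation and decline to register the optimized routines when it
is NULL.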


2013-04-03 01:53:24

by Herbert Xu

Subject: Re: [PATCH v2 00/10] Optimize SHA256 and SHA512 for Intel x86_64 with SSSE3, AVX or AVX2 instructions

On Tue, Mar 26, 2013 at 01:58:39PM -0700, Tim Chen wrote:
> Herbert,
>
> The following patch series provides optimized SHA256 and SHA512 routines
> using the SSSE3, AVX or AVX2 instructions on x86_64 for Intel CPUs.
> Depending on CPU capabilities, a speedup of 40% to 70% or more can be achieved
> over the generic SHA256 and SHA512 routines.

All applied. Thanks!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2013-04-19 20:25:14

by Tim Chen

Subject: [PATCH] Fix prototype definitions of sha256_transform_asm, sha512_transform_asm

Herbert,

This is a follow on patch to the optimized sha256 and sha512 patch series that's just
merged into the crypto-dev. Let me know if you prefer me to respin the
patch series.

This patch corrects the prototype of sha256_transform_asm and
sha512_transform_asm function pointer declaration to static. It also
fixes a typo in sha512_ssse3_final function that affects the computation
of upper 64 bits of the buffer size.

Thanks.

Tim

Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index fa65453..8a0b711 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -53,7 +53,7 @@ asmlinkage void sha256_transform_rorx(const char *data, u32 *digest,
u64 rounds);
#endif

-asmlinkage void (*sha256_transform_asm)(const char *, u32 *, u64);
+static void (*sha256_transform_asm)(const char *, u32 *, u64);


static int sha256_ssse3_init(struct shash_desc *desc)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 295f790..3c844f2 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -52,7 +52,7 @@ asmlinkage void sha512_transform_rorx(const char *data, u64 *digest,
u64 rounds);
#endif

-asmlinkage void (*sha512_transform_asm)(const char *, u64 *, u64);
+static void (*sha512_transform_asm)(const char *, u64 *, u64);


static int sha512_ssse3_init(struct shash_desc *desc)
@@ -141,7 +141,7 @@ static int sha512_ssse3_final(struct shash_desc *desc, u8 *out)

 	/* save number of bits */
 	bits[1] = cpu_to_be64(sctx->count[0] << 3);
-	bits[0] = cpu_to_be64(sctx->count[1] << 3) | sctx->count[0] >> 61;
+	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);

 	/* Pad out to 112 mod 128 and append length */
 	index = sctx->count[0] & 0x7f;
--
1.7.11.7
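
The sha512_ssse3_final change above moves a closing parenthesis so that the
bits carried out of the low byte count are combined with the high word before
the byte-order conversion rather than after it. The sketch below illustrates
the difference in user-space C. swap64() is a hypothetical helper that models
cpu_to_be64() on a little-endian host such as x86_64, and hi_bits_buggy() and
hi_bits_fixed() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order; models cpu_to_be64() on a little-endian host. */
static uint64_t swap64(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Old expression: the carry bits are OR-ed in after the conversion. */
static uint64_t hi_bits_buggy(uint64_t count_hi, uint64_t count_lo)
{
	return swap64(count_hi << 3) | (count_lo >> 61);
}

/* New expression: the whole upper word is built first, then converted. */
static uint64_t hi_bits_fixed(uint64_t count_hi, uint64_t count_lo)
{
	return swap64((count_hi << 3) | (count_lo >> 61));
}
```

The two expressions agree whenever count_lo >> 61 is zero, that is, whenever
the low byte count is below 2^61. They differ only when those top three bits
are set: the old form leaves them unconverted in the low-order byte of the
result.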


2013-04-25 13:04:28

by Herbert Xu

Subject: Re: [PATCH] Fix prototype definitions of sha256_transform_asm, sha512_transform_asm

On Fri, Apr 19, 2013 at 01:25:12PM -0700, Tim Chen wrote:
> Herbert,
>
> This is a follow on patch to the optimized sha256 and sha512 patch series that's just
> merged into the crypto-dev. Let me know if you prefer me to respin the
> patch series.
>
> This patch corrects the prototype of sha256_transform_asm and
> sha512_transform_asm function pointer declaration to static. It also
> fixes a typo in sha512_ssse3_final function that affects the computation
> of upper 64 bits of the buffer size.

Thanks, I've folded this into the original patches in order to
maintain bisectability.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2013-06-07 16:48:38

by Tim Chen

Subject: Re: [PATCH] Fix prototype definitions of sha256_transform_asm, sha512_transform_asm

On Thu, 2013-04-25 at 21:04 +0800, Herbert Xu wrote:
> On Fri, Apr 19, 2013 at 01:25:12PM -0700, Tim Chen wrote:
> > Herbert,
> >
> > This is a follow on patch to the optimized sha256 and sha512 patch series that's just
> > merged into the crypto-dev. Let me know if you prefer me to respin the
> > patch series.
> >
> > This patch corrects the prototype of sha256_transform_asm and
> > sha512_transform_asm function pointer declaration to static. It also
> > fixes a typo in sha512_ssse3_final function that affects the computation
> > of upper 64 bits of the buffer size.
>
> Thanks, I've folded this into the original patches in order to
> maintain bisectability.

Hi Herbert,

Wonder if you had a chance to merge this patch? I think it is not in
the latest crypto-dev.

Thanks.

Tim

2013-06-07 16:52:05

by Tim Chen

Subject: Re: [PATCH] Fix prototype definitions of sha256_transform_asm, sha512_transform_asm

On Fri, 2013-06-07 at 09:47 -0700, Tim Chen wrote:
> On Thu, 2013-04-25 at 21:04 +0800, Herbert Xu wrote:
> > On Fri, Apr 19, 2013 at 01:25:12PM -0700, Tim Chen wrote:
> > > Herbert,
> > >
> > > This is a follow on patch to the optimized sha256 and sha512 patch series that's just
> > > merged into the crypto-dev. Let me know if you prefer me to respin the
> > > patch series.
> > >
> > > This patch corrects the prototype of sha256_transform_asm and
> > > sha512_transform_asm function pointer declaration to static. It also
> > > fixes a typo in sha512_ssse3_final function that affects the computation
> > > of upper 64 bits of the buffer size.
> >
> > Thanks, I've folded this into the original patches in order to
> > maintain bisectability.
>
> Hi Herbert,
>
> Wonder if you had a chance to merge this patch? I think it is not in
> the latest crypto-dev.
>

Actually it has been merged. The kbuild test robot was picking up an
old tree.

Sorry for the noise. Thanks.

Tim