2024-04-06 00:28:10

by Eric Biggers

Subject: [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions

This series adds missing vzeroupper instructions before returning from
code that uses ymm registers. Leaving the upper halves of the ymm
registers dirty can significantly slow down legacy SSE code that runs
afterwards, whether through SSE/AVX transition penalties or false
dependencies on the stale upper halves, depending on the CPU
generation. The general pattern is sketched after the diffstat below.

Eric Biggers (3):
crypto: x86/nh-avx2 - add missing vzeroupper
crypto: x86/sha256-avx2 - add missing vzeroupper
crypto: x86/sha512-avx2 - add missing vzeroupper

arch/x86/crypto/nh-avx2-x86_64.S | 1 +
arch/x86/crypto/sha256-avx2-asm.S | 1 +
arch/x86/crypto/sha512-avx2-asm.S | 1 +
3 files changed, 3 insertions(+)
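
For reference, the pattern being applied is sketched below. This is a
minimal, hypothetical example, not code from this series; the function
name and register usage are illustrative only.

	#include <linux/linkage.h>

	/*
	 * Any write to a ymm register leaves the upper 128 bits in a
	 * dirty state; vzeroupper clears that state before returning to
	 * code that may execute legacy SSE instructions.
	 */
	SYM_FUNC_START(avx2_copy_32)
		vmovdqu	(%rsi), %ymm0	/* writing ymm0 dirties the upper halves */
		vmovdqu	%ymm0, (%rdi)	/* copy 32 bytes from *rsi to *rdi */
		vzeroupper		/* clear upper state before returning */
		RET
	SYM_FUNC_END(avx2_copy_32)

Without the vzeroupper, the first legacy SSE instruction executed after
such a function returns can incur a state transition penalty (on older
Intel CPUs) or a false dependency on the stale upper halves (on newer
ones).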


base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
--
2.44.0



2024-04-06 00:28:12

by Eric Biggers

Subject: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper

From: Eric Biggers <[email protected]>

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it. This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <[email protected]>
---
arch/x86/crypto/nh-avx2-x86_64.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)

 	vpaddq	T5, T4, T4
 	vpaddq	T1, T0, T0
 	vpaddq	T4, T0, T0
 	vmovdqu	T0, (HASH)
+	vzeroupper
 	RET
 SYM_FUNC_END(nh_avx2)
--
2.44.0


2024-04-06 00:28:13

by Eric Biggers

Subject: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper

From: Eric Biggers <[email protected]>

Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.

Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <[email protected]>
---
arch/x86/crypto/sha256-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
 	popq	%r15
 	popq	%r14
 	popq	%r13
 	popq	%r12
 	popq	%rbx
+	vzeroupper
 	RET
 SYM_FUNC_END(sha256_transform_rorx)
 
 .section .rodata.cst512.K256, "aM", @progbits, 512
 .align 64
--
2.44.0


2024-04-06 00:28:14

by Eric Biggers

Subject: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper

From: Eric Biggers <[email protected]>

Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.

Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <[email protected]>
---
arch/x86/crypto/sha512-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
 	pop	%r14
 	pop	%r13
 	pop	%r12
 	pop	%rbx
 
+	vzeroupper
 	RET
 SYM_FUNC_END(sha512_transform_rorx)
 
 ########################################################################
 ### Binary Data
--
2.44.0


2024-04-09 22:38:31

by Tim Chen

Subject: Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
>  	popq	%r15
>  	popq	%r14
>  	popq	%r13
>  	popq	%r12
>  	popq	%rbx
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha256_transform_rorx)
>
>  .section .rodata.cst512.K256, "aM", @progbits, 512
>  .align 64

Acked-by: Tim Chen <[email protected]>

2024-04-09 22:38:52

by Tim Chen

Subject: Re: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
>  	pop	%r14
>  	pop	%r13
>  	pop	%r12
>  	pop	%rbx
>
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha512_transform_rorx)
>
>  ########################################################################
>  ### Binary Data

Acked-by: Tim Chen <[email protected]>

2024-04-09 22:42:38

by Tim Chen

Subject: Re: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> from it. This is necessary to avoid reducing the performance of SSE
> code.
>
> Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> index ef73a3ab8726..791386d9a83a 100644
> --- a/arch/x86/crypto/nh-avx2-x86_64.S
> +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
>
>  	vpaddq	T5, T4, T4
>  	vpaddq	T1, T0, T0
>  	vpaddq	T4, T0, T0
>  	vmovdqu	T0, (HASH)
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(nh_avx2)

Acked-by: Tim Chen <[email protected]>