This series adds missing vzeroupper instructions before returning from
code that uses ymm registers.
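
For readers less familiar with the pattern, here is a minimal userspace sketch in C of what the fix looks like. It is an illustration only, not code from this series: the function names sum4_u64_avx2() and sum4_u64() and the HAVE_AVX2_PATH macro are hypothetical, and the kernel routines touched here are hand-written assembly. The intrinsic _mm256_zeroupper() emits the vzeroupper instruction. GCC and Clang typically insert vzeroupper on their own when compiling AVX code, so the explicit call below only mirrors what hand-written assembly has to do itself.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1	/* hypothetical macro, for this sketch only */
#endif

#ifdef HAVE_AVX2_PATH
/* Hypothetical helper: sums four 64-bit words using a ymm register. */
__attribute__((target("avx2")))
static uint64_t sum4_u64_avx2(const uint64_t *p)
{
	__m256i v = _mm256_loadu_si256((const __m256i *)p);
	__m128i lo = _mm256_castsi256_si128(v);		/* lanes 0 and 1 */
	__m128i hi = _mm256_extracti128_si256(v, 1);	/* lanes 2 and 3 */
	__m128i s = _mm_add_epi64(lo, hi);
	uint64_t r;

	/* Fold the remaining two 64-bit lanes into lane 0. */
	s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));
	r = (uint64_t)_mm_cvtsi128_si64(s);

	/*
	 * Clear the upper halves of the ymm registers before returning,
	 * so that SSE code running afterwards is not slowed down.
	 */
	_mm256_zeroupper();
	return r;
}
#endif

/* Hypothetical wrapper: uses the AVX2 path only if the CPU supports it. */
uint64_t sum4_u64(const uint64_t *p)
{
#ifdef HAVE_AVX2_PATH
	if (__builtin_cpu_supports("avx2"))
		return sum4_u64_avx2(p);
#endif
	return p[0] + p[1] + p[2] + p[3];
}
```

The point of the sketch is the placement: vzeroupper goes after the last use of the ymm registers and immediately before the return, which is where each patch in this series adds it.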

Eric Biggers (3):
  crypto: x86/nh-avx2 - add missing vzeroupper
  crypto: x86/sha256-avx2 - add missing vzeroupper
  crypto: x86/sha512-avx2 - add missing vzeroupper

 arch/x86/crypto/nh-avx2-x86_64.S  | 1 +
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 3 files changed, 3 insertions(+)


base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
--
2.44.0
From: Eric Biggers <[email protected]>

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it. This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <[email protected]>
---
 arch/x86/crypto/nh-avx2-x86_64.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
vpaddq T5, T4, T4
vpaddq T1, T0, T0
vpaddq T4, T0, T0
vmovdqu T0, (HASH)
+ vzeroupper
RET
SYM_FUNC_END(nh_avx2)
--
2.44.0
From: Eric Biggers <[email protected]>

Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.

Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <[email protected]>
---
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbx
+ vzeroupper
RET
SYM_FUNC_END(sha256_transform_rorx)
.section .rodata.cst512.K256, "aM", @progbits, 512
.align 64
--
2.44.0
From: Eric Biggers <[email protected]>

Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.

Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <[email protected]>
---
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
pop %r14
pop %r13
pop %r12
pop %rbx
+ vzeroupper
RET
SYM_FUNC_END(sha512_transform_rorx)
########################################################################
### Binary Data
--
2.44.0
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
> popq %r15
> popq %r14
> popq %r13
> popq %r12
> popq %rbx
> + vzeroupper
> RET
> SYM_FUNC_END(sha256_transform_rorx)
>
> .section .rodata.cst512.K256, "aM", @progbits, 512
> .align 64
Acked-by: Tim Chen <[email protected]>
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
> pop %r14
> pop %r13
> pop %r12
> pop %rbx
>
> + vzeroupper
> RET
> SYM_FUNC_END(sha512_transform_rorx)
>
> ########################################################################
> ### Binary Data
Acked-by: Tim Chen <[email protected]>
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> > From: Eric Biggers <[email protected]>
> >
> > Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> > from it. This is necessary to avoid reducing the performance of SSE
> > code.
> >
> > Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> > Signed-off-by: Eric Biggers <[email protected]>
> > ---
> > arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> > index ef73a3ab8726..791386d9a83a 100644
> > --- a/arch/x86/crypto/nh-avx2-x86_64.S
> > +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> > @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
> >
> > vpaddq T5, T4, T4
> > vpaddq T1, T0, T0
> > vpaddq T4, T0, T0
> > vmovdqu T0, (HASH)
> > + vzeroupper
> > RET
> > SYM_FUNC_END(nh_avx2)
Acked-by: Tim Chen <[email protected]>