2014-09-28 20:24:23

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 0/3] crypto: aesni - fix and re-enable "by8" CTR variant

This series fixes the counter overflow handling of the "by8" CTR variant
which lead to failing cryptomgr tests and, in turn, disabling this
optimization with commit 7da4b29d496b.

Patch 1 fixes the bug, patch 2 removes some unused defines (left overs
from the unification of the initial source files) and patch 3 re-enables
the code.

The fix was tested by me, doing tcrypt and dm-crypt tests. It was also
tested by Romain who initially reported the issue.

The patches should go on top of crypto-2.6.git.

In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed
to stable to propagate the fix.

Please apply!

Thanks,
Mathias


Mathias Krause (3):
crypto: aesni - fix counter overflow handling in "by8" variant
crypto: aesni - remove unused defines in "by8" variant
Revert "crypto: aesni - disable "by8" AVX CTR optimization"

arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 20 +++++++++++++++-----
arch/x86/crypto/aesni-intel_glue.c | 4 ++--
2 files changed, 17 insertions(+), 7 deletions(-)

--
1.7.10.4


2014-09-28 20:24:26

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 2/3] crypto: aesni - remove unused defines in "by8" variant

The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Chandramouli Narayanan <[email protected]>
---
arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a029bc744244..2df2a0298f5a 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -79,9 +79,6 @@
#define xcounter %xmm8
#define xbyteswap %xmm9
#define xkey0 %xmm10
-#define xkey3 %xmm11
-#define xkey6 %xmm12
-#define xkey9 %xmm13
#define xkey4 %xmm11
#define xkey8 %xmm12
#define xkey12 %xmm13
--
1.7.10.4

2014-09-28 20:24:28

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 3/3] Revert "crypto: aesni - disable "by8" AVX CTR optimization"

This reverts commit 7da4b29d496b1389d3a29b55d3668efecaa08ebd.

Now, that the issue is fixed, we can re-enable the code.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Chandramouli Narayanan <[email protected]>
---
arch/x86/crypto/aesni-intel_glue.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a7ccd57f19e4..888950f29fd9 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
crypto_inc(ctrblk, AES_BLOCK_SIZE);
}

-#if 0 /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv)
{
@@ -1522,7 +1522,7 @@ static int __init aesni_init(void)
aesni_gcm_dec_tfm = aesni_gcm_dec;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
-#if 0 /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
if (cpu_has_avx) {
/* optimize performance of ctr mode encryption transform */
aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
--
1.7.10.4

2014-09-28 20:24:24

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 1/3] crypto: aesni - fix counter overflow handling in "by8" variant

The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d496b
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.

Tested-by: Romain Francoise <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Cc: Chandramouli Narayanan <[email protected]>
---
arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index f091f122ed24..a029bc744244 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -108,6 +108,10 @@

byteswap_const:
.octa 0x000102030405060708090A0B0C0D0E0F
+ddq_low_msk:
+ .octa 0x0000000000000000FFFFFFFFFFFFFFFF
+ddq_high_add_1:
+ .octa 0x00000000000000010000000000000000
ddq_add_1:
.octa 0x00000000000000000000000000000001
ddq_add_2:
@@ -169,7 +173,12 @@ ddq_add_8:
.rept (by - 1)
club DDQ_DATA, i
club XDATA, i
- vpaddd var_ddq_add(%rip), xcounter, var_xdata
+ vpaddq var_ddq_add(%rip), xcounter, var_xdata
+ vptest ddq_low_msk(%rip), var_xdata
+ jnz 1f
+ vpaddq ddq_high_add_1(%rip), var_xdata, var_xdata
+ vpaddq ddq_high_add_1(%rip), xcounter, xcounter
+ 1:
vpshufb xbyteswap, var_xdata, var_xdata
.set i, (i +1)
.endr
@@ -178,7 +187,11 @@ ddq_add_8:

vpxor xkey0, xdata0, xdata0
club DDQ_DATA, by
- vpaddd var_ddq_add(%rip), xcounter, xcounter
+ vpaddq var_ddq_add(%rip), xcounter, xcounter
+ vptest ddq_low_msk(%rip), xcounter
+ jnz 1f
+ vpaddq ddq_high_add_1(%rip), xcounter, xcounter
+ 1:

.set i, 1
.rept (by - 1)
--
1.7.10.4

2014-10-02 07:15:24

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/3] crypto: aesni - fix and re-enable "by8" CTR variant

On Sun, Sep 28, 2014 at 10:23:58PM +0200, Mathias Krause wrote:
> This series fixes the counter overflow handling of the "by8" CTR variant
> which lead to failing cryptomgr tests and, in turn, disabling this
> optimization with commit 7da4b29d496b.
>
> Patch 1 fixes the bug, patch 2 removes some unused defines (left overs
> from the unification of the initial source files) and patch 3 re-enables
> the code.
>
> The fix was tested by me, doing tcrypt and dm-crypt tests. It was also
> tested by Romain who initially reported the issue.
>
> The patches should go on top of crypto-2.6.git.
>
> In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed
> to stable to propagate the fix.
>
> Please apply!

All applied. Thanks!

I'll hold off on the stable propagation until this is in the tree
and has been tested for a while first.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt