2021-12-07 19:32:58

by Jakub Kicinski

[permalink] [raw]
Subject: x86 AES crypto alignment

Hi!

The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
achieve in networking. Is there any reason for this? On any moderately
recent Intel platform aligned and unaligned vmovdq should have the same
performance (reportedly).

I'll hack it up and do some testing, but I thought it's worth asking
first..


2021-12-07 23:13:01

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, 7 Dec 2021 at 20:33, Jakub Kicinski <[email protected]> wrote:
>
> Hi!
>
> The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> achieve in networking. Is there any reason for this? On any moderately
> recent Intel platform aligned and unaligned vmovdq should have the same
> performance (reportedly).
>
> I'll hack it up and do some testing, but I thought it's worth asking
> first..

Most likely that whoever contributed the code originally cared more
about squeezing the last drop of performance out of it (on the
microarchitecture of the era) than about general usefulness in real
world scenarios.

So yes, please go ahead and remove this restriction: please use the
builtin randomized tests (CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y) which
should generate calls with misaligned plain/ciphertexts, IVs etc with
sufficient coverage.

2021-12-08 04:40:42

by Herbert Xu

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:
> Hi!
>
> The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> achieve in networking. Is there any reason for this? On any moderately
> recent Intel platform aligned and unaligned vmovdq should have the same
> performance (reportedly).

There is no such thing as an alignment requirement. If an algorithm
specifies an alignment and you pass it a request which is unaligned,
the Crypto API will automatically align the data for you.

So what is the actual problem here?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2021-12-08 05:29:13

by Jakub Kicinski

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:
> On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:
> > Hi!
> >
> > The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> > achieve in networking. Is there any reason for this? On any moderately
> > recent Intel platform aligned and unaligned vmovdq should have the same
> > performance (reportedly).
>
> There is no such thing as an alignment requirement. If an algorithm
> specifies an alignment and you pass it a request which is unaligned,
> the Crypto API will automatically align the data for you.
>
> So what is the actual problem here?

By align you mean copy right? I'm trying to avoid the copy.

2021-12-20 23:03:47

by Jakub Kicinski

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, 7 Dec 2021 21:29:07 -0800 Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:
> > On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:
> > > Hi!
> > >
> > > The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> > > achieve in networking. Is there any reason for this? On any moderately
> > > recent Intel platform aligned and unaligned vmovdq should have the same
> > > performance (reportedly).
> >
> > There is no such thing as an alignment requirement. If an algorithm
> > specifies an alignment and you pass it a request which is unaligned,
> > the Crypto API will automatically align the data for you.
> >
> > So what is the actual problem here?
>
> By align you mean copy right? I'm trying to avoid the copy.

Hm, I'm benchmarking things now and it appears to be a regression
introduced somewhere around 5.11 / 5.12. I don't see the memcpy
eating 20% of performance on 5.10. Bisection time.

2021-12-21 00:11:31

by Jakub Kicinski

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> On Tue, 7 Dec 2021 21:29:07 -0800 Jakub Kicinski wrote:
> > On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:
> > > There is no such thing as an alignment requirement. If an algorithm
> > > specifies an alignment and you pass it a request which is unaligned,
> > > the Crypto API will automatically align the data for you.
> > >
> > > So what is the actual problem here?
> >
> > By align you mean copy right? I'm trying to avoid the copy.
>
> Hm, I'm benchmarking things now and it appears to be a regression
> introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> eating 20% of performance on 5.10. Bisection time.

83c83e658863 ("crypto: aesni - refactor scatterlist processing")

is what introduced the regression.

2021-12-21 00:52:55

by Jakub Kicinski

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > Hm, I'm benchmarking things now and it appears to be a regression
> > introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> > eating 20% of performance on 5.10. Bisection time.
>
> 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
>
> is what introduced the regression.

Something like this?

---->8-----------

From: Jakub Kicinski <[email protected]>
Date: Mon, 20 Dec 2021 16:29:26 -0800
Subject: [PATCH] x86/aesni: don't require alignment

Looks like we take care of the meta-data (key, iv etc.) alignment
anyway and data can safely be unaligned. In fact we were feeding
unaligned data into crypto routines up until commit 83c83e658863
("crypto: aesni - refactor scatterlist processing") switched to
use the full skcipher API.

This fixes 21% performance regression in kTLS.

Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
(and running thru various kTLS packets).

Signed-off-by: Jakub Kicinski <[email protected]>
---
arch/x86/crypto/aesni-intel_glue.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e09f4672dd38..41901ba9d3a2 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aesni_rfc4106_gcm_ctx),
- .cra_alignmask = AESNI_ALIGN - 1,
+ .cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
}, {
@@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct generic_gcmaes_ctx),
- .cra_alignmask = AESNI_ALIGN - 1,
+ .cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
} };
--
2.31.1


2021-12-21 07:00:10

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, 21 Dec 2021 at 01:52, Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> > On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > > Hm, I'm benchmarking things now and it appears to be a regression
> > > introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> > > eating 20% of performance on 5.10. Bisection time.
> >
> > 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
> >
> > is what introduced the regression.
>
> Something like this?
>
> ---->8-----------
>
> From: Jakub Kicinski <[email protected]>
> Date: Mon, 20 Dec 2021 16:29:26 -0800
> Subject: [PATCH] x86/aesni: don't require alignment
>
> Looks like we take care of the meta-data (key, iv etc.) alignment
> anyway and data can safely be unaligned. In fact we were feeding
> unaligned data into crypto routines up until commit 83c83e658863
> ("crypto: aesni - refactor scatterlist processing") switched to
> use the full skcipher API.
>
> This fixes 21% performance regression in kTLS.
>
> Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> (and running thru various kTLS packets).
>
> Signed-off-by: Jakub Kicinski <[email protected]>

Acked-by: Ard Biesheuvel <[email protected]>

but it needs a Fixes tag.

Could you check whether this means that gcm_context_data in
gcmaes_crypt_by_sg() does not have to be aligned either? It would be
nice if we could drop that horrible hack as well.


> ---
> arch/x86/crypto/aesni-intel_glue.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> index e09f4672dd38..41901ba9d3a2 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
> .cra_flags = CRYPTO_ALG_INTERNAL,
> .cra_blocksize = 1,
> .cra_ctxsize = sizeof(struct aesni_rfc4106_gcm_ctx),
> - .cra_alignmask = AESNI_ALIGN - 1,
> + .cra_alignmask = 0,
> .cra_module = THIS_MODULE,
> },
> }, {
> @@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
> .cra_flags = CRYPTO_ALG_INTERNAL,
> .cra_blocksize = 1,
> .cra_ctxsize = sizeof(struct generic_gcmaes_ctx),
> - .cra_alignmask = AESNI_ALIGN - 1,
> + .cra_alignmask = 0,
> .cra_module = THIS_MODULE,
> },
> } };
> --
> 2.31.1
>

2021-12-21 07:25:07

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, 21 Dec 2021 at 07:59, Ard Biesheuvel <[email protected]> wrote:
>
> On Tue, 21 Dec 2021 at 01:52, Jakub Kicinski <[email protected]> wrote:
> >
> > On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> > > On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > > > Hm, I'm benchmarking things now and it appears to be a regression
> > > > introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> > > > eating 20% of performance on 5.10. Bisection time.
> > >
> > > 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
> > >
> > > is what introduced the regression.
> >
> > Something like this?
> >
> > ---->8-----------
> >
> > From: Jakub Kicinski <[email protected]>
> > Date: Mon, 20 Dec 2021 16:29:26 -0800
> > Subject: [PATCH] x86/aesni: don't require alignment
> >
> > Looks like we take care of the meta-data (key, iv etc.) alignment
> > anyway and data can safely be unaligned. In fact we were feeding
> > unaligned data into crypto routines up until commit 83c83e658863
> > ("crypto: aesni - refactor scatterlist processing") switched to
> > use the full skcipher API.
> >
> > This fixes 21% performance regression in kTLS.
> >
> > Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> > (and running thru various kTLS packets).
> >
> > Signed-off-by: Jakub Kicinski <[email protected]>
>
> Acked-by: Ard Biesheuvel <[email protected]>
>
> but it needs a Fixes tag.
>
> Could you check whether this means that gcm_context_data in
> gcmaes_crypt_by_sg() does not have to be aligned either? It would be
> nice if we could drop that horrible hack as well.
>

I guess you meant by "we take care of the meta-data (key, iv etc.)
alignment anyway" that we have these hacks for gcm_context_data (which
carries the key) and the IV, using oversized buffers on the stack and
open coded realignment.

It would be really nice if we could just get rid of all of that as
well, and just use {v}movdqu to load those items.




>
> > ---
> > arch/x86/crypto/aesni-intel_glue.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> > index e09f4672dd38..41901ba9d3a2 100644
> > --- a/arch/x86/crypto/aesni-intel_glue.c
> > +++ b/arch/x86/crypto/aesni-intel_glue.c
> > @@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
> > .cra_flags = CRYPTO_ALG_INTERNAL,
> > .cra_blocksize = 1,
> > .cra_ctxsize = sizeof(struct aesni_rfc4106_gcm_ctx),
> > - .cra_alignmask = AESNI_ALIGN - 1,
> > + .cra_alignmask = 0,
> > .cra_module = THIS_MODULE,
> > },
> > }, {
> > @@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
> > .cra_flags = CRYPTO_ALG_INTERNAL,
> > .cra_blocksize = 1,
> > .cra_ctxsize = sizeof(struct generic_gcmaes_ctx),
> > - .cra_alignmask = AESNI_ALIGN - 1,
> > + .cra_alignmask = 0,
> > .cra_module = THIS_MODULE,
> > },
> > } };
> > --
> > 2.31.1
> >

2021-12-21 10:52:57

by Herbert Xu

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Mon, Dec 20, 2021 at 04:52:51PM -0800, Jakub Kicinski wrote:
>
> Something like this?
>
> ---->8-----------
>
> From: Jakub Kicinski <[email protected]>
> Date: Mon, 20 Dec 2021 16:29:26 -0800
> Subject: [PATCH] x86/aesni: don't require alignment

You need to send the patch with a new Subject line as otherwise
patchwork will ignore it.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2021-12-21 14:57:43

by Jakub Kicinski

[permalink] [raw]
Subject: Re: x86 AES crypto alignment

On Tue, 21 Dec 2021 08:24:48 +0100 Ard Biesheuvel wrote:
> > Could you check whether this means that gcm_context_data in
> > gcmaes_crypt_by_sg() does not have to be aligned either? It would be
> > nice if we could drop that horrible hack as well.
>
> I guess you meant by "we take care of the meta-data (key, iv etc.)
> alignment anyway" that we have these hacks for gcm_context_data (which
> carries the key) and the IV, using oversized buffers on the stack and
> open coded realignment.
>
> It would be really nice if we could just get rid of all of that as
> well, and just use {v}movdqu to load those items.

Yup, exactly. I did something close to s/movdqa/movdqu/ initially,
but doing a competent job removing the alignment assumption would
be more effort. Let's see if I can see the copy if any perf profile...

FWIW there is a comment up top in arch/x86/crypto/aesni-intel_asm.S
which explains the aligned operations were chosen because they have
a shorter encoding. Seems like an intentional choice.