2015-07-07 19:37:20

by Martin Willi

Subject: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

This patch series adds x86_64-specific ChaCha20 and Poly1305 ciphers using
SSE2/SSSE3 and AVX2 instructions. The goal is a drop-in replacement for
AESNI/CLMUL-accelerated AES-GCM with at least somewhat comparable
performance; refer to RFC7539 for details on the ChaCha20-Poly1305
construction. The series is based on cryptodev.

The first patch adds some speed tests to tcrypt. The second patch exports
some functionality from chacha20-generic for use as a fallback. Patch 3
adds a single-block SSSE3 driver for ChaCha20, while patches 4 and 5 extend
it with an optimized four-block SSSE3 and an eight-block AVX2 variant.
Patch 6 adds an additional test vector for ChaCha20 to actually exercise
the AVX2 eight-block variant, which processes 512 bytes at once.

Patch 7 exports some poly1305-generic functionality for use as a fallback.
Patch 8 introduces a single-block SSE2 driver for Poly1305, while patches 9
and 10 add an optimized two-block SSE2 and a four-block AVX2 variant.

Overall speedup for the ChaCha20/Poly1305 AEAD for typical IPsec payloads
is ~50-150% with SSE2/SSSE3 and ~100-200% with AVX2, or even more for larger
payloads:

poly1305-generic:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-generic,poly1305-generic)) encryption
test 0 (288 bit key, 16 byte blocks): 902007 operations in 1 seconds (14432112 bytes)
test 1 (288 bit key, 64 byte blocks): 945302 operations in 1 seconds (60499328 bytes)
test 2 (288 bit key, 256 byte blocks): 559910 operations in 1 seconds (143336960 bytes)
test 3 (288 bit key, 512 byte blocks): 365334 operations in 1 seconds (187051008 bytes)
test 4 (288 bit key, 1024 byte blocks): 213663 operations in 1 seconds (218790912 bytes)
test 5 (288 bit key, 2048 byte blocks): 117263 operations in 1 seconds (240154624 bytes)
test 6 (288 bit key, 4096 byte blocks): 61915 operations in 1 seconds (253603840 bytes)
test 7 (288 bit key, 8192 byte blocks): 31662 operations in 1 seconds (259375104 bytes)

SSE2/SSSE3:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 945909 operations in 1 seconds (15134544 bytes)
test 1 (288 bit key, 64 byte blocks): 945702 operations in 1 seconds (60524928 bytes)
test 2 (288 bit key, 256 byte blocks): 759759 operations in 1 seconds (194498304 bytes)
test 3 (288 bit key, 512 byte blocks): 609356 operations in 1 seconds (311990272 bytes)
test 4 (288 bit key, 1024 byte blocks): 445479 operations in 1 seconds (456170496 bytes)
test 5 (288 bit key, 2048 byte blocks): 289479 operations in 1 seconds (592852992 bytes)
test 6 (288 bit key, 4096 byte blocks): 170082 operations in 1 seconds (696655872 bytes)
test 7 (288 bit key, 8192 byte blocks): 91443 operations in 1 seconds (749101056 bytes)

AVX2:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 896305 operations in 1 seconds (14340880 bytes)
test 1 (288 bit key, 64 byte blocks): 929638 operations in 1 seconds (59496832 bytes)
test 2 (288 bit key, 256 byte blocks): 750673 operations in 1 seconds (192172288 bytes)
test 3 (288 bit key, 512 byte blocks): 687636 operations in 1 seconds (352069632 bytes)
test 4 (288 bit key, 1024 byte blocks): 555209 operations in 1 seconds (568534016 bytes)
test 5 (288 bit key, 2048 byte blocks): 402049 operations in 1 seconds (823396352 bytes)
test 6 (288 bit key, 4096 byte blocks): 259861 operations in 1 seconds (1064390656 bytes)
test 7 (288 bit key, 8192 byte blocks): 147283 operations in 1 seconds (1206542336 bytes)

All benchmark results were taken on a Core i5-4670T.

On Haswell with AVX2, the ChaCha20/Poly1305 AEAD reaches about half the raw
performance of AESNI/CLMUL-accelerated AES-GCM (rfc4106-gcm-aesni) for
typical IPsec MTUs. On Ivy Bridge with SSE2/SSSE3, the numbers are very
similar to AES-GCM, as the CLMUL instructions are less efficient there.

Martin Willi (10):
crypto: tcrypt - Add ChaCha20/Poly1305 speed tests
crypto: chacha20 - Export common ChaCha20 helpers
crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64
crypto: chacha20 - Add a four block SSSE3 variant for x86_64
crypto: chacha20 - Add an eight block AVX2 variant for x86_64
crypto: testmgr - Add a longer ChaCha20 test vector
crypto: poly1305 - Export common Poly1305 helpers
crypto: poly1305 - Add a SSE2 SIMD variant for x86_64
crypto: poly1305 - Add a two block SSE2 variant for x86_64
crypto: poly1305 - Add a four block AVX2 variant for x86_64

arch/x86/crypto/Makefile | 6 +
arch/x86/crypto/chacha20-avx2-x86_64.S | 443 ++++++++++++++++++++++
arch/x86/crypto/chacha20-ssse3-x86_64.S | 625 ++++++++++++++++++++++++++++++++
arch/x86/crypto/chacha20_glue.c | 150 ++++++++
arch/x86/crypto/poly1305-avx2-x86_64.S | 386 ++++++++++++++++++++
arch/x86/crypto/poly1305-sse2-x86_64.S | 582 +++++++++++++++++++++++++++++
arch/x86/crypto/poly1305_glue.c | 207 +++++++++++
crypto/Kconfig | 27 ++
crypto/chacha20_generic.c | 28 +-
crypto/chacha20poly1305.c | 7 +-
crypto/poly1305_generic.c | 73 ++--
crypto/tcrypt.c | 15 +
crypto/tcrypt.h | 20 +
crypto/testmgr.h | 334 ++++++++++++++++-
include/crypto/chacha20.h | 25 ++
include/crypto/poly1305.h | 41 +++
16 files changed, 2909 insertions(+), 60 deletions(-)
create mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S
create mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S
create mode 100644 arch/x86/crypto/chacha20_glue.c
create mode 100644 arch/x86/crypto/poly1305-avx2-x86_64.S
create mode 100644 arch/x86/crypto/poly1305-sse2-x86_64.S
create mode 100644 arch/x86/crypto/poly1305_glue.c
create mode 100644 include/crypto/chacha20.h
create mode 100644 include/crypto/poly1305.h

--
1.9.1


2015-07-07 19:37:20

by Martin Willi

Subject: [PATCH 02/10] crypto: chacha20 - Export common ChaCha20 helpers

As architecture-specific drivers need a software fallback, export the
ChaCha20 encryption/decryption function together with some helpers in a
header file.
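
As a rough illustration (not part of this patch), an architecture-specific
driver can then route to the exported generic code whenever SIMD is not an
option; the function name and the small-request threshold below are
assumptions, loosely modelled on the SSSE3 glue code added in patch 3:

static int chacha20_simd_crypt(struct blkcipher_desc *desc,
                               struct scatterlist *dst,
                               struct scatterlist *src,
                               unsigned int nbytes)
{
        /* use the exported generic fallback for tiny requests or when
         * the FPU cannot be used in this context */
        if (nbytes <= CHACHA20_BLOCK_SIZE || !irq_fpu_usable())
                return crypto_chacha20_crypt(desc, dst, src, nbytes);

        /* ... otherwise run the SIMD code path, still using
         * crypto_chacha20_init() for the initial state setup ... */
        return 0;
}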

Signed-off-by: Martin Willi <[email protected]>
---
crypto/chacha20_generic.c | 28 ++++++++++++----------------
crypto/chacha20poly1305.c | 3 +--
include/crypto/chacha20.h | 25 +++++++++++++++++++++++++
3 files changed, 38 insertions(+), 18 deletions(-)
create mode 100644 include/crypto/chacha20.h

diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
index fa42e70..da9c899 100644
--- a/crypto/chacha20_generic.c
+++ b/crypto/chacha20_generic.c
@@ -13,14 +13,7 @@
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>
-
-#define CHACHA20_NONCE_SIZE 16
-#define CHACHA20_KEY_SIZE 32
-#define CHACHA20_BLOCK_SIZE 64
-
-struct chacha20_ctx {
- u32 key[8];
-};
+#include <crypto/chacha20.h>

static inline u32 rotl32(u32 v, u8 n)
{
@@ -108,7 +101,7 @@ static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
}
}

-static void chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
+void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
{
static const char constant[16] = "expand 32-byte k";

@@ -129,8 +122,9 @@ static void chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
state[14] = le32_to_cpuvp(iv + 8);
state[15] = le32_to_cpuvp(iv + 12);
}
+EXPORT_SYMBOL_GPL(crypto_chacha20_init);

-static int chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
+int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
unsigned int keysize)
{
struct chacha20_ctx *ctx = crypto_tfm_ctx(tfm);
@@ -144,8 +138,9 @@ static int chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,

return 0;
}
+EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);

-static int chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+int crypto_chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct blkcipher_walk walk;
@@ -155,7 +150,7 @@ static int chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);

- chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+ crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);

while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
@@ -172,6 +167,7 @@ static int chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,

return err;
}
+EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);

static struct crypto_alg alg = {
.cra_name = "chacha20",
@@ -187,11 +183,11 @@ static struct crypto_alg alg = {
.blkcipher = {
.min_keysize = CHACHA20_KEY_SIZE,
.max_keysize = CHACHA20_KEY_SIZE,
- .ivsize = CHACHA20_NONCE_SIZE,
+ .ivsize = CHACHA20_IV_SIZE,
.geniv = "seqiv",
- .setkey = chacha20_setkey,
- .encrypt = chacha20_crypt,
- .decrypt = chacha20_crypt,
+ .setkey = crypto_chacha20_setkey,
+ .encrypt = crypto_chacha20_crypt,
+ .decrypt = crypto_chacha20_crypt,
},
},
};
diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
index 7b46ed7..c9a36a9 100644
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -13,6 +13,7 @@
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
+#include <crypto/chacha20.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -23,8 +24,6 @@
#define POLY1305_BLOCK_SIZE 16
#define POLY1305_DIGEST_SIZE 16
#define POLY1305_KEY_SIZE 32
-#define CHACHA20_KEY_SIZE 32
-#define CHACHA20_IV_SIZE 16
#define CHACHAPOLY_IV_SIZE 12

struct chachapoly_instance_ctx {
diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
new file mode 100644
index 0000000..274bbae
--- /dev/null
+++ b/include/crypto/chacha20.h
@@ -0,0 +1,25 @@
+/*
+ * Common values for the ChaCha20 algorithm
+ */
+
+#ifndef _CRYPTO_CHACHA20_H
+#define _CRYPTO_CHACHA20_H
+
+#include <linux/types.h>
+#include <linux/crypto.h>
+
+#define CHACHA20_IV_SIZE 16
+#define CHACHA20_KEY_SIZE 32
+#define CHACHA20_BLOCK_SIZE 64
+
+struct chacha20_ctx {
+ u32 key[8];
+};
+
+void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
+int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keysize);
+int crypto_chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes);
+
+#endif
--
1.9.1

2015-07-07 19:37:20

by Martin Willi

Subject: [PATCH 09/10] crypto: poly1305 - Add a two block SSE2 variant for x86_64

Extends the x86_64 SSE2 Poly1305 authenticator with a function that
processes two consecutive Poly1305 blocks in parallel using a derived key
r^2. The loop-unrolled block processing maps more effectively to SSE
instructions, further increasing throughput.
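
The identity used is that two sequential block updates
h = (h + m1) * r; h = (h + m2) * r collapse into
h = (h + m1) * r^2 + m2 * r, making the two multiplications independent.
A minimal standalone sketch of this equivalence, using a small toy modulus
in place of the real 2^130-5 arithmetic:

#include <assert.h>
#include <stdint.h>

#define P 1000003ULL                    /* toy prime instead of 2^130 - 5 */

int main(void)
{
        uint64_t r = 12345, u = r * r % P;      /* u = r^2, derived key */
        uint64_t h = 42, m1 = 777, m2 = 888;

        /* two sequential single-block steps: h = (h + m) * r */
        uint64_t seq = (h + m1) % P * r % P;
        seq = (seq + m2) % P * r % P;

        /* one parallel double-block step: h = (h + m1) * r^2 + m2 * r */
        uint64_t par = ((h + m1) % P * u % P + m2 * r % P) % P;

        assert(seq == par);
        return 0;
}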

For large messages, throughput increases by ~45-65% compared to the
single-block SSE2 variant:

testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3654724 opers/sec, 350853504 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5939245 opers/sec, 570167520 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9480730 opers/sec, 910150080 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1387720 opers/sec, 399663360 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2031633 opers/sec, 585110304 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3697643 opers/sec, 1064921184 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 570658 opers/sec, 602614848 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1129287 opers/sec, 1192527072 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 289452 opers/sec, 602060160 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 595957 opers/sec, 1239590560 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 303426 opers/sec, 1252542528 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153136 opers/sec, 1259390464 bytes/sec

testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3623907 opers/sec, 347895072 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5915244 opers/sec, 567863424 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9489525 opers/sec, 910994400 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1369657 opers/sec, 394461216 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2016000 opers/sec, 580608000 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3716146 opers/sec, 1070250048 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 561913 opers/sec, 593380128 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1635554 opers/sec, 1727145024 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 286540 opers/sec, 596003200 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 935339 opers/sec, 1945505120 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 491810 opers/sec, 2030191680 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 252857 opers/sec, 2079495968 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/poly1305-sse2-x86_64.S | 306 +++++++++++++++++++++++++++++++++
arch/x86/crypto/poly1305_glue.c | 54 +++++-
2 files changed, 355 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2-x86_64.S
index a3d2b5e..338c748 100644
--- a/arch/x86/crypto/poly1305-sse2-x86_64.S
+++ b/arch/x86/crypto/poly1305-sse2-x86_64.S
@@ -15,6 +15,7 @@
.align 16

ANMASK: .octa 0x0000000003ffffff0000000003ffffff
+ORMASK: .octa 0x00000000010000000000000001000000

.text

@@ -274,3 +275,308 @@ ENTRY(poly1305_block_sse2)
pop %rbx
ret
ENDPROC(poly1305_block_sse2)
+
+
+#define u0 0x00(%r8)
+#define u1 0x04(%r8)
+#define u2 0x08(%r8)
+#define u3 0x0c(%r8)
+#define u4 0x10(%r8)
+#define hc0 %xmm0
+#define hc1 %xmm1
+#define hc2 %xmm2
+#define hc3 %xmm5
+#define hc4 %xmm6
+#define ru0 %xmm7
+#define ru1 %xmm8
+#define ru2 %xmm9
+#define ru3 %xmm10
+#define ru4 %xmm11
+#define sv1 %xmm12
+#define sv2 %xmm13
+#define sv3 %xmm14
+#define sv4 %xmm15
+#undef d0
+#define d0 %r13
+
+ENTRY(poly1305_2block_sse2)
+ # %rdi: Accumulator h[5]
+ # %rsi: 16 byte input block m
+ # %rdx: Poly1305 key r[5]
+ # %rcx: Doubleblock count
+ # %r8: Poly1305 derived key r^2 u[5]
+
+ # This two-block variant further improves performance by using
+ # loop-unrolled block processing. This is more straightforward and does
+ # less byte shuffling, but requires a second Poly1305 key r^2:
+ # h = (h + m) * r => h = (h + m1) * r^2 + m2 * r
+
+ push %rbx
+ push %r12
+ push %r13
+
+ # combine r0,u0
+ movd u0,ru0
+ movd r0,t1
+ punpcklqdq t1,ru0
+
+ # combine r1,u1 and s1=r1*5,v1=u1*5
+ movd u1,ru1
+ movd r1,t1
+ punpcklqdq t1,ru1
+ movdqa ru1,sv1
+ pslld $2,sv1
+ paddd ru1,sv1
+
+ # combine r2,u2 and s2=r2*5,v2=u2*5
+ movd u2,ru2
+ movd r2,t1
+ punpcklqdq t1,ru2
+ movdqa ru2,sv2
+ pslld $2,sv2
+ paddd ru2,sv2
+
+ # combine r3,u3 and s3=r3*5,v3=u3*5
+ movd u3,ru3
+ movd r3,t1
+ punpcklqdq t1,ru3
+ movdqa ru3,sv3
+ pslld $2,sv3
+ paddd ru3,sv3
+
+ # combine r4,u4 and s4=r4*5,v4=u4*5
+ movd u4,ru4
+ movd r4,t1
+ punpcklqdq t1,ru4
+ movdqa ru4,sv4
+ pslld $2,sv4
+ paddd ru4,sv4
+
+.Ldoblock2:
+ # hc0 = [ m[16-19] & 0x3ffffff, h0 + m[0-3] & 0x3ffffff ]
+ movd 0x00(m),hc0
+ movd 0x10(m),t1
+ punpcklqdq t1,hc0
+ pand ANMASK(%rip),hc0
+ movd h0,t1
+ paddd t1,hc0
+ # hc1 = [ (m[19-22] >> 2) & 0x3ffffff, h1 + (m[3-6] >> 2) & 0x3ffffff ]
+ movd 0x03(m),hc1
+ movd 0x13(m),t1
+ punpcklqdq t1,hc1
+ psrld $2,hc1
+ pand ANMASK(%rip),hc1
+ movd h1,t1
+ paddd t1,hc1
+ # hc2 = [ (m[22-25] >> 4) & 0x3ffffff, h2 + (m[6-9] >> 4) & 0x3ffffff ]
+ movd 0x06(m),hc2
+ movd 0x16(m),t1
+ punpcklqdq t1,hc2
+ psrld $4,hc2
+ pand ANMASK(%rip),hc2
+ movd h2,t1
+ paddd t1,hc2
+ # hc3 = [ (m[25-28] >> 6) & 0x3ffffff, h3 + (m[9-12] >> 6) & 0x3ffffff ]
+ movd 0x09(m),hc3
+ movd 0x19(m),t1
+ punpcklqdq t1,hc3
+ psrld $6,hc3
+ pand ANMASK(%rip),hc3
+ movd h3,t1
+ paddd t1,hc3
+ # hc4 = [ (m[28-31] >> 8) | (1<<24), h4 + (m[12-15] >> 8) | (1<<24) ]
+ movd 0x0c(m),hc4
+ movd 0x1c(m),t1
+ punpcklqdq t1,hc4
+ psrld $8,hc4
+ por ORMASK(%rip),hc4
+ movd h4,t1
+ paddd t1,hc4
+
+ # t1 = [ hc0[1] * r0, hc0[0] * u0 ]
+ movdqa ru0,t1
+ pmuludq hc0,t1
+ # t1 += [ hc1[1] * s4, hc1[0] * v4 ]
+ movdqa sv4,t2
+ pmuludq hc1,t2
+ paddq t2,t1
+ # t1 += [ hc2[1] * s3, hc2[0] * v3 ]
+ movdqa sv3,t2
+ pmuludq hc2,t2
+ paddq t2,t1
+ # t1 += [ hc3[1] * s2, hc3[0] * v2 ]
+ movdqa sv2,t2
+ pmuludq hc3,t2
+ paddq t2,t1
+ # t1 += [ hc4[1] * s1, hc4[0] * v1 ]
+ movdqa sv1,t2
+ pmuludq hc4,t2
+ paddq t2,t1
+ # d0 = t1[0] + t1[1]
+ movdqa t1,t2
+ psrldq $8,t2
+ paddq t2,t1
+ movq t1,d0
+
+ # t1 = [ hc0[1] * r1, hc0[0] * u1 ]
+ movdqa ru1,t1
+ pmuludq hc0,t1
+ # t1 += [ hc1[1] * r0, hc1[0] * u0 ]
+ movdqa ru0,t2
+ pmuludq hc1,t2
+ paddq t2,t1
+ # t1 += [ hc2[1] * s4, hc2[0] * v4 ]
+ movdqa sv4,t2
+ pmuludq hc2,t2
+ paddq t2,t1
+ # t1 += [ hc3[1] * s3, hc3[0] * v3 ]
+ movdqa sv3,t2
+ pmuludq hc3,t2
+ paddq t2,t1
+ # t1 += [ hc4[1] * s2, hc4[0] * v2 ]
+ movdqa sv2,t2
+ pmuludq hc4,t2
+ paddq t2,t1
+ # d1 = t1[0] + t1[1]
+ movdqa t1,t2
+ psrldq $8,t2
+ paddq t2,t1
+ movq t1,d1
+
+ # t1 = [ hc0[1] * r2, hc0[0] * u2 ]
+ movdqa ru2,t1
+ pmuludq hc0,t1
+ # t1 += [ hc1[1] * r1, hc1[0] * u1 ]
+ movdqa ru1,t2
+ pmuludq hc1,t2
+ paddq t2,t1
+ # t1 += [ hc2[1] * r0, hc2[0] * u0 ]
+ movdqa ru0,t2
+ pmuludq hc2,t2
+ paddq t2,t1
+ # t1 += [ hc3[1] * s4, hc3[0] * v4 ]
+ movdqa sv4,t2
+ pmuludq hc3,t2
+ paddq t2,t1
+ # t1 += [ hc4[1] * s3, hc4[0] * v3 ]
+ movdqa sv3,t2
+ pmuludq hc4,t2
+ paddq t2,t1
+ # d2 = t1[0] + t1[1]
+ movdqa t1,t2
+ psrldq $8,t2
+ paddq t2,t1
+ movq t1,d2
+
+ # t1 = [ hc0[1] * r3, hc0[0] * u3 ]
+ movdqa ru3,t1
+ pmuludq hc0,t1
+ # t1 += [ hc1[1] * r2, hc1[0] * u2 ]
+ movdqa ru2,t2
+ pmuludq hc1,t2
+ paddq t2,t1
+ # t1 += [ hc2[1] * r1, hc2[0] * u1 ]
+ movdqa ru1,t2
+ pmuludq hc2,t2
+ paddq t2,t1
+ # t1 += [ hc3[1] * r0, hc3[0] * u0 ]
+ movdqa ru0,t2
+ pmuludq hc3,t2
+ paddq t2,t1
+ # t1 += [ hc4[1] * s4, hc4[0] * v4 ]
+ movdqa sv4,t2
+ pmuludq hc4,t2
+ paddq t2,t1
+ # d3 = t1[0] + t1[1]
+ movdqa t1,t2
+ psrldq $8,t2
+ paddq t2,t1
+ movq t1,d3
+
+ # t1 = [ hc0[1] * r4, hc0[0] * u4 ]
+ movdqa ru4,t1
+ pmuludq hc0,t1
+ # t1 += [ hc1[1] * r3, hc1[0] * u3 ]
+ movdqa ru3,t2
+ pmuludq hc1,t2
+ paddq t2,t1
+ # t1 += [ hc2[1] * r2, hc2[0] * u2 ]
+ movdqa ru2,t2
+ pmuludq hc2,t2
+ paddq t2,t1
+ # t1 += [ hc3[1] * r1, hc3[0] * u1 ]
+ movdqa ru1,t2
+ pmuludq hc3,t2
+ paddq t2,t1
+ # t1 += [ hc4[1] * r0, hc4[0] * u0 ]
+ movdqa ru0,t2
+ pmuludq hc4,t2
+ paddq t2,t1
+ # d4 = t1[0] + t1[1]
+ movdqa t1,t2
+ psrldq $8,t2
+ paddq t2,t1
+ movq t1,d4
+
+ # d1 += d0 >> 26
+ mov d0,%rax
+ shr $26,%rax
+ add %rax,d1
+ # h0 = d0 & 0x3ffffff
+ mov d0,%rbx
+ and $0x3ffffff,%ebx
+
+ # d2 += d1 >> 26
+ mov d1,%rax
+ shr $26,%rax
+ add %rax,d2
+ # h1 = d1 & 0x3ffffff
+ mov d1,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h1
+
+ # d3 += d2 >> 26
+ mov d2,%rax
+ shr $26,%rax
+ add %rax,d3
+ # h2 = d2 & 0x3ffffff
+ mov d2,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h2
+
+ # d4 += d3 >> 26
+ mov d3,%rax
+ shr $26,%rax
+ add %rax,d4
+ # h3 = d3 & 0x3ffffff
+ mov d3,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h3
+
+ # h0 += (d4 >> 26) * 5
+ mov d4,%rax
+ shr $26,%rax
+ lea (%eax,%eax,4),%eax
+ add %eax,%ebx
+ # h4 = d4 & 0x3ffffff
+ mov d4,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h4
+
+ # h1 += h0 >> 26
+ mov %ebx,%eax
+ shr $26,%eax
+ add %eax,h1
+ # h0 = h0 & 0x3ffffff
+ andl $0x3ffffff,%ebx
+ mov %ebx,h0
+
+ add $0x20,m
+ dec %rcx
+ jnz .Ldoblock2
+
+ pop %r13
+ pop %r12
+ pop %rbx
+ ret
+ENDPROC(poly1305_2block_sse2)
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1e59274..b7c33d0 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -18,24 +18,68 @@
#include <asm/fpu/api.h>
#include <asm/simd.h>

+struct poly1305_simd_desc_ctx {
+ struct poly1305_desc_ctx base;
+ /* derived key u set? */
+ bool uset;
+ /* derived Poly1305 key r^2 */
+ u32 u[5];
+};
+
asmlinkage void poly1305_block_sse2(u32 *h, const u8 *src,
const u32 *r, unsigned int blocks);
+asmlinkage void poly1305_2block_sse2(u32 *h, const u8 *src, const u32 *r,
+ unsigned int blocks, const u32 *u);
+
+static int poly1305_simd_init(struct shash_desc *desc)
+{
+ struct poly1305_simd_desc_ctx *sctx = shash_desc_ctx(desc);
+
+ sctx->uset = false;
+
+ return crypto_poly1305_init(desc);
+}
+
+static void poly1305_simd_mult(u32 *a, const u32 *b)
+{
+ u8 m[POLY1305_BLOCK_SIZE];
+
+ memset(m, 0, sizeof(m));
+ /* The poly1305 block function adds a hi-bit to the accumulator which
+ * we don't need for key multiplication; compensate for it. */
+ a[4] -= 1 << 24;
+ poly1305_block_sse2(a, m, b, 1);
+}

static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
const u8 *src, unsigned int srclen)
{
+ struct poly1305_simd_desc_ctx *sctx;
unsigned int blocks, datalen;

+ BUILD_BUG_ON(offsetof(struct poly1305_simd_desc_ctx, base));
+ sctx = container_of(dctx, struct poly1305_simd_desc_ctx, base);
+
if (unlikely(!dctx->sset)) {
datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
src += srclen - datalen;
srclen = datalen;
}

+ if (likely(srclen >= POLY1305_BLOCK_SIZE * 2)) {
+ if (unlikely(!sctx->uset)) {
+ memcpy(sctx->u, dctx->r, sizeof(sctx->u));
+ poly1305_simd_mult(sctx->u, dctx->r);
+ sctx->uset = true;
+ }
+ blocks = srclen / (POLY1305_BLOCK_SIZE * 2);
+ poly1305_2block_sse2(dctx->h, src, dctx->r, blocks, sctx->u);
+ src += POLY1305_BLOCK_SIZE * 2 * blocks;
+ srclen -= POLY1305_BLOCK_SIZE * 2 * blocks;
+ }
if (srclen >= POLY1305_BLOCK_SIZE) {
- blocks = srclen / POLY1305_BLOCK_SIZE;
- poly1305_block_sse2(dctx->h, src, dctx->r, blocks);
- srclen -= POLY1305_BLOCK_SIZE * blocks;
+ poly1305_block_sse2(dctx->h, src, dctx->r, 1);
+ srclen -= POLY1305_BLOCK_SIZE;
}
return srclen;
}
@@ -84,11 +128,11 @@ static int poly1305_simd_update(struct shash_desc *desc,

static struct shash_alg alg = {
.digestsize = POLY1305_DIGEST_SIZE,
- .init = crypto_poly1305_init,
+ .init = poly1305_simd_init,
.update = poly1305_simd_update,
.final = crypto_poly1305_final,
.setkey = crypto_poly1305_setkey,
- .descsize = sizeof(struct poly1305_desc_ctx),
+ .descsize = sizeof(struct poly1305_simd_desc_ctx),
.base = {
.cra_name = "poly1305",
.cra_driver_name = "poly1305-simd",
--
1.9.1

2015-07-07 19:37:20

by Martin Willi

Subject: [PATCH 07/10] crypto: poly1305 - Export common Poly1305 helpers

As architecture-specific drivers need a software fallback, export the
Poly1305 init/update/final functions together with some helpers in a header
file.

Signed-off-by: Martin Willi <[email protected]>
---
crypto/chacha20poly1305.c | 4 +--
crypto/poly1305_generic.c | 73 +++++++++++++++++++++++------------------------
include/crypto/poly1305.h | 41 ++++++++++++++++++++++++++
3 files changed, 77 insertions(+), 41 deletions(-)
create mode 100644 include/crypto/poly1305.h

diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
index c9a36a9..1f2f8b4 100644
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -14,6 +14,7 @@
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <crypto/chacha20.h>
+#include <crypto/poly1305.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -21,9 +22,6 @@

#include "internal.h"

-#define POLY1305_BLOCK_SIZE 16
-#define POLY1305_DIGEST_SIZE 16
-#define POLY1305_KEY_SIZE 32
#define CHACHAPOLY_IV_SIZE 12

struct chachapoly_instance_ctx {
diff --git a/crypto/poly1305_generic.c b/crypto/poly1305_generic.c
index 387b5c8..2df9835d 100644
--- a/crypto/poly1305_generic.c
+++ b/crypto/poly1305_generic.c
@@ -13,31 +13,11 @@

#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
+#include <crypto/poly1305.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>

-#define POLY1305_BLOCK_SIZE 16
-#define POLY1305_KEY_SIZE 32
-#define POLY1305_DIGEST_SIZE 16
-
-struct poly1305_desc_ctx {
- /* key */
- u32 r[5];
- /* finalize key */
- u32 s[4];
- /* accumulator */
- u32 h[5];
- /* partial buffer */
- u8 buf[POLY1305_BLOCK_SIZE];
- /* bytes used in partial buffer */
- unsigned int buflen;
- /* r key has been set */
- bool rset;
- /* s key has been set */
- bool sset;
-};
-
static inline u64 mlt(u64 a, u64 b)
{
return a * b;
@@ -58,7 +38,7 @@ static inline u32 le32_to_cpuvp(const void *p)
return le32_to_cpup(p);
}

-static int poly1305_init(struct shash_desc *desc)
+int crypto_poly1305_init(struct shash_desc *desc)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);

@@ -69,8 +49,9 @@ static int poly1305_init(struct shash_desc *desc)

return 0;
}
+EXPORT_SYMBOL_GPL(crypto_poly1305_init);

-static int poly1305_setkey(struct crypto_shash *tfm,
+int crypto_poly1305_setkey(struct crypto_shash *tfm,
const u8 *key, unsigned int keylen)
{
/* Poly1305 requires a unique key for each tag, which implies that
@@ -79,6 +60,7 @@ static int poly1305_setkey(struct crypto_shash *tfm,
* the update() call. */
return -ENOTSUPP;
}
+EXPORT_SYMBOL_GPL(crypto_poly1305_setkey);

static void poly1305_setrkey(struct poly1305_desc_ctx *dctx, const u8 *key)
{
@@ -98,16 +80,10 @@ static void poly1305_setskey(struct poly1305_desc_ctx *dctx, const u8 *key)
dctx->s[3] = le32_to_cpuvp(key + 12);
}

-static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
- const u8 *src, unsigned int srclen,
- u32 hibit)
+unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
+ const u8 *src, unsigned int srclen)
{
- u32 r0, r1, r2, r3, r4;
- u32 s1, s2, s3, s4;
- u32 h0, h1, h2, h3, h4;
- u64 d0, d1, d2, d3, d4;
-
- if (unlikely(!dctx->sset)) {
+ if (!dctx->sset) {
if (!dctx->rset && srclen >= POLY1305_BLOCK_SIZE) {
poly1305_setrkey(dctx, src);
src += POLY1305_BLOCK_SIZE;
@@ -121,6 +97,25 @@ static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
dctx->sset = true;
}
}
+ return srclen;
+}
+EXPORT_SYMBOL_GPL(crypto_poly1305_setdesckey);
+
+static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
+ const u8 *src, unsigned int srclen,
+ u32 hibit)
+{
+ u32 r0, r1, r2, r3, r4;
+ u32 s1, s2, s3, s4;
+ u32 h0, h1, h2, h3, h4;
+ u64 d0, d1, d2, d3, d4;
+ unsigned int datalen;
+
+ if (unlikely(!dctx->sset)) {
+ datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
+ src += srclen - datalen;
+ srclen = datalen;
+ }

r0 = dctx->r[0];
r1 = dctx->r[1];
@@ -181,7 +176,7 @@ static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
return srclen;
}

-static int poly1305_update(struct shash_desc *desc,
+int crypto_poly1305_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
@@ -214,8 +209,9 @@ static int poly1305_update(struct shash_desc *desc,

return 0;
}
+EXPORT_SYMBOL_GPL(crypto_poly1305_update);

-static int poly1305_final(struct shash_desc *desc, u8 *dst)
+int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
__le32 *mac = (__le32 *)dst;
@@ -282,13 +278,14 @@ static int poly1305_final(struct shash_desc *desc, u8 *dst)

return 0;
}
+EXPORT_SYMBOL_GPL(crypto_poly1305_final);

static struct shash_alg poly1305_alg = {
.digestsize = POLY1305_DIGEST_SIZE,
- .init = poly1305_init,
- .update = poly1305_update,
- .final = poly1305_final,
- .setkey = poly1305_setkey,
+ .init = crypto_poly1305_init,
+ .update = crypto_poly1305_update,
+ .final = crypto_poly1305_final,
+ .setkey = crypto_poly1305_setkey,
.descsize = sizeof(struct poly1305_desc_ctx),
.base = {
.cra_name = "poly1305",
diff --git a/include/crypto/poly1305.h b/include/crypto/poly1305.h
new file mode 100644
index 0000000..894df59
--- /dev/null
+++ b/include/crypto/poly1305.h
@@ -0,0 +1,41 @@
+/*
+ * Common values for the Poly1305 algorithm
+ */
+
+#ifndef _CRYPTO_POLY1305_H
+#define _CRYPTO_POLY1305_H
+
+#include <linux/types.h>
+#include <linux/crypto.h>
+
+#define POLY1305_BLOCK_SIZE 16
+#define POLY1305_KEY_SIZE 32
+#define POLY1305_DIGEST_SIZE 16
+
+struct poly1305_desc_ctx {
+ /* key */
+ u32 r[5];
+ /* finalize key */
+ u32 s[4];
+ /* accumulator */
+ u32 h[5];
+ /* partial buffer */
+ u8 buf[POLY1305_BLOCK_SIZE];
+ /* bytes used in partial buffer */
+ unsigned int buflen;
+ /* r key has been set */
+ bool rset;
+ /* s key has been set */
+ bool sset;
+};
+
+int crypto_poly1305_init(struct shash_desc *desc);
+int crypto_poly1305_setkey(struct crypto_shash *tfm,
+ const u8 *key, unsigned int keylen);
+unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
+ const u8 *src, unsigned int srclen);
+int crypto_poly1305_update(struct shash_desc *desc,
+ const u8 *src, unsigned int srclen);
+int crypto_poly1305_final(struct shash_desc *desc, u8 *dst);
+
+#endif
--
1.9.1

2015-07-07 19:37:20

by Martin Willi

Subject: [PATCH 08/10] crypto: poly1305 - Add a SSE2 SIMD variant for x86_64

Implements an x86_64 assembler driver for the Poly1305 authenticator. This
single-block variant holds the 130-bit integer in five 32-bit words, but
uses SSE to do two multiplications/additions in parallel.
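
The accumulator and key are kept in radix 2^26, i.e. five 26-bit limbs
stored in 32-bit words; this is what the ANMASK constant (0x3ffffff) and
the 2/4/6/8-bit shifts in the assembly below implement. A rough C sketch
of how a 16-byte message block is split into these limbs (the le32()
helper is only for illustration):

#include <stdint.h>

static uint32_t le32(const uint8_t *p)
{
        /* load four little-endian bytes */
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void poly1305_load_block(uint32_t limb[5], const uint8_t m[16])
{
        limb[0] =  le32(m +  0)       & 0x3ffffff;
        limb[1] = (le32(m +  3) >> 2) & 0x3ffffff;
        limb[2] = (le32(m +  6) >> 4) & 0x3ffffff;
        limb[3] = (le32(m +  9) >> 6) & 0x3ffffff;
        /* 1 << 24 sets bit 128 of the padded block value */
        limb[4] = (le32(m + 12) >> 8) | (1 << 24);
}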

When calling updates with small blocks, the overhead of kernel_fpu_begin()/
kernel_fpu_end() negates the performance gain. We therefore use the
poly1305-generic fallback for small updates.

For large messages, throughput increases by ~5-10% compared to
poly1305-generic:

testing speed of poly1305 (poly1305-generic)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3599215 opers/sec, 345524640 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 6116275 opers/sec, 587162400 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9608427 opers/sec, 922408992 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1453208 opers/sec, 418523904 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2103440 opers/sec, 605790720 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3691593 opers/sec, 1063178784 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 593364 opers/sec, 626592384 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1074975 opers/sec, 1135173600 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 303882 opers/sec, 632074560 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 548559 opers/sec, 1141002720 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 276084 opers/sec, 1139674752 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 138984 opers/sec, 1143004416 bytes/sec

testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3654724 opers/sec, 350853504 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5939245 opers/sec, 570167520 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9480730 opers/sec, 910150080 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1387720 opers/sec, 399663360 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2031633 opers/sec, 585110304 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3697643 opers/sec, 1064921184 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 570658 opers/sec, 602614848 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1129287 opers/sec, 1192527072 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 289452 opers/sec, 602060160 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 595957 opers/sec, 1239590560 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 303426 opers/sec, 1252542528 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153136 opers/sec, 1259390464 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/Makefile | 2 +
arch/x86/crypto/poly1305-sse2-x86_64.S | 276 +++++++++++++++++++++++++++++++++
arch/x86/crypto/poly1305_glue.c | 123 +++++++++++++++
crypto/Kconfig | 12 ++
4 files changed, 413 insertions(+)
create mode 100644 arch/x86/crypto/poly1305-sse2-x86_64.S
create mode 100644 arch/x86/crypto/poly1305_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index ce39b3c..5cf405c 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
+obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o

# These modules require assembler to support AVX.
ifeq ($(avx_supported),yes)
@@ -85,6 +86,7 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
+poly1305-x86_64-y := poly1305-sse2-x86_64.o poly1305_glue.o
ifeq ($(avx2_supported),yes)
sha1-ssse3-y += sha1_avx2_x86_64_asm.o
endif
diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2-x86_64.S
new file mode 100644
index 0000000..a3d2b5e
--- /dev/null
+++ b/arch/x86/crypto/poly1305-sse2-x86_64.S
@@ -0,0 +1,276 @@
+/*
+ * Poly1305 authenticator algorithm, RFC7539, x64 SSE2 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+.data
+.align 16
+
+ANMASK: .octa 0x0000000003ffffff0000000003ffffff
+
+.text
+
+#define h0 0x00(%rdi)
+#define h1 0x04(%rdi)
+#define h2 0x08(%rdi)
+#define h3 0x0c(%rdi)
+#define h4 0x10(%rdi)
+#define r0 0x00(%rdx)
+#define r1 0x04(%rdx)
+#define r2 0x08(%rdx)
+#define r3 0x0c(%rdx)
+#define r4 0x10(%rdx)
+#define s1 0x00(%rsp)
+#define s2 0x04(%rsp)
+#define s3 0x08(%rsp)
+#define s4 0x0c(%rsp)
+#define m %rsi
+#define h01 %xmm0
+#define h23 %xmm1
+#define h44 %xmm2
+#define t1 %xmm3
+#define t2 %xmm4
+#define t3 %xmm5
+#define t4 %xmm6
+#define mask %xmm7
+#define d0 %r8
+#define d1 %r9
+#define d2 %r10
+#define d3 %r11
+#define d4 %r12
+
+ENTRY(poly1305_block_sse2)
+ # %rdi: Accumulator h[5]
+ # %rsi: 16 byte input block m
+ # %rdx: Poly1305 key r[5]
+ # %rcx: Block count
+
+ # This single block variant tries to improve performance by doing two
+ # multiplications in parallel using SSE instructions. There is quite
+ # some quadword packing involved, hence the speedup is marginal.
+
+ push %rbx
+ push %r12
+ sub $0x10,%rsp
+
+ # s1..s4 = r1..r4 * 5
+ mov r1,%eax
+ lea (%eax,%eax,4),%eax
+ mov %eax,s1
+ mov r2,%eax
+ lea (%eax,%eax,4),%eax
+ mov %eax,s2
+ mov r3,%eax
+ lea (%eax,%eax,4),%eax
+ mov %eax,s3
+ mov r4,%eax
+ lea (%eax,%eax,4),%eax
+ mov %eax,s4
+
+ movdqa ANMASK(%rip),mask
+
+.Ldoblock:
+ # h01 = [0, h1, 0, h0]
+ # h23 = [0, h3, 0, h2]
+ # h44 = [0, h4, 0, h4]
+ movd h0,h01
+ movd h1,t1
+ movd h2,h23
+ movd h3,t2
+ movd h4,h44
+ punpcklqdq t1,h01
+ punpcklqdq t2,h23
+ punpcklqdq h44,h44
+
+ # h01 += [ (m[3-6] >> 2) & 0x3ffffff, m[0-3] & 0x3ffffff ]
+ movd 0x00(m),t1
+ movd 0x03(m),t2
+ psrld $2,t2
+ punpcklqdq t2,t1
+ pand mask,t1
+ paddd t1,h01
+ # h23 += [ (m[9-12] >> 6) & 0x3ffffff, (m[6-9] >> 4) & 0x3ffffff ]
+ movd 0x06(m),t1
+ movd 0x09(m),t2
+ psrld $4,t1
+ psrld $6,t2
+ punpcklqdq t2,t1
+ pand mask,t1
+ paddd t1,h23
+ # h44 += [ (m[12-15] >> 8) | (1 << 24), (m[12-15] >> 8) | (1 << 24) ]
+ mov 0x0c(m),%eax
+ shr $8,%eax
+ or $0x01000000,%eax
+ movd %eax,t1
+ pshufd $0xc4,t1,t1
+ paddd t1,h44
+
+ # t1[0] = h0 * r0 + h2 * s3
+ # t1[1] = h1 * s4 + h3 * s2
+ movd r0,t1
+ movd s4,t2
+ punpcklqdq t2,t1
+ pmuludq h01,t1
+ movd s3,t2
+ movd s2,t3
+ punpcklqdq t3,t2
+ pmuludq h23,t2
+ paddq t2,t1
+ # t2[0] = h0 * r1 + h2 * s4
+ # t2[1] = h1 * r0 + h3 * s3
+ movd r1,t2
+ movd r0,t3
+ punpcklqdq t3,t2
+ pmuludq h01,t2
+ movd s4,t3
+ movd s3,t4
+ punpcklqdq t4,t3
+ pmuludq h23,t3
+ paddq t3,t2
+ # t3[0] = h4 * s1
+ # t3[1] = h4 * s2
+ movd s1,t3
+ movd s2,t4
+ punpcklqdq t4,t3
+ pmuludq h44,t3
+ # d0 = t1[0] + t1[1] + t3[0]
+ # d1 = t2[0] + t2[1] + t3[1]
+ movdqa t1,t4
+ punpcklqdq t2,t4
+ punpckhqdq t2,t1
+ paddq t4,t1
+ paddq t3,t1
+ movq t1,d0
+ psrldq $8,t1
+ movq t1,d1
+
+ # t1[0] = h0 * r2 + h2 * r0
+ # t1[1] = h1 * r1 + h3 * s4
+ movd r2,t1
+ movd r1,t2
+ punpcklqdq t2,t1
+ pmuludq h01,t1
+ movd r0,t2
+ movd s4,t3
+ punpcklqdq t3,t2
+ pmuludq h23,t2
+ paddq t2,t1
+ # t2[0] = h0 * r3 + h2 * r1
+ # t2[1] = h1 * r2 + h3 * r0
+ movd r3,t2
+ movd r2,t3
+ punpcklqdq t3,t2
+ pmuludq h01,t2
+ movd r1,t3
+ movd r0,t4
+ punpcklqdq t4,t3
+ pmuludq h23,t3
+ paddq t3,t2
+ # t3[0] = h4 * s3
+ # t3[1] = h4 * s4
+ movd s3,t3
+ movd s4,t4
+ punpcklqdq t4,t3
+ pmuludq h44,t3
+ # d2 = t1[0] + t1[1] + t3[0]
+ # d3 = t2[0] + t2[1] + t3[1]
+ movdqa t1,t4
+ punpcklqdq t2,t4
+ punpckhqdq t2,t1
+ paddq t4,t1
+ paddq t3,t1
+ movq t1,d2
+ psrldq $8,t1
+ movq t1,d3
+
+ # t1[0] = h0 * r4 + h2 * r2
+ # t1[1] = h1 * r3 + h3 * r1
+ movd r4,t1
+ movd r3,t2
+ punpcklqdq t2,t1
+ pmuludq h01,t1
+ movd r2,t2
+ movd r1,t3
+ punpcklqdq t3,t2
+ pmuludq h23,t2
+ paddq t2,t1
+ # t3[0] = h4 * r0
+ movd r0,t3
+ pmuludq h44,t3
+ # d4 = t1[0] + t1[1] + t3[0]
+ movdqa t1,t4
+ psrldq $8,t4
+ paddq t4,t1
+ paddq t3,t1
+ movq t1,d4
+
+ # d1 += d0 >> 26
+ mov d0,%rax
+ shr $26,%rax
+ add %rax,d1
+ # h0 = d0 & 0x3ffffff
+ mov d0,%rbx
+ and $0x3ffffff,%ebx
+
+ # d2 += d1 >> 26
+ mov d1,%rax
+ shr $26,%rax
+ add %rax,d2
+ # h1 = d1 & 0x3ffffff
+ mov d1,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h1
+
+ # d3 += d2 >> 26
+ mov d2,%rax
+ shr $26,%rax
+ add %rax,d3
+ # h2 = d2 & 0x3ffffff
+ mov d2,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h2
+
+ # d4 += d3 >> 26
+ mov d3,%rax
+ shr $26,%rax
+ add %rax,d4
+ # h3 = d3 & 0x3ffffff
+ mov d3,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h3
+
+ # h0 += (d4 >> 26) * 5
+ mov d4,%rax
+ shr $26,%rax
+ lea (%eax,%eax,4),%eax
+ add %eax,%ebx
+ # h4 = d4 & 0x3ffffff
+ mov d4,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h4
+
+ # h1 += h0 >> 26
+ mov %ebx,%eax
+ shr $26,%eax
+ add %eax,h1
+ # h0 = h0 & 0x3ffffff
+ andl $0x3ffffff,%ebx
+ mov %ebx,h0
+
+ add $0x10,m
+ dec %rcx
+ jnz .Ldoblock
+
+ add $0x10,%rsp
+ pop %r12
+ pop %rbx
+ ret
+ENDPROC(poly1305_block_sse2)
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
new file mode 100644
index 0000000..1e59274
--- /dev/null
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -0,0 +1,123 @@
+/*
+ * Poly1305 authenticator algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/internal/hash.h>
+#include <crypto/poly1305.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/fpu/api.h>
+#include <asm/simd.h>
+
+asmlinkage void poly1305_block_sse2(u32 *h, const u8 *src,
+ const u32 *r, unsigned int blocks);
+
+static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
+ const u8 *src, unsigned int srclen)
+{
+ unsigned int blocks, datalen;
+
+ if (unlikely(!dctx->sset)) {
+ datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
+ src += srclen - datalen;
+ srclen = datalen;
+ }
+
+ if (srclen >= POLY1305_BLOCK_SIZE) {
+ blocks = srclen / POLY1305_BLOCK_SIZE;
+ poly1305_block_sse2(dctx->h, src, dctx->r, blocks);
+ srclen -= POLY1305_BLOCK_SIZE * blocks;
+ }
+ return srclen;
+}
+
+static int poly1305_simd_update(struct shash_desc *desc,
+ const u8 *src, unsigned int srclen)
+{
+ struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
+ unsigned int bytes;
+
+ /* kernel_fpu_begin/end is costly, use fallback for small updates */
+ if (srclen <= 288 || !may_use_simd())
+ return crypto_poly1305_update(desc, src, srclen);
+
+ kernel_fpu_begin();
+
+ if (unlikely(dctx->buflen)) {
+ bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen);
+ memcpy(dctx->buf + dctx->buflen, src, bytes);
+ src += bytes;
+ srclen -= bytes;
+ dctx->buflen += bytes;
+
+ if (dctx->buflen == POLY1305_BLOCK_SIZE) {
+ poly1305_simd_blocks(dctx, dctx->buf,
+ POLY1305_BLOCK_SIZE);
+ dctx->buflen = 0;
+ }
+ }
+
+ if (likely(srclen >= POLY1305_BLOCK_SIZE)) {
+ bytes = poly1305_simd_blocks(dctx, src, srclen);
+ src += srclen - bytes;
+ srclen = bytes;
+ }
+
+ kernel_fpu_end();
+
+ if (unlikely(srclen)) {
+ dctx->buflen = srclen;
+ memcpy(dctx->buf, src, srclen);
+ }
+
+ return 0;
+}
+
+static struct shash_alg alg = {
+ .digestsize = POLY1305_DIGEST_SIZE,
+ .init = crypto_poly1305_init,
+ .update = poly1305_simd_update,
+ .final = crypto_poly1305_final,
+ .setkey = crypto_poly1305_setkey,
+ .descsize = sizeof(struct poly1305_desc_ctx),
+ .base = {
+ .cra_name = "poly1305",
+ .cra_driver_name = "poly1305-simd",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_alignmask = sizeof(u32) - 1,
+ .cra_blocksize = POLY1305_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init poly1305_simd_mod_init(void)
+{
+ if (!cpu_has_xmm2)
+ return -ENODEV;
+
+ return crypto_register_shash(&alg);
+}
+
+static void __exit poly1305_simd_mod_exit(void)
+{
+ crypto_unregister_shash(&alg);
+}
+
+module_init(poly1305_simd_mod_init);
+module_exit(poly1305_simd_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Martin Willi <[email protected]>");
+MODULE_DESCRIPTION("Poly1305 authenticator");
+MODULE_ALIAS_CRYPTO("poly1305");
+MODULE_ALIAS_CRYPTO("poly1305-simd");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 82caab0..c57478c 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -470,6 +470,18 @@ config CRYPTO_POLY1305
It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use
in IETF protocols. This is the portable C implementation of Poly1305.

+config CRYPTO_POLY1305_X86_64
+ tristate "Poly1305 authenticator algorithm (x86_64/SSE2)"
+ depends on X86 && 64BIT
+ select CRYPTO_POLY1305
+ help
+ Poly1305 authenticator algorithm, RFC7539.
+
+ Poly1305 is an authenticator algorithm designed by Daniel J. Bernstein.
+ It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use
+ in IETF protocols. This is the x86_64 assembler implementation using SIMD
+ instructions.
+
config CRYPTO_MD4
tristate "MD4 digest algorithm"
select CRYPTO_HASH
--
1.9.1

2015-07-07 19:37:21

by Martin Willi

Subject: [PATCH 04/10] crypto: chacha20 - Add a four block SSSE3 variant for x86_64

Extends the x86_64 SSSE3 ChaCha20 implementation with a function that
processes four ChaCha20 blocks in parallel. This avoids the word shuffling
needed in the single-block variant, further increasing throughput.
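
For reference, the scalar quarter round that the .Ldoubleround4 loop below
applies simultaneously to the corresponding words of all four state copies
(a sketch matching the structure of the generic implementation, not code
from this patch):

#include <stdint.h>

static inline uint32_t rotl32(uint32_t v, int n)
{
        return (v << n) | (v >> (32 - n));
}

/* one ChaCha20 quarter round; a double round runs it four times on
 * columns and four times on diagonals of the 4x4 state matrix */
static void chacha20_quarterround(uint32_t *a, uint32_t *b,
                                  uint32_t *c, uint32_t *d)
{
        *a += *b; *d = rotl32(*d ^ *a, 16);
        *c += *d; *b = rotl32(*b ^ *c, 12);
        *a += *b; *d = rotl32(*d ^ *a,  8);
        *c += *d; *b = rotl32(*b ^ *c,  7);
}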

For large messages, throughput increases by ~110% compared to the
single-block SSSE3 variant:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 4154698 operations in 1 seconds (66475168 bytes)
test 1 (256 bit key, 64 byte blocks): 4593368 operations in 1 seconds (293975552 bytes)
test 2 (256 bit key, 256 byte blocks): 1796194 operations in 1 seconds (459825664 bytes)
test 3 (256 bit key, 1024 byte blocks): 519725 operations in 1 seconds (532198400 bytes)
test 4 (256 bit key, 8192 byte blocks): 67132 operations in 1 seconds (549945344 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 4164293 operations in 1 seconds (66628688 bytes)
test 1 (256 bit key, 64 byte blocks): 4545912 operations in 1 seconds (290938368 bytes)
test 2 (256 bit key, 256 byte blocks): 3238241 operations in 1 seconds (828989696 bytes)
test 3 (256 bit key, 1024 byte blocks): 1120664 operations in 1 seconds (1147559936 bytes)
test 4 (256 bit key, 8192 byte blocks): 140107 operations in 1 seconds (1147756544 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/chacha20-ssse3-x86_64.S | 483 ++++++++++++++++++++++++++++++++
arch/x86/crypto/chacha20_glue.c | 8 +
2 files changed, 491 insertions(+)

diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20-ssse3-x86_64.S
index 1b97ad0..712b130 100644
--- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
+++ b/arch/x86/crypto/chacha20-ssse3-x86_64.S
@@ -16,6 +16,7 @@

ROT8: .octa 0x0e0d0c0f0a09080b0605040702010003
ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
+CTRINC: .octa 0x00000003000000020000000100000000

.text

@@ -140,3 +141,485 @@ ENTRY(chacha20_block_xor_ssse3)

ret
ENDPROC(chacha20_block_xor_ssse3)
+
+ENTRY(chacha20_4block_xor_ssse3)
+ # %rdi: Input state matrix, s
+ # %rsi: 4 data blocks output, o
+ # %rdx: 4 data blocks input, i
+
+ # This function encrypts four consecutive ChaCha20 blocks by loading
+ # the state matrix in SSE registers four times. As we need some scratch
+ # registers, we save the first four registers on the stack. The
+ # algorithm performs each operation on the corresponding word of each
+ # state matrix, hence requires no word shuffling. For the final XORing
+ # step we transpose the matrix by interleaving 32- and then 64-bit words,
+ # which allows us to do XOR in SSE registers. 8/16-bit word rotation is
+ # done with the slightly better performing SSSE3 byte shuffling,
+ # 7/12-bit word rotation uses traditional shift+OR.
+
+ sub $0x40,%rsp
+
+ # x0..15[0-3] = s0..3[0..3]
+ movq 0x00(%rdi),%xmm1
+ pshufd $0x00,%xmm1,%xmm0
+ pshufd $0x55,%xmm1,%xmm1
+ movq 0x08(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ movq 0x10(%rdi),%xmm5
+ pshufd $0x00,%xmm5,%xmm4
+ pshufd $0x55,%xmm5,%xmm5
+ movq 0x18(%rdi),%xmm7
+ pshufd $0x00,%xmm7,%xmm6
+ pshufd $0x55,%xmm7,%xmm7
+ movq 0x20(%rdi),%xmm9
+ pshufd $0x00,%xmm9,%xmm8
+ pshufd $0x55,%xmm9,%xmm9
+ movq 0x28(%rdi),%xmm11
+ pshufd $0x00,%xmm11,%xmm10
+ pshufd $0x55,%xmm11,%xmm11
+ movq 0x30(%rdi),%xmm13
+ pshufd $0x00,%xmm13,%xmm12
+ pshufd $0x55,%xmm13,%xmm13
+ movq 0x38(%rdi),%xmm15
+ pshufd $0x00,%xmm15,%xmm14
+ pshufd $0x55,%xmm15,%xmm15
+ # x0..3 on stack
+ movdqa %xmm0,0x00(%rsp)
+ movdqa %xmm1,0x10(%rsp)
+ movdqa %xmm2,0x20(%rsp)
+ movdqa %xmm3,0x30(%rsp)
+
+ movdqa CTRINC(%rip),%xmm1
+ movdqa ROT8(%rip),%xmm2
+ movdqa ROT16(%rip),%xmm3
+
+ # x12 += counter values 0-3
+ paddd %xmm1,%xmm12
+
+ mov $10,%ecx
+
+.Ldoubleround4:
+ # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+ movdqa 0x00(%rsp),%xmm0
+ paddd %xmm4,%xmm0
+ movdqa %xmm0,0x00(%rsp)
+ pxor %xmm0,%xmm12
+ pshufb %xmm3,%xmm12
+ # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+ movdqa 0x10(%rsp),%xmm0
+ paddd %xmm5,%xmm0
+ movdqa %xmm0,0x10(%rsp)
+ pxor %xmm0,%xmm13
+ pshufb %xmm3,%xmm13
+ # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+ movdqa 0x20(%rsp),%xmm0
+ paddd %xmm6,%xmm0
+ movdqa %xmm0,0x20(%rsp)
+ pxor %xmm0,%xmm14
+ pshufb %xmm3,%xmm14
+ # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+ movdqa 0x30(%rsp),%xmm0
+ paddd %xmm7,%xmm0
+ movdqa %xmm0,0x30(%rsp)
+ pxor %xmm0,%xmm15
+ pshufb %xmm3,%xmm15
+
+ # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+ paddd %xmm12,%xmm8
+ pxor %xmm8,%xmm4
+ movdqa %xmm4,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm4
+ por %xmm0,%xmm4
+ # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+ paddd %xmm13,%xmm9
+ pxor %xmm9,%xmm5
+ movdqa %xmm5,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm5
+ por %xmm0,%xmm5
+ # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+ paddd %xmm14,%xmm10
+ pxor %xmm10,%xmm6
+ movdqa %xmm6,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm6
+ por %xmm0,%xmm6
+ # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+ paddd %xmm15,%xmm11
+ pxor %xmm11,%xmm7
+ movdqa %xmm7,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm7
+ por %xmm0,%xmm7
+
+ # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+ movdqa 0x00(%rsp),%xmm0
+ paddd %xmm4,%xmm0
+ movdqa %xmm0,0x00(%rsp)
+ pxor %xmm0,%xmm12
+ pshufb %xmm2,%xmm12
+ # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+ movdqa 0x10(%rsp),%xmm0
+ paddd %xmm5,%xmm0
+ movdqa %xmm0,0x10(%rsp)
+ pxor %xmm0,%xmm13
+ pshufb %xmm2,%xmm13
+ # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+ movdqa 0x20(%rsp),%xmm0
+ paddd %xmm6,%xmm0
+ movdqa %xmm0,0x20(%rsp)
+ pxor %xmm0,%xmm14
+ pshufb %xmm2,%xmm14
+ # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+ movdqa 0x30(%rsp),%xmm0
+ paddd %xmm7,%xmm0
+ movdqa %xmm0,0x30(%rsp)
+ pxor %xmm0,%xmm15
+ pshufb %xmm2,%xmm15
+
+ # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+ paddd %xmm12,%xmm8
+ pxor %xmm8,%xmm4
+ movdqa %xmm4,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm4
+ por %xmm0,%xmm4
+ # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+ paddd %xmm13,%xmm9
+ pxor %xmm9,%xmm5
+ movdqa %xmm5,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm5
+ por %xmm0,%xmm5
+ # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+ paddd %xmm14,%xmm10
+ pxor %xmm10,%xmm6
+ movdqa %xmm6,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm6
+ por %xmm0,%xmm6
+ # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+ paddd %xmm15,%xmm11
+ pxor %xmm11,%xmm7
+ movdqa %xmm7,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm7
+ por %xmm0,%xmm7
+
+ # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+ movdqa 0x00(%rsp),%xmm0
+ paddd %xmm5,%xmm0
+ movdqa %xmm0,0x00(%rsp)
+ pxor %xmm0,%xmm15
+ pshufb %xmm3,%xmm15
+ # x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+ movdqa 0x10(%rsp),%xmm0
+ paddd %xmm6,%xmm0
+ movdqa %xmm0,0x10(%rsp)
+ pxor %xmm0,%xmm12
+ pshufb %xmm3,%xmm12
+ # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+ movdqa 0x20(%rsp),%xmm0
+ paddd %xmm7,%xmm0
+ movdqa %xmm0,0x20(%rsp)
+ pxor %xmm0,%xmm13
+ pshufb %xmm3,%xmm13
+ # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+ movdqa 0x30(%rsp),%xmm0
+ paddd %xmm4,%xmm0
+ movdqa %xmm0,0x30(%rsp)
+ pxor %xmm0,%xmm14
+ pshufb %xmm3,%xmm14
+
+ # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+ paddd %xmm15,%xmm10
+ pxor %xmm10,%xmm5
+ movdqa %xmm5,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm5
+ por %xmm0,%xmm5
+ # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+ paddd %xmm12,%xmm11
+ pxor %xmm11,%xmm6
+ movdqa %xmm6,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm6
+ por %xmm0,%xmm6
+ # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+ paddd %xmm13,%xmm8
+ pxor %xmm8,%xmm7
+ movdqa %xmm7,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm7
+ por %xmm0,%xmm7
+ # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+ paddd %xmm14,%xmm9
+ pxor %xmm9,%xmm4
+ movdqa %xmm4,%xmm0
+ pslld $12,%xmm0
+ psrld $20,%xmm4
+ por %xmm0,%xmm4
+
+ # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+ movdqa 0x00(%rsp),%xmm0
+ paddd %xmm5,%xmm0
+ movdqa %xmm0,0x00(%rsp)
+ pxor %xmm0,%xmm15
+ pshufb %xmm2,%xmm15
+ # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+ movdqa 0x10(%rsp),%xmm0
+ paddd %xmm6,%xmm0
+ movdqa %xmm0,0x10(%rsp)
+ pxor %xmm0,%xmm12
+ pshufb %xmm2,%xmm12
+ # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+ movdqa 0x20(%rsp),%xmm0
+ paddd %xmm7,%xmm0
+ movdqa %xmm0,0x20(%rsp)
+ pxor %xmm0,%xmm13
+ pshufb %xmm2,%xmm13
+ # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+ movdqa 0x30(%rsp),%xmm0
+ paddd %xmm4,%xmm0
+ movdqa %xmm0,0x30(%rsp)
+ pxor %xmm0,%xmm14
+ pshufb %xmm2,%xmm14
+
+ # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+ paddd %xmm15,%xmm10
+ pxor %xmm10,%xmm5
+ movdqa %xmm5,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm5
+ por %xmm0,%xmm5
+ # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+ paddd %xmm12,%xmm11
+ pxor %xmm11,%xmm6
+ movdqa %xmm6,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm6
+ por %xmm0,%xmm6
+ # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+ paddd %xmm13,%xmm8
+ pxor %xmm8,%xmm7
+ movdqa %xmm7,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm7
+ por %xmm0,%xmm7
+ # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+ paddd %xmm14,%xmm9
+ pxor %xmm9,%xmm4
+ movdqa %xmm4,%xmm0
+ pslld $7,%xmm0
+ psrld $25,%xmm4
+ por %xmm0,%xmm4
+
+ dec %ecx
+ jnz .Ldoubleround4
+
+ # x0[0-3] += s0[0]
+ # x1[0-3] += s0[1]
+ movq 0x00(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd 0x00(%rsp),%xmm2
+ movdqa %xmm2,0x00(%rsp)
+ paddd 0x10(%rsp),%xmm3
+ movdqa %xmm3,0x10(%rsp)
+ # x2[0-3] += s0[2]
+ # x3[0-3] += s0[3]
+ movq 0x08(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd 0x20(%rsp),%xmm2
+ movdqa %xmm2,0x20(%rsp)
+ paddd 0x30(%rsp),%xmm3
+ movdqa %xmm3,0x30(%rsp)
+
+ # x4[0-3] += s1[0]
+ # x5[0-3] += s1[1]
+ movq 0x10(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm4
+ paddd %xmm3,%xmm5
+ # x6[0-3] += s1[2]
+ # x7[0-3] += s1[3]
+ movq 0x18(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm6
+ paddd %xmm3,%xmm7
+
+ # x8[0-3] += s2[0]
+ # x9[0-3] += s2[1]
+ movq 0x20(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm8
+ paddd %xmm3,%xmm9
+ # x10[0-3] += s2[2]
+ # x11[0-3] += s2[3]
+ movq 0x28(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm10
+ paddd %xmm3,%xmm11
+
+ # x12[0-3] += s3[0]
+ # x13[0-3] += s3[1]
+ movq 0x30(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm12
+ paddd %xmm3,%xmm13
+ # x14[0-3] += s3[2]
+ # x15[0-3] += s3[3]
+ movq 0x38(%rdi),%xmm3
+ pshufd $0x00,%xmm3,%xmm2
+ pshufd $0x55,%xmm3,%xmm3
+ paddd %xmm2,%xmm14
+ paddd %xmm3,%xmm15
+
+ # x12 += counter values 0-3
+ paddd %xmm1,%xmm12
+
+ # interleave 32-bit words in state n, n+1
+ movdqa 0x00(%rsp),%xmm0
+ movdqa 0x10(%rsp),%xmm1
+ movdqa %xmm0,%xmm2
+ punpckldq %xmm1,%xmm2
+ punpckhdq %xmm1,%xmm0
+ movdqa %xmm2,0x00(%rsp)
+ movdqa %xmm0,0x10(%rsp)
+ movdqa 0x20(%rsp),%xmm0
+ movdqa 0x30(%rsp),%xmm1
+ movdqa %xmm0,%xmm2
+ punpckldq %xmm1,%xmm2
+ punpckhdq %xmm1,%xmm0
+ movdqa %xmm2,0x20(%rsp)
+ movdqa %xmm0,0x30(%rsp)
+ movdqa %xmm4,%xmm0
+ punpckldq %xmm5,%xmm4
+ punpckhdq %xmm5,%xmm0
+ movdqa %xmm0,%xmm5
+ movdqa %xmm6,%xmm0
+ punpckldq %xmm7,%xmm6
+ punpckhdq %xmm7,%xmm0
+ movdqa %xmm0,%xmm7
+ movdqa %xmm8,%xmm0
+ punpckldq %xmm9,%xmm8
+ punpckhdq %xmm9,%xmm0
+ movdqa %xmm0,%xmm9
+ movdqa %xmm10,%xmm0
+ punpckldq %xmm11,%xmm10
+ punpckhdq %xmm11,%xmm0
+ movdqa %xmm0,%xmm11
+ movdqa %xmm12,%xmm0
+ punpckldq %xmm13,%xmm12
+ punpckhdq %xmm13,%xmm0
+ movdqa %xmm0,%xmm13
+ movdqa %xmm14,%xmm0
+ punpckldq %xmm15,%xmm14
+ punpckhdq %xmm15,%xmm0
+ movdqa %xmm0,%xmm15
+
+ # interleave 64-bit words in state n, n+2
+ movdqa 0x00(%rsp),%xmm0
+ movdqa 0x20(%rsp),%xmm1
+ movdqa %xmm0,%xmm2
+ punpcklqdq %xmm1,%xmm2
+ punpckhqdq %xmm1,%xmm0
+ movdqa %xmm2,0x00(%rsp)
+ movdqa %xmm0,0x20(%rsp)
+ movdqa 0x10(%rsp),%xmm0
+ movdqa 0x30(%rsp),%xmm1
+ movdqa %xmm0,%xmm2
+ punpcklqdq %xmm1,%xmm2
+ punpckhqdq %xmm1,%xmm0
+ movdqa %xmm2,0x10(%rsp)
+ movdqa %xmm0,0x30(%rsp)
+ movdqa %xmm4,%xmm0
+ punpcklqdq %xmm6,%xmm4
+ punpckhqdq %xmm6,%xmm0
+ movdqa %xmm0,%xmm6
+ movdqa %xmm5,%xmm0
+ punpcklqdq %xmm7,%xmm5
+ punpckhqdq %xmm7,%xmm0
+ movdqa %xmm0,%xmm7
+ movdqa %xmm8,%xmm0
+ punpcklqdq %xmm10,%xmm8
+ punpckhqdq %xmm10,%xmm0
+ movdqa %xmm0,%xmm10
+ movdqa %xmm9,%xmm0
+ punpcklqdq %xmm11,%xmm9
+ punpckhqdq %xmm11,%xmm0
+ movdqa %xmm0,%xmm11
+ movdqa %xmm12,%xmm0
+ punpcklqdq %xmm14,%xmm12
+ punpckhqdq %xmm14,%xmm0
+ movdqa %xmm0,%xmm14
+ movdqa %xmm13,%xmm0
+ punpcklqdq %xmm15,%xmm13
+ punpckhqdq %xmm15,%xmm0
+ movdqa %xmm0,%xmm15
+
+ # xor with corresponding input, write to output
+ movdqa 0x00(%rsp),%xmm0
+ movdqu 0x00(%rdx),%xmm1
+ pxor %xmm1,%xmm0
+ movdqu %xmm0,0x00(%rsi)
+ movdqa 0x10(%rsp),%xmm0
+ movdqu 0x80(%rdx),%xmm1
+ pxor %xmm1,%xmm0
+ movdqu %xmm0,0x80(%rsi)
+ movdqa 0x20(%rsp),%xmm0
+ movdqu 0x40(%rdx),%xmm1
+ pxor %xmm1,%xmm0
+ movdqu %xmm0,0x40(%rsi)
+ movdqa 0x30(%rsp),%xmm0
+ movdqu 0xc0(%rdx),%xmm1
+ pxor %xmm1,%xmm0
+ movdqu %xmm0,0xc0(%rsi)
+ movdqu 0x10(%rdx),%xmm1
+ pxor %xmm1,%xmm4
+ movdqu %xmm4,0x10(%rsi)
+ movdqu 0x90(%rdx),%xmm1
+ pxor %xmm1,%xmm5
+ movdqu %xmm5,0x90(%rsi)
+ movdqu 0x50(%rdx),%xmm1
+ pxor %xmm1,%xmm6
+ movdqu %xmm6,0x50(%rsi)
+ movdqu 0xd0(%rdx),%xmm1
+ pxor %xmm1,%xmm7
+ movdqu %xmm7,0xd0(%rsi)
+ movdqu 0x20(%rdx),%xmm1
+ pxor %xmm1,%xmm8
+ movdqu %xmm8,0x20(%rsi)
+ movdqu 0xa0(%rdx),%xmm1
+ pxor %xmm1,%xmm9
+ movdqu %xmm9,0xa0(%rsi)
+ movdqu 0x60(%rdx),%xmm1
+ pxor %xmm1,%xmm10
+ movdqu %xmm10,0x60(%rsi)
+ movdqu 0xe0(%rdx),%xmm1
+ pxor %xmm1,%xmm11
+ movdqu %xmm11,0xe0(%rsi)
+ movdqu 0x30(%rdx),%xmm1
+ pxor %xmm1,%xmm12
+ movdqu %xmm12,0x30(%rsi)
+ movdqu 0xb0(%rdx),%xmm1
+ pxor %xmm1,%xmm13
+ movdqu %xmm13,0xb0(%rsi)
+ movdqu 0x70(%rdx),%xmm1
+ pxor %xmm1,%xmm14
+ movdqu %xmm14,0x70(%rsi)
+ movdqu 0xf0(%rdx),%xmm1
+ pxor %xmm1,%xmm15
+ movdqu %xmm15,0xf0(%rsi)
+
+ add $0x40,%rsp
+ ret
+ENDPROC(chacha20_4block_xor_ssse3)
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 250de40..4d677c3 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -20,12 +20,20 @@
#define CHACHA20_STATE_ALIGN 16

asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);

static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
u8 buf[CHACHA20_BLOCK_SIZE];

+ while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+ chacha20_4block_xor_ssse3(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE * 4;
+ src += CHACHA20_BLOCK_SIZE * 4;
+ dst += CHACHA20_BLOCK_SIZE * 4;
+ state[12] += 4;
+ }
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_ssse3(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
--
1.9.1

2015-07-07 19:37:21

by Martin Willi

[permalink] [raw]
Subject: [PATCH 01/10] crypto: tcrypt - Add ChaCha20/Poly1305 speed tests

Adds individual ChaCha20 and Poly1305 speed tests as well as a combined
rfc7539esp AEAD speed test, using mode numbers 214, 321 and 213. For
Poly1305 we add a specific speed template, as it expects the key prepended
to the input data.
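
The template block lengths are therefore the usual payloads plus the
32-byte key (96 = 64 + 32, 288 = 256 + 32, and so on). As a rough
illustration, not part of the patch, this is how a caller feeds the
one-time key through the shash interface (error handling and desc flags
omitted):

    struct crypto_shash *tfm = crypto_alloc_shash("poly1305", 0, 0);
    SHASH_DESC_ON_STACK(desc, tfm);
    u8 mac[16];

    desc->tfm = tfm;
    crypto_shash_init(desc);
    crypto_shash_update(desc, key, 32);      /* one-time r || s key */
    crypto_shash_update(desc, msg, msglen);  /* actual payload */
    crypto_shash_final(desc, mac);
    crypto_free_shash(tfm);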

Signed-off-by: Martin Willi <[email protected]>
---
crypto/tcrypt.c | 15 +++++++++++++++
crypto/tcrypt.h | 20 ++++++++++++++++++++
2 files changed, 35 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 9f6f10b..0e2f651 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1767,6 +1767,17 @@ static int do_test(const char *alg, u32 type, u32 mask, int m)
NULL, 0, 16, 8, aead_speed_template_19);
break;

+ case 213:
+ test_aead_speed("rfc7539esp(chacha20,poly1305)", ENCRYPT, sec,
+ NULL, 0, 16, 8, aead_speed_template_36);
+ break;
+
+ case 214:
+ test_cipher_speed("chacha20", ENCRYPT, sec, NULL, 0,
+ speed_template_32);
+ break;
+
+
case 300:
if (alg) {
test_hash_speed(alg, sec, generic_hash_speed_template);
@@ -1855,6 +1866,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m)
test_hash_speed("crct10dif", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;

+ case 321:
+ test_hash_speed("poly1305", sec, poly1305_speed_template);
+ if (mode > 300 && mode < 400) break;
+
case 399:
break;

diff --git a/crypto/tcrypt.h b/crypto/tcrypt.h
index 6cc1b85..f0bfee1 100644
--- a/crypto/tcrypt.h
+++ b/crypto/tcrypt.h
@@ -61,12 +61,14 @@ static u8 speed_template_32_40_48[] = {32, 40, 48, 0};
static u8 speed_template_32_48[] = {32, 48, 0};
static u8 speed_template_32_48_64[] = {32, 48, 64, 0};
static u8 speed_template_32_64[] = {32, 64, 0};
+static u8 speed_template_32[] = {32, 0};

/*
* AEAD speed tests
*/
static u8 aead_speed_template_19[] = {19, 0};
static u8 aead_speed_template_20[] = {20, 0};
+static u8 aead_speed_template_36[] = {36, 0};

/*
* Digest speed tests
@@ -127,4 +129,22 @@ static struct hash_speed hash_speed_template_16[] = {
{ .blen = 0, .plen = 0, .klen = 0, }
};

+static struct hash_speed poly1305_speed_template[] = {
+ { .blen = 96, .plen = 16, },
+ { .blen = 96, .plen = 32, },
+ { .blen = 96, .plen = 96, },
+ { .blen = 288, .plen = 16, },
+ { .blen = 288, .plen = 32, },
+ { .blen = 288, .plen = 288, },
+ { .blen = 1056, .plen = 32, },
+ { .blen = 1056, .plen = 1056, },
+ { .blen = 2080, .plen = 32, },
+ { .blen = 2080, .plen = 2080, },
+ { .blen = 4128, .plen = 4128, },
+ { .blen = 8224, .plen = 8224, },
+
+ /* End marker */
+ { .blen = 0, .plen = 0, }
+};
+
#endif /* _CRYPTO_TCRYPT_H */
--
1.9.1

2015-07-07 19:37:21

by Martin Willi

[permalink] [raw]
Subject: [PATCH 06/10] crypto: testmgr - Add a longer ChaCha20 test vector

The AVX2 variant of ChaCha20 is used only for messages of at least 512
bytes. With the existing test vectors, that code path could not be
exercised. Due to the lack of an official test vector of that length, this
one is self-generated using chacha20-generic.
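
For reference, chacha20-generic consumes the 16-byte IV above as four
little-endian words filling state words 12..15, so this vector starts at
block counter 0x1c. A rough sketch of the resulting state setup (combined
effect of setkey and init; not the exact kernel code, helper names
approximate):

    static void chacha20_init_state(u32 state[16], const u8 key[32],
                                    const u8 iv[16])
    {
        int i;

        state[0] = 0x61707865;  /* "expa" */
        state[1] = 0x3320646e;  /* "nd 3" */
        state[2] = 0x79622d32;  /* "2-by" */
        state[3] = 0x6b206574;  /* "te k" */
        for (i = 0; i < 8; i++)
            state[4 + i] = get_unaligned_le32(key + 4 * i);
        state[12] = get_unaligned_le32(iv + 0);   /* block counter, 0x1c here */
        state[13] = get_unaligned_le32(iv + 4);
        state[14] = get_unaligned_le32(iv + 8);
        state[15] = get_unaligned_le32(iv + 12);  /* 0x01000000 here */
    }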

Signed-off-by: Martin Willi <[email protected]>
---
crypto/testmgr.h | 334 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 333 insertions(+), 1 deletion(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index b052555..b77901d 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -30180,7 +30180,7 @@ static struct cipher_testvec salsa20_stream_enc_tv_template[] = {
},
};

-#define CHACHA20_ENC_TEST_VECTORS 3
+#define CHACHA20_ENC_TEST_VECTORS 4
static struct cipher_testvec chacha20_enc_tv_template[] = {
{ /* RFC7539 A.2. Test Vector #1 */
.key = "\x00\x00\x00\x00\x00\x00\x00\x00"
@@ -30354,6 +30354,338 @@ static struct cipher_testvec chacha20_enc_tv_template[] = {
"\x87\xb5\x8d\xfd\x72\x8a\xfa\x36"
"\x75\x7a\x79\x7a\xc1\x88\xd1",
.rlen = 127,
+ }, { /* Self-made test vector for long data */
+ .key = "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a"
+ "\xf3\x33\x88\x86\x04\xf6\xb5\xf0"
+ "\x47\x39\x17\xc1\x40\x2b\x80\x09"
+ "\x9d\xca\x5c\xbc\x20\x70\x75\xc0",
+ .klen = 32,
+ .iv = "\x1c\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x01",
+ .input = "\x49\xee\xe0\xdc\x24\x90\x40\xcd"
+ "\xc5\x40\x8f\x47\x05\xbc\xdd\x81"
+ "\x47\xc6\x8d\xe6\xb1\x8f\xd7\xcb"
+ "\x09\x0e\x6e\x22\x48\x1f\xbf\xb8"
+ "\x5c\xf7\x1e\x8a\xc1\x23\xf2\xd4"
+ "\x19\x4b\x01\x0f\x4e\xa4\x43\xce"
+ "\x01\xc6\x67\xda\x03\x91\x18\x90"
+ "\xa5\xa4\x8e\x45\x03\xb3\x2d\xac"
+ "\x74\x92\xd3\x53\x47\xc8\xdd\x25"
+ "\x53\x6c\x02\x03\x87\x0d\x11\x0c"
+ "\x58\xe3\x12\x18\xfd\x2a\x5b\x40"
+ "\x0c\x30\xf0\xb8\x3f\x43\xce\xae"
+ "\x65\x3a\x7d\x7c\xf4\x54\xaa\xcc"
+ "\x33\x97\xc3\x77\xba\xc5\x70\xde"
+ "\xd7\xd5\x13\xa5\x65\xc4\x5f\x0f"
+ "\x46\x1a\x0d\x97\xb5\xf3\xbb\x3c"
+ "\x84\x0f\x2b\xc5\xaa\xea\xf2\x6c"
+ "\xc9\xb5\x0c\xee\x15\xf3\x7d\xbe"
+ "\x9f\x7b\x5a\xa6\xae\x4f\x83\xb6"
+ "\x79\x49\x41\xf4\x58\x18\xcb\x86"
+ "\x7f\x30\x0e\xf8\x7d\x44\x36\xea"
+ "\x75\xeb\x88\x84\x40\x3c\xad\x4f"
+ "\x6f\x31\x6b\xaa\x5d\xe5\xa5\xc5"
+ "\x21\x66\xe9\xa7\xe3\xb2\x15\x88"
+ "\x78\xf6\x79\xa1\x59\x47\x12\x4e"
+ "\x9f\x9f\x64\x1a\xa0\x22\x5b\x08"
+ "\xbe\x7c\x36\xc2\x2b\x66\x33\x1b"
+ "\xdd\x60\x71\xf7\x47\x8c\x61\xc3"
+ "\xda\x8a\x78\x1e\x16\xfa\x1e\x86"
+ "\x81\xa6\x17\x2a\xa7\xb5\xc2\xe7"
+ "\xa4\xc7\x42\xf1\xcf\x6a\xca\xb4"
+ "\x45\xcf\xf3\x93\xf0\xe7\xea\xf6"
+ "\xf4\xe6\x33\x43\x84\x93\xa5\x67"
+ "\x9b\x16\x58\x58\x80\x0f\x2b\x5c"
+ "\x24\x74\x75\x7f\x95\x81\xb7\x30"
+ "\x7a\x33\xa7\xf7\x94\x87\x32\x27"
+ "\x10\x5d\x14\x4c\x43\x29\xdd\x26"
+ "\xbd\x3e\x3c\x0e\xfe\x0e\xa5\x10"
+ "\xea\x6b\x64\xfd\x73\xc6\xed\xec"
+ "\xa8\xc9\xbf\xb3\xba\x0b\x4d\x07"
+ "\x70\xfc\x16\xfd\x79\x1e\xd7\xc5"
+ "\x49\x4e\x1c\x8b\x8d\x79\x1b\xb1"
+ "\xec\xca\x60\x09\x4c\x6a\xd5\x09"
+ "\x49\x46\x00\x88\x22\x8d\xce\xea"
+ "\xb1\x17\x11\xde\x42\xd2\x23\xc1"
+ "\x72\x11\xf5\x50\x73\x04\x40\x47"
+ "\xf9\x5d\xe7\xa7\x26\xb1\x7e\xb0"
+ "\x3f\x58\xc1\x52\xab\x12\x67\x9d"
+ "\x3f\x43\x4b\x68\xd4\x9c\x68\x38"
+ "\x07\x8a\x2d\x3e\xf3\xaf\x6a\x4b"
+ "\xf9\xe5\x31\x69\x22\xf9\xa6\x69"
+ "\xc6\x9c\x96\x9a\x12\x35\x95\x1d"
+ "\x95\xd5\xdd\xbe\xbf\x93\x53\x24"
+ "\xfd\xeb\xc2\x0a\x64\xb0\x77\x00"
+ "\x6f\x88\xc4\x37\x18\x69\x7c\xd7"
+ "\x41\x92\x55\x4c\x03\xa1\x9a\x4b"
+ "\x15\xe5\xdf\x7f\x37\x33\x72\xc1"
+ "\x8b\x10\x67\xa3\x01\x57\x94\x25"
+ "\x7b\x38\x71\x7e\xdd\x1e\xcc\x73"
+ "\x55\xd2\x8e\xeb\x07\xdd\xf1\xda"
+ "\x58\xb1\x47\x90\xfe\x42\x21\x72"
+ "\xa3\x54\x7a\xa0\x40\xec\x9f\xdd"
+ "\xc6\x84\x6e\xca\xae\xe3\x68\xb4"
+ "\x9d\xe4\x78\xff\x57\xf2\xf8\x1b"
+ "\x03\xa1\x31\xd9\xde\x8d\xf5\x22"
+ "\x9c\xdd\x20\xa4\x1e\x27\xb1\x76"
+ "\x4f\x44\x55\xe2\x9b\xa1\x9c\xfe"
+ "\x54\xf7\x27\x1b\xf4\xde\x02\xf5"
+ "\x1b\x55\x48\x5c\xdc\x21\x4b\x9e"
+ "\x4b\x6e\xed\x46\x23\xdc\x65\xb2"
+ "\xcf\x79\x5f\x28\xe0\x9e\x8b\xe7"
+ "\x4c\x9d\x8a\xff\xc1\xa6\x28\xb8"
+ "\x65\x69\x8a\x45\x29\xef\x74\x85"
+ "\xde\x79\xc7\x08\xae\x30\xb0\xf4"
+ "\xa3\x1d\x51\x41\xab\xce\xcb\xf6"
+ "\xb5\xd8\x6d\xe0\x85\xe1\x98\xb3"
+ "\x43\xbb\x86\x83\x0a\xa0\xf5\xb7"
+ "\x04\x0b\xfa\x71\x1f\xb0\xf6\xd9"
+ "\x13\x00\x15\xf0\xc7\xeb\x0d\x5a"
+ "\x9f\xd7\xb9\x6c\x65\x14\x22\x45"
+ "\x6e\x45\x32\x3e\x7e\x60\x1a\x12"
+ "\x97\x82\x14\xfb\xaa\x04\x22\xfa"
+ "\xa0\xe5\x7e\x8c\x78\x02\x48\x5d"
+ "\x78\x33\x5a\x7c\xad\xdb\x29\xce"
+ "\xbb\x8b\x61\xa4\xb7\x42\xe2\xac"
+ "\x8b\x1a\xd9\x2f\x0b\x8b\x62\x21"
+ "\x83\x35\x7e\xad\x73\xc2\xb5\x6c"
+ "\x10\x26\x38\x07\xe5\xc7\x36\x80"
+ "\xe2\x23\x12\x61\xf5\x48\x4b\x2b"
+ "\xc5\xdf\x15\xd9\x87\x01\xaa\xac"
+ "\x1e\x7c\xad\x73\x78\x18\x63\xe0"
+ "\x8b\x9f\x81\xd8\x12\x6a\x28\x10"
+ "\xbe\x04\x68\x8a\x09\x7c\x1b\x1c"
+ "\x83\x66\x80\x47\x80\xe8\xfd\x35"
+ "\x1c\x97\x6f\xae\x49\x10\x66\xcc"
+ "\xc6\xd8\xcc\x3a\x84\x91\x20\x77"
+ "\x72\xe4\x24\xd2\x37\x9f\xc5\xc9"
+ "\x25\x94\x10\x5f\x40\x00\x64\x99"
+ "\xdc\xae\xd7\x21\x09\x78\x50\x15"
+ "\xac\x5f\xc6\x2c\xa2\x0b\xa9\x39"
+ "\x87\x6e\x6d\xab\xde\x08\x51\x16"
+ "\xc7\x13\xe9\xea\xed\x06\x8e\x2c"
+ "\xf8\x37\x8c\xf0\xa6\x96\x8d\x43"
+ "\xb6\x98\x37\xb2\x43\xed\xde\xdf"
+ "\x89\x1a\xe7\xeb\x9d\xa1\x7b\x0b"
+ "\x77\xb0\xe2\x75\xc0\xf1\x98\xd9"
+ "\x80\x55\xc9\x34\x91\xd1\x59\xe8"
+ "\x4b\x0f\xc1\xa9\x4b\x7a\x84\x06"
+ "\x20\xa8\x5d\xfa\xd1\xde\x70\x56"
+ "\x2f\x9e\x91\x9c\x20\xb3\x24\xd8"
+ "\x84\x3d\xe1\x8c\x7e\x62\x52\xe5"
+ "\x44\x4b\x9f\xc2\x93\x03\xea\x2b"
+ "\x59\xc5\xfa\x3f\x91\x2b\xbb\x23"
+ "\xf5\xb2\x7b\xf5\x38\xaf\xb3\xee"
+ "\x63\xdc\x7b\xd1\xff\xaa\x8b\xab"
+ "\x82\x6b\x37\x04\xeb\x74\xbe\x79"
+ "\xb9\x83\x90\xef\x20\x59\x46\xff"
+ "\xe9\x97\x3e\x2f\xee\xb6\x64\x18"
+ "\x38\x4c\x7a\x4a\xf9\x61\xe8\x9a"
+ "\xa1\xb5\x01\xa6\x47\xd3\x11\xd4"
+ "\xce\xd3\x91\x49\x88\xc7\xb8\x4d"
+ "\xb1\xb9\x07\x6d\x16\x72\xae\x46"
+ "\x5e\x03\xa1\x4b\xb6\x02\x30\xa8"
+ "\x3d\xa9\x07\x2a\x7c\x19\xe7\x62"
+ "\x87\xe3\x82\x2f\x6f\xe1\x09\xd9"
+ "\x94\x97\xea\xdd\x58\x9e\xae\x76"
+ "\x7e\x35\xe5\xb4\xda\x7e\xf4\xde"
+ "\xf7\x32\x87\xcd\x93\xbf\x11\x56"
+ "\x11\xbe\x08\x74\xe1\x69\xad\xe2"
+ "\xd7\xf8\x86\x75\x8a\x3c\xa4\xbe"
+ "\x70\xa7\x1b\xfc\x0b\x44\x2a\x76"
+ "\x35\xea\x5d\x85\x81\xaf\x85\xeb"
+ "\xa0\x1c\x61\xc2\xf7\x4f\xa5\xdc"
+ "\x02\x7f\xf6\x95\x40\x6e\x8a\x9a"
+ "\xf3\x5d\x25\x6e\x14\x3a\x22\xc9"
+ "\x37\x1c\xeb\x46\x54\x3f\xa5\x91"
+ "\xc2\xb5\x8c\xfe\x53\x08\x97\x32"
+ "\x1b\xb2\x30\x27\xfe\x25\x5d\xdc"
+ "\x08\x87\xd0\xe5\x94\x1a\xd4\xf1"
+ "\xfe\xd6\xb4\xa3\xe6\x74\x81\x3c"
+ "\x1b\xb7\x31\xa7\x22\xfd\xd4\xdd"
+ "\x20\x4e\x7c\x51\xb0\x60\x73\xb8"
+ "\x9c\xac\x91\x90\x7e\x01\xb0\xe1"
+ "\x8a\x2f\x75\x1c\x53\x2a\x98\x2a"
+ "\x06\x52\x95\x52\xb2\xe9\x25\x2e"
+ "\x4c\xe2\x5a\x00\xb2\x13\x81\x03"
+ "\x77\x66\x0d\xa5\x99\xda\x4e\x8c"
+ "\xac\xf3\x13\x53\x27\x45\xaf\x64"
+ "\x46\xdc\xea\x23\xda\x97\xd1\xab"
+ "\x7d\x6c\x30\x96\x1f\xbc\x06\x34"
+ "\x18\x0b\x5e\x21\x35\x11\x8d\x4c"
+ "\xe0\x2d\xe9\x50\x16\x74\x81\xa8"
+ "\xb4\x34\xb9\x72\x42\xa6\xcc\xbc"
+ "\xca\x34\x83\x27\x10\x5b\x68\x45"
+ "\x8f\x52\x22\x0c\x55\x3d\x29\x7c"
+ "\xe3\xc0\x66\x05\x42\x91\x5f\x58"
+ "\xfe\x4a\x62\xd9\x8c\xa9\x04\x19"
+ "\x04\xa9\x08\x4b\x57\xfc\x67\x53"
+ "\x08\x7c\xbc\x66\x8a\xb0\xb6\x9f"
+ "\x92\xd6\x41\x7c\x5b\x2a\x00\x79"
+ "\x72",
+ .ilen = 1281,
+ .result = "\x45\xe8\xe0\xb6\x9c\xca\xfd\x87"
+ "\xe8\x1d\x37\x96\x8a\xe3\x40\x35"
+ "\xcf\x5e\x3a\x46\x3d\xfb\xd0\x69"
+ "\xde\xaf\x7a\xd5\x0d\xe9\x52\xec"
+ "\xc2\x82\xe5\x3e\x7d\xb2\x4a\xd9"
+ "\xbb\xc3\x9f\xc0\x5d\xac\x93\x8d"
+ "\x0e\x6f\xd3\xd7\xfb\x6a\x0d\xce"
+ "\x92\x2c\xf7\xbb\x93\x57\xcc\xee"
+ "\x42\x72\x6f\xc8\x4b\xd2\x76\xbf"
+ "\xa0\xe3\x7a\x39\xf9\x5c\x8e\xfd"
+ "\xa1\x1d\x41\xe5\x08\xc1\x1c\x11"
+ "\x92\xfd\x39\x5c\x51\xd0\x2f\x66"
+ "\x33\x4a\x71\x15\xfe\xee\x12\x54"
+ "\x8c\x8f\x34\xd8\x50\x3c\x18\xa6"
+ "\xc5\xe1\x46\x8a\xfb\x5f\x7e\x25"
+ "\x9b\xe2\xc3\x66\x41\x2b\xb3\xa5"
+ "\x57\x0e\x94\x17\x26\x39\xbb\x54"
+ "\xae\x2e\x6f\x42\xfb\x4d\x89\x6f"
+ "\x9d\xf1\x16\x2e\xe3\xe7\xfc\xe3"
+ "\xb2\x4b\x2b\xa6\x7c\x04\x69\x3a"
+ "\x70\x5a\xa7\xf1\x31\x64\x19\xca"
+ "\x45\x79\xd8\x58\x23\x61\xaf\xc2"
+ "\x52\x05\xc3\x0b\xc1\x64\x7c\x81"
+ "\xd9\x11\xcf\xff\x02\x3d\x51\x84"
+ "\x01\xac\xc6\x2e\x34\x2b\x09\x3a"
+ "\xa8\x5d\x98\x0e\x89\xd9\xef\x8f"
+ "\xd9\xd7\x7d\xdd\x63\x47\x46\x7d"
+ "\xa1\xda\x0b\x53\x7d\x79\xcd\xc9"
+ "\x86\xdd\x6b\x13\xa1\x9a\x70\xdd"
+ "\x5c\xa1\x69\x3c\xe4\x5d\xe3\x8c"
+ "\xe5\xf4\x87\x9c\x10\xcf\x0f\x0b"
+ "\xc8\x43\xdc\xf8\x1d\x62\x5e\x5b"
+ "\xe2\x03\x06\xc5\x71\xb6\x48\xa5"
+ "\xf0\x0f\x2d\xd5\xa2\x73\x55\x8f"
+ "\x01\xa7\x59\x80\x5f\x11\x6c\x40"
+ "\xff\xb1\xf2\xc6\x7e\x01\xbb\x1c"
+ "\x69\x9c\xc9\x3f\x71\x5f\x07\x7e"
+ "\xdf\x6f\x99\xca\x9c\xfd\xf9\xb9"
+ "\x49\xe7\xcc\x91\xd5\x9b\x8f\x03"
+ "\xae\xe7\x61\x32\xef\x41\x6c\x75"
+ "\x84\x9b\x8c\xce\x1d\x6b\x93\x21"
+ "\x41\xec\xc6\xad\x8e\x0c\x48\xa8"
+ "\xe2\xf5\x57\xde\xf7\x38\xfd\x4a"
+ "\x6f\xa7\x4a\xf9\xac\x7d\xb1\x85"
+ "\x7d\x6c\x95\x0a\x5a\xcf\x68\xd2"
+ "\xe0\x7a\x26\xd9\xc1\x6d\x3e\xc6"
+ "\x37\xbd\xbe\x24\x36\x77\x9f\x1b"
+ "\xc1\x22\xf3\x79\xae\x95\x78\x66"
+ "\x97\x11\xc0\x1a\xf1\xe8\x0d\x38"
+ "\x09\xc2\xee\xb7\xd3\x46\x7b\x59"
+ "\x77\x23\xe8\xb4\x92\x3d\x78\xbe"
+ "\xe2\x25\x63\xa5\x2a\x06\x70\x92"
+ "\x32\x63\xf9\x19\x21\x68\xe1\x0b"
+ "\x9a\xd0\xee\x21\xdb\x1f\xe0\xde"
+ "\x3e\x64\x02\x4d\x0e\xe0\x0a\xa9"
+ "\xed\x19\x8c\xa8\xbf\xe3\x2e\x75"
+ "\x24\x2b\xb0\xe5\x82\x6a\x1e\x6f"
+ "\x71\x2a\x3a\x60\xed\x06\x0d\x17"
+ "\xa2\xdb\x29\x1d\xae\xb2\xc4\xfb"
+ "\x94\x04\xd8\x58\xfc\xc4\x04\x4e"
+ "\xee\xc7\xc1\x0f\xe9\x9b\x63\x2d"
+ "\x02\x3e\x02\x67\xe5\xd8\xbb\x79"
+ "\xdf\xd2\xeb\x50\xe9\x0a\x02\x46"
+ "\xdf\x68\xcf\xe7\x2b\x0a\x56\xd6"
+ "\xf7\xbc\x44\xad\xb8\xb5\x5f\xeb"
+ "\xbc\x74\x6b\xe8\x7e\xb0\x60\xc6"
+ "\x0d\x96\x09\xbb\x19\xba\xe0\x3c"
+ "\xc4\x6c\xbf\x0f\x58\xc0\x55\x62"
+ "\x23\xa0\xff\xb5\x1c\xfd\x18\xe1"
+ "\xcf\x6d\xd3\x52\xb4\xce\xa6\xfa"
+ "\xaa\xfb\x1b\x0b\x42\x6d\x79\x42"
+ "\x48\x70\x5b\x0e\xdd\x3a\xc9\x69"
+ "\x8b\x73\x67\xf6\x95\xdb\x8c\xfb"
+ "\xfd\xb5\x08\x47\x42\x84\x9a\xfa"
+ "\xcc\x67\xb2\x3c\xb6\xfd\xd8\x32"
+ "\xd6\x04\xb6\x4a\xea\x53\x4b\xf5"
+ "\x94\x16\xad\xf0\x10\x2e\x2d\xb4"
+ "\x8b\xab\xe5\x89\xc7\x39\x12\xf3"
+ "\x8d\xb5\x96\x0b\x87\x5d\xa7\x7c"
+ "\xb0\xc2\xf6\x2e\x57\x97\x2c\xdc"
+ "\x54\x1c\x34\x72\xde\x0c\x68\x39"
+ "\x9d\x32\xa5\x75\x92\x13\x32\xea"
+ "\x90\x27\xbd\x5b\x1d\xb9\x21\x02"
+ "\x1c\xcc\xba\x97\x5e\x49\x58\xe8"
+ "\xac\x8b\xf3\xce\x3c\xf0\x00\xe9"
+ "\x6c\xae\xe9\x77\xdf\xf4\x02\xcd"
+ "\x55\x25\x89\x9e\x90\xf3\x6b\x8f"
+ "\xb7\xd6\x47\x98\x26\x2f\x31\x2f"
+ "\x8d\xbf\x54\xcd\x99\xeb\x80\xd7"
+ "\xac\xc3\x08\xc2\xa6\x32\xf1\x24"
+ "\x76\x7c\x4f\x78\x53\x55\xfb\x00"
+ "\x8a\xd6\x52\x53\x25\x45\xfb\x0a"
+ "\x6b\xb9\xbe\x3c\x5e\x11\xcc\x6a"
+ "\xdd\xfc\xa7\xc4\x79\x4d\xbd\xfb"
+ "\xce\x3a\xf1\x7a\xda\xeb\xfe\x64"
+ "\x28\x3d\x0f\xee\x80\xba\x0c\xf8"
+ "\xe9\x5b\x3a\xd4\xae\xc9\xf3\x0e"
+ "\xe8\x5d\xc5\x5c\x0b\x20\x20\xee"
+ "\x40\x0d\xde\x07\xa7\x14\xb4\x90"
+ "\xb6\xbd\x3b\xae\x7d\x2b\xa7\xc7"
+ "\xdc\x0b\x4c\x5d\x65\xb0\xd2\xc5"
+ "\x79\x61\x23\xe0\xa2\x99\x73\x55"
+ "\xad\xc6\xfb\xc7\x54\xb5\x98\x1f"
+ "\x8c\x86\xc2\x3f\xbe\x5e\xea\x64"
+ "\xa3\x60\x18\x9f\x80\xaf\x52\x74"
+ "\x1a\xfe\x22\xc2\x92\x67\x40\x02"
+ "\x08\xee\x67\x5b\x67\xe0\x3d\xde"
+ "\x7a\xaf\x8e\x28\xf3\x5e\x0e\xf4"
+ "\x48\x56\xaa\x85\x22\xd8\x36\xed"
+ "\x3b\x3d\x68\x69\x30\xbc\x71\x23"
+ "\xb1\x6e\x61\x03\x89\x44\x03\xf4"
+ "\x32\xaa\x4c\x40\x9f\x69\xfb\x70"
+ "\x91\xcc\x1f\x11\xbd\x76\x67\xe6"
+ "\x10\x8b\x29\x39\x68\xea\x4e\x6d"
+ "\xae\xfb\x40\xcf\xe2\xd0\x0d\x8d"
+ "\x6f\xed\x9b\x8d\x64\x7a\x94\x8e"
+ "\x32\x38\x78\xeb\x7d\x5f\xf9\x4d"
+ "\x13\xbe\x21\xea\x16\xe7\x5c\xee"
+ "\xcd\xf6\x5f\xc6\x45\xb2\x8f\x2b"
+ "\xb5\x93\x3e\x45\xdb\xfd\xa2\x6a"
+ "\xec\x83\x92\x99\x87\x47\xe0\x7c"
+ "\xa2\x7b\xc4\x2a\xcd\xc0\x81\x03"
+ "\x98\xb0\x87\xb6\x86\x13\x64\x33"
+ "\x4c\xd7\x99\xbf\xdb\x7b\x6e\xaa"
+ "\x76\xcc\xa0\x74\x1b\xa3\x6e\x83"
+ "\xd4\xba\x7a\x84\x9d\x91\x71\xcd"
+ "\x60\x2d\x56\xfd\x26\x35\xcb\xeb"
+ "\xac\xe9\xee\xa4\xfc\x18\x5b\x91"
+ "\xd5\xfe\x84\x45\xe0\xc7\xfd\x11"
+ "\xe9\x00\xb6\x54\xdf\xe1\x94\xde"
+ "\x2b\x70\x9f\x94\x7f\x15\x0e\x83"
+ "\x63\x10\xb3\xf5\xea\xd3\xe8\xd1"
+ "\xa5\xfc\x17\x19\x68\x9a\xbc\x17"
+ "\x30\x43\x0a\x1a\x33\x92\xd4\x2a"
+ "\x2e\x68\x99\xbc\x49\xf0\x68\xe3"
+ "\xf0\x1f\xcb\xcc\xfa\xbb\x05\x56"
+ "\x46\x84\x8b\x69\x83\x64\xc5\xe0"
+ "\xc5\x52\x99\x07\x3c\xa6\x5c\xaf"
+ "\xa3\xde\xd7\xdb\x43\xe6\xb7\x76"
+ "\x4e\x4d\xd6\x71\x60\x63\x4a\x0c"
+ "\x5f\xae\x25\x84\x22\x90\x5f\x26"
+ "\x61\x4d\x8f\xaf\xc9\x22\xf2\x05"
+ "\xcf\xc1\xdc\x68\xe5\x57\x8e\x24"
+ "\x1b\x30\x59\xca\xd7\x0d\xc3\xd3"
+ "\x52\x9e\x09\x3e\x0e\xaf\xdb\x5f"
+ "\xc7\x2b\xde\x3a\xfd\xad\x93\x04"
+ "\x74\x06\x89\x0e\x90\xeb\x85\xff"
+ "\xe6\x3c\x12\x42\xf4\xfa\x80\x75"
+ "\x5e\x4e\xd7\x2f\x93\x0b\x34\x41"
+ "\x02\x85\x68\xd0\x03\x12\xde\x92"
+ "\x54\x7a\x7e\xfb\x55\xe7\x88\xfb"
+ "\xa4\xa9\xf2\xd1\xc6\x70\x06\x37"
+ "\x25\xee\xa7\x6e\xd9\x89\x86\x50"
+ "\x2e\x07\xdb\xfb\x2a\x86\x45\x0e"
+ "\x91\xf4\x7c\xbb\x12\x60\xe8\x3f"
+ "\x71\xbe\x8f\x9d\x26\xef\xd9\x89"
+ "\xc4\x8f\xd8\xc5\x73\xd8\x84\xaa"
+ "\x2f\xad\x22\x1e\x7e\xcf\xa2\x08"
+ "\x23\x45\x89\x42\xa0\x30\xeb\xbf"
+ "\xa1\xed\xad\xd5\x76\xfa\x24\x8f"
+ "\x98",
+ .rlen = 1281,
},
};

--
1.9.1

2015-07-07 19:37:21

by Martin Willi

[permalink] [raw]
Subject: [PATCH 03/10] crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64

Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due to the use of pshufb for efficient 8/16-bit rotate
operations.
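
The rounds map directly onto the ChaCha20 quarter round; one xmm register
holds one row of the state, so a single quarter-round step covers all four
columns (and, after the pshufd row rotations, all four diagonals) at once.
A scalar sketch of the operation being vectorized (not part of the patch):

    static inline u32 rotl32(u32 v, int n)
    {
        return (v << n) | (v >> (32 - n));
    }

    /* One quarter round; the SSSE3 code performs this on whole rows,
     * with pshufb replacing the 16- and 8-bit rotates. */
    static void chacha20_quarterround(u32 *a, u32 *b, u32 *c, u32 *d)
    {
        *a += *b; *d = rotl32(*d ^ *a, 16);
        *c += *d; *b = rotl32(*b ^ *c, 12);
        *a += *b; *d = rotl32(*d ^ *a,  8);
        *c += *d; *b = rotl32(*b ^ *c,  7);
    }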

For large messages, throughput increases by ~65% compared to
chacha20-generic:

testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks): 4015926 operations in 1 seconds (64254816 bytes)
test 1 (256 bit key, 64 byte blocks): 4161758 operations in 1 seconds (266352512 bytes)
test 2 (256 bit key, 256 byte blocks): 1223686 operations in 1 seconds (313263616 bytes)
test 3 (256 bit key, 1024 byte blocks): 325200 operations in 1 seconds (333004800 bytes)
test 4 (256 bit key, 8192 byte blocks): 40725 operations in 1 seconds (333619200 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 4154698 operations in 1 seconds (66475168 bytes)
test 1 (256 bit key, 64 byte blocks): 4593368 operations in 1 seconds (293975552 bytes)
test 2 (256 bit key, 256 byte blocks): 1796194 operations in 1 seconds (459825664 bytes)
test 3 (256 bit key, 1024 byte blocks): 519725 operations in 1 seconds (532198400 bytes)
test 4 (256 bit key, 8192 byte blocks): 67132 operations in 1 seconds (549945344 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/Makefile | 2 +
arch/x86/crypto/chacha20-ssse3-x86_64.S | 142 ++++++++++++++++++++++++++++++++
arch/x86/crypto/chacha20_glue.c | 123 +++++++++++++++++++++++++++
crypto/Kconfig | 15 ++++
4 files changed, 282 insertions(+)
create mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S
create mode 100644 arch/x86/crypto/chacha20_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 5a4a089..b09e9a4 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_SALSA20_X86_64) += salsa20-x86_64.o
+obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
@@ -60,6 +61,7 @@ blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
+chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o

ifeq ($(avx_supported),yes)
diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20-ssse3-x86_64.S
new file mode 100644
index 0000000..1b97ad0
--- /dev/null
+++ b/arch/x86/crypto/chacha20-ssse3-x86_64.S
@@ -0,0 +1,142 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+.data
+.align 16
+
+ROT8: .octa 0x0e0d0c0f0a09080b0605040702010003
+ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
+
+.text
+
+ENTRY(chacha20_block_xor_ssse3)
+ # %rdi: Input state matrix, s
+ # %rsi: 1 data block output, o
+ # %rdx: 1 data block input, i
+
+	# This function encrypts one ChaCha20 block by loading the state matrix
+	# in four SSE registers. It performs matrix operations on four words in
+	# parallel, but requires shuffling to rearrange the words after each
+	# round. 8/16-bit word rotation is done with the slightly better
+	# performing SSSE3 byte shuffling, 7/12-bit word rotation uses
+	# traditional shift+OR.
+
+ # x0..3 = s0..3
+ movdqa 0x00(%rdi),%xmm0
+ movdqa 0x10(%rdi),%xmm1
+ movdqa 0x20(%rdi),%xmm2
+ movdqa 0x30(%rdi),%xmm3
+ movdqa %xmm0,%xmm8
+ movdqa %xmm1,%xmm9
+ movdqa %xmm2,%xmm10
+ movdqa %xmm3,%xmm11
+
+ movdqa ROT8(%rip),%xmm4
+ movdqa ROT16(%rip),%xmm5
+
+ mov $10,%ecx
+
+.Ldoubleround:
+
+ # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm5,%xmm3
+
+ # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm6
+ pslld $12,%xmm6
+ psrld $20,%xmm1
+ por %xmm6,%xmm1
+
+ # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm4,%xmm3
+
+ # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm7
+ pslld $7,%xmm7
+ psrld $25,%xmm1
+ por %xmm7,%xmm1
+
+ # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+ pshufd $0x39,%xmm1,%xmm1
+ # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ pshufd $0x4e,%xmm2,%xmm2
+ # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+ pshufd $0x93,%xmm3,%xmm3
+
+ # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm5,%xmm3
+
+ # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm6
+ pslld $12,%xmm6
+ psrld $20,%xmm1
+ por %xmm6,%xmm1
+
+ # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm4,%xmm3
+
+ # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm7
+ pslld $7,%xmm7
+ psrld $25,%xmm1
+ por %xmm7,%xmm1
+
+ # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+ pshufd $0x93,%xmm1,%xmm1
+ # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ pshufd $0x4e,%xmm2,%xmm2
+ # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+ pshufd $0x39,%xmm3,%xmm3
+
+ dec %ecx
+ jnz .Ldoubleround
+
+ # o0 = i0 ^ (x0 + s0)
+ movdqu 0x00(%rdx),%xmm4
+ paddd %xmm8,%xmm0
+ pxor %xmm4,%xmm0
+ movdqu %xmm0,0x00(%rsi)
+ # o1 = i1 ^ (x1 + s1)
+ movdqu 0x10(%rdx),%xmm5
+ paddd %xmm9,%xmm1
+ pxor %xmm5,%xmm1
+ movdqu %xmm1,0x10(%rsi)
+ # o2 = i2 ^ (x2 + s2)
+ movdqu 0x20(%rdx),%xmm6
+ paddd %xmm10,%xmm2
+ pxor %xmm6,%xmm2
+ movdqu %xmm2,0x20(%rsi)
+ # o3 = i3 ^ (x3 + s3)
+ movdqu 0x30(%rdx),%xmm7
+ paddd %xmm11,%xmm3
+ pxor %xmm7,%xmm3
+ movdqu %xmm3,0x30(%rsi)
+
+ ret
+ENDPROC(chacha20_block_xor_ssse3)
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
new file mode 100644
index 0000000..250de40
--- /dev/null
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -0,0 +1,123 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/fpu/api.h>
+#include <asm/simd.h>
+
+#define CHACHA20_STATE_ALIGN 16
+
+asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+ unsigned int bytes)
+{
+ u8 buf[CHACHA20_BLOCK_SIZE];
+
+ while (bytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_block_xor_ssse3(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE;
+ src += CHACHA20_BLOCK_SIZE;
+ dst += CHACHA20_BLOCK_SIZE;
+ state[12]++;
+ }
+ if (bytes) {
+ memcpy(buf, src, bytes);
+ chacha20_block_xor_ssse3(state, buf, buf);
+ memcpy(dst, buf, bytes);
+ }
+}
+
+static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes)
+{
+ u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
+ struct blkcipher_walk walk;
+ int err;
+
+ if (!may_use_simd())
+ return crypto_chacha20_crypt(desc, dst, src, nbytes);
+
+ state = (u32 *)roundup((uintptr_t)state_buf, CHACHA20_STATE_ALIGN);
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
+
+ crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+
+ kernel_fpu_begin();
+
+ while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
+ err = blkcipher_walk_done(desc, &walk,
+ walk.nbytes % CHACHA20_BLOCK_SIZE);
+ }
+
+ if (walk.nbytes) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes);
+ err = blkcipher_walk_done(desc, &walk, 0);
+ }
+
+ kernel_fpu_end();
+
+ return err;
+}
+
+static struct crypto_alg alg = {
+ .cra_name = "chacha20",
+ .cra_driver_name = "chacha20-simd",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
+ .cra_blocksize = 1,
+ .cra_type = &crypto_blkcipher_type,
+ .cra_ctxsize = sizeof(struct chacha20_ctx),
+ .cra_alignmask = sizeof(u32) - 1,
+ .cra_module = THIS_MODULE,
+ .cra_u = {
+ .blkcipher = {
+ .min_keysize = CHACHA20_KEY_SIZE,
+ .max_keysize = CHACHA20_KEY_SIZE,
+ .ivsize = CHACHA20_IV_SIZE,
+ .geniv = "seqiv",
+ .setkey = crypto_chacha20_setkey,
+ .encrypt = chacha20_simd,
+ .decrypt = chacha20_simd,
+ },
+ },
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+ if (!cpu_has_ssse3)
+ return -ENODEV;
+
+ return crypto_register_alg(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+ crypto_unregister_alg(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Martin Willi <[email protected]>");
+MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
+MODULE_ALIAS_CRYPTO("chacha20");
+MODULE_ALIAS_CRYPTO("chacha20-simd");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index b4cfc57..8f24185 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1213,6 +1213,21 @@ config CRYPTO_CHACHA20
See also:
<http://cr.yp.to/chacha/chacha-20080128.pdf>

+config CRYPTO_CHACHA20_X86_64
+ tristate "ChaCha20 cipher algorithm (x86_64/SSSE3)"
+ depends on X86 && 64BIT
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_CHACHA20
+ help
+ ChaCha20 cipher algorithm, RFC7539.
+
+ ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
+ Bernstein and further specified in RFC7539 for use in IETF protocols.
+ This is the x86_64 assembler implementation using SIMD instructions.
+
+ See also:
+ <http://cr.yp.to/chacha/chacha-20080128.pdf>
+
config CRYPTO_SEED
tristate "SEED cipher algorithm"
select CRYPTO_ALGAPI
--
1.9.1

2015-07-07 19:37:21

by Martin Willi

[permalink] [raw]
Subject: [PATCH 05/10] crypto: chacha20 - Add an eight block AVX2 variant for x86_64

Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.
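
As in the four block SSSE3 variant, each of the sixteen state words is
broadcast into its own register; here a ymm register whose eight lanes hold
that word for eight consecutive blocks, so the double rounds need no word
shuffling and only the final XOR step transposes the result. A rough sketch
of the setup (not part of the patch):

    /* Eight-way state layout: x[i][j] holds word i of block j,
     * mirroring vpbroadcastd plus the CTRINC counter offsets. */
    static void chacha20_8block_setup(const u32 state[16], u32 x[16][8])
    {
        int i, j;

        for (i = 0; i < 16; i++)
            for (j = 0; j < 8; j++)
                x[i][j] = state[i];          /* vpbroadcastd */
        for (j = 0; j < 8; j++)
            x[12][j] += j;                   /* CTRINC: counters 0..7 */
        /* Ten double rounds then operate on each x[i][0..7] as one
         * vector; afterwards x[i][j] += state[i] (plus j for i == 12)
         * and the 16x8 word matrix is transposed before XORing with
         * the 512-byte input. */
    }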

For large messages, throughput increases by ~55-70% compared to four block
SSSE3:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 4164293 operations in 1 seconds (66628688 bytes)
test 1 (256 bit key, 64 byte blocks): 4545912 operations in 1 seconds (290938368 bytes)
test 2 (256 bit key, 256 byte blocks): 3238241 operations in 1 seconds (828989696 bytes)
test 3 (256 bit key, 1024 byte blocks): 1120664 operations in 1 seconds (1147559936 bytes)
test 4 (256 bit key, 8192 byte blocks): 140107 operations in 1 seconds (1147756544 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 4166978 operations in 1 seconds (66671648 bytes)
test 1 (256 bit key, 64 byte blocks): 4557525 operations in 1 seconds (291681600 bytes)
test 2 (256 bit key, 256 byte blocks): 3231026 operations in 1 seconds (827142656 bytes)
test 3 (256 bit key, 1024 byte blocks): 1929946 operations in 1 seconds (1976264704 bytes)
test 4 (256 bit key, 8192 byte blocks): 218037 operations in 1 seconds (1786159104 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/Makefile | 1 +
arch/x86/crypto/chacha20-avx2-x86_64.S | 443 +++++++++++++++++++++++++++++++++
arch/x86/crypto/chacha20_glue.c | 19 ++
crypto/Kconfig | 2 +-
4 files changed, 464 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b09e9a4..ce39b3c 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -77,6 +77,7 @@ endif

ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
+ chacha20-x86_64-y += chacha20-avx2-x86_64.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
endif

diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2-x86_64.S
new file mode 100644
index 0000000..16694e6
--- /dev/null
+++ b/arch/x86/crypto/chacha20-avx2-x86_64.S
@@ -0,0 +1,443 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+.data
+.align 32
+
+ROT8: .octa 0x0e0d0c0f0a09080b0605040702010003
+ .octa 0x0e0d0c0f0a09080b0605040702010003
+ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
+ .octa 0x0d0c0f0e09080b0a0504070601000302
+CTRINC: .octa 0x00000003000000020000000100000000
+ .octa 0x00000007000000060000000500000004
+
+.text
+
+ENTRY(chacha20_8block_xor_avx2)
+ # %rdi: Input state matrix, s
+ # %rsi: 8 data blocks output, o
+ # %rdx: 8 data blocks input, i
+
+ # This function encrypts eight consecutive ChaCha20 blocks by loading
+ # the state matrix in AVX registers eight times. As we need some
+ # scratch registers, we save the first four registers on the stack. The
+ # algorithm performs each operation on the corresponding word of each
+	# state matrix, hence requires no word shuffling. For the final XORing
+ # we transpose the matrix by interleaving 32-, 64- and then 128-bit
+ # words, which allows us to do XOR in AVX registers. 8/16-bit word
+ # rotation is done with the slightly better performing byte shuffling,
+ # 7/12-bit word rotation uses traditional shift+OR.
+
+ vzeroupper
+ # 4 * 32 byte stack, 32-byte aligned
+ mov %rsp, %r8
+ and $~31, %rsp
+ sub $0x80, %rsp
+
+ # x0..15[0-7] = s[0..15]
+ vpbroadcastd 0x00(%rdi),%ymm0
+ vpbroadcastd 0x04(%rdi),%ymm1
+ vpbroadcastd 0x08(%rdi),%ymm2
+ vpbroadcastd 0x0c(%rdi),%ymm3
+ vpbroadcastd 0x10(%rdi),%ymm4
+ vpbroadcastd 0x14(%rdi),%ymm5
+ vpbroadcastd 0x18(%rdi),%ymm6
+ vpbroadcastd 0x1c(%rdi),%ymm7
+ vpbroadcastd 0x20(%rdi),%ymm8
+ vpbroadcastd 0x24(%rdi),%ymm9
+ vpbroadcastd 0x28(%rdi),%ymm10
+ vpbroadcastd 0x2c(%rdi),%ymm11
+ vpbroadcastd 0x30(%rdi),%ymm12
+ vpbroadcastd 0x34(%rdi),%ymm13
+ vpbroadcastd 0x38(%rdi),%ymm14
+ vpbroadcastd 0x3c(%rdi),%ymm15
+ # x0..3 on stack
+ vmovdqa %ymm0,0x00(%rsp)
+ vmovdqa %ymm1,0x20(%rsp)
+ vmovdqa %ymm2,0x40(%rsp)
+ vmovdqa %ymm3,0x60(%rsp)
+
+ vmovdqa CTRINC(%rip),%ymm1
+ vmovdqa ROT8(%rip),%ymm2
+ vmovdqa ROT16(%rip),%ymm3
+
+	# x12 += counter values 0-7
+ vpaddd %ymm1,%ymm12,%ymm12
+
+ mov $10,%ecx
+
+.Ldoubleround8:
+ # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+ vpaddd 0x00(%rsp),%ymm4,%ymm0
+ vmovdqa %ymm0,0x00(%rsp)
+ vpxor %ymm0,%ymm12,%ymm12
+ vpshufb %ymm3,%ymm12,%ymm12
+ # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+ vpaddd 0x20(%rsp),%ymm5,%ymm0
+ vmovdqa %ymm0,0x20(%rsp)
+ vpxor %ymm0,%ymm13,%ymm13
+ vpshufb %ymm3,%ymm13,%ymm13
+ # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+ vpaddd 0x40(%rsp),%ymm6,%ymm0
+ vmovdqa %ymm0,0x40(%rsp)
+ vpxor %ymm0,%ymm14,%ymm14
+ vpshufb %ymm3,%ymm14,%ymm14
+ # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+ vpaddd 0x60(%rsp),%ymm7,%ymm0
+ vmovdqa %ymm0,0x60(%rsp)
+ vpxor %ymm0,%ymm15,%ymm15
+ vpshufb %ymm3,%ymm15,%ymm15
+
+ # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+ vpaddd %ymm12,%ymm8,%ymm8
+ vpxor %ymm8,%ymm4,%ymm4
+ vpslld $12,%ymm4,%ymm0
+ vpsrld $20,%ymm4,%ymm4
+ vpor %ymm0,%ymm4,%ymm4
+ # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+ vpaddd %ymm13,%ymm9,%ymm9
+ vpxor %ymm9,%ymm5,%ymm5
+ vpslld $12,%ymm5,%ymm0
+ vpsrld $20,%ymm5,%ymm5
+ vpor %ymm0,%ymm5,%ymm5
+ # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+ vpaddd %ymm14,%ymm10,%ymm10
+ vpxor %ymm10,%ymm6,%ymm6
+ vpslld $12,%ymm6,%ymm0
+ vpsrld $20,%ymm6,%ymm6
+ vpor %ymm0,%ymm6,%ymm6
+ # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+ vpaddd %ymm15,%ymm11,%ymm11
+ vpxor %ymm11,%ymm7,%ymm7
+ vpslld $12,%ymm7,%ymm0
+ vpsrld $20,%ymm7,%ymm7
+ vpor %ymm0,%ymm7,%ymm7
+
+ # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+ vpaddd 0x00(%rsp),%ymm4,%ymm0
+ vmovdqa %ymm0,0x00(%rsp)
+ vpxor %ymm0,%ymm12,%ymm12
+ vpshufb %ymm2,%ymm12,%ymm12
+ # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+ vpaddd 0x20(%rsp),%ymm5,%ymm0
+ vmovdqa %ymm0,0x20(%rsp)
+ vpxor %ymm0,%ymm13,%ymm13
+ vpshufb %ymm2,%ymm13,%ymm13
+ # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+ vpaddd 0x40(%rsp),%ymm6,%ymm0
+ vmovdqa %ymm0,0x40(%rsp)
+ vpxor %ymm0,%ymm14,%ymm14
+ vpshufb %ymm2,%ymm14,%ymm14
+ # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+ vpaddd 0x60(%rsp),%ymm7,%ymm0
+ vmovdqa %ymm0,0x60(%rsp)
+ vpxor %ymm0,%ymm15,%ymm15
+ vpshufb %ymm2,%ymm15,%ymm15
+
+ # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+ vpaddd %ymm12,%ymm8,%ymm8
+ vpxor %ymm8,%ymm4,%ymm4
+ vpslld $7,%ymm4,%ymm0
+ vpsrld $25,%ymm4,%ymm4
+ vpor %ymm0,%ymm4,%ymm4
+ # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+ vpaddd %ymm13,%ymm9,%ymm9
+ vpxor %ymm9,%ymm5,%ymm5
+ vpslld $7,%ymm5,%ymm0
+ vpsrld $25,%ymm5,%ymm5
+ vpor %ymm0,%ymm5,%ymm5
+ # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+ vpaddd %ymm14,%ymm10,%ymm10
+ vpxor %ymm10,%ymm6,%ymm6
+ vpslld $7,%ymm6,%ymm0
+ vpsrld $25,%ymm6,%ymm6
+ vpor %ymm0,%ymm6,%ymm6
+ # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+ vpaddd %ymm15,%ymm11,%ymm11
+ vpxor %ymm11,%ymm7,%ymm7
+ vpslld $7,%ymm7,%ymm0
+ vpsrld $25,%ymm7,%ymm7
+ vpor %ymm0,%ymm7,%ymm7
+
+ # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+ vpaddd 0x00(%rsp),%ymm5,%ymm0
+ vmovdqa %ymm0,0x00(%rsp)
+ vpxor %ymm0,%ymm15,%ymm15
+ vpshufb %ymm3,%ymm15,%ymm15
+	# x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+ vpaddd 0x20(%rsp),%ymm6,%ymm0
+ vmovdqa %ymm0,0x20(%rsp)
+ vpxor %ymm0,%ymm12,%ymm12
+ vpshufb %ymm3,%ymm12,%ymm12
+ # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+ vpaddd 0x40(%rsp),%ymm7,%ymm0
+ vmovdqa %ymm0,0x40(%rsp)
+ vpxor %ymm0,%ymm13,%ymm13
+ vpshufb %ymm3,%ymm13,%ymm13
+ # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+ vpaddd 0x60(%rsp),%ymm4,%ymm0
+ vmovdqa %ymm0,0x60(%rsp)
+ vpxor %ymm0,%ymm14,%ymm14
+ vpshufb %ymm3,%ymm14,%ymm14
+
+ # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+ vpaddd %ymm15,%ymm10,%ymm10
+ vpxor %ymm10,%ymm5,%ymm5
+ vpslld $12,%ymm5,%ymm0
+ vpsrld $20,%ymm5,%ymm5
+ vpor %ymm0,%ymm5,%ymm5
+ # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+ vpaddd %ymm12,%ymm11,%ymm11
+ vpxor %ymm11,%ymm6,%ymm6
+ vpslld $12,%ymm6,%ymm0
+ vpsrld $20,%ymm6,%ymm6
+ vpor %ymm0,%ymm6,%ymm6
+ # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+ vpaddd %ymm13,%ymm8,%ymm8
+ vpxor %ymm8,%ymm7,%ymm7
+ vpslld $12,%ymm7,%ymm0
+ vpsrld $20,%ymm7,%ymm7
+ vpor %ymm0,%ymm7,%ymm7
+ # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+ vpaddd %ymm14,%ymm9,%ymm9
+ vpxor %ymm9,%ymm4,%ymm4
+ vpslld $12,%ymm4,%ymm0
+ vpsrld $20,%ymm4,%ymm4
+ vpor %ymm0,%ymm4,%ymm4
+
+ # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+ vpaddd 0x00(%rsp),%ymm5,%ymm0
+ vmovdqa %ymm0,0x00(%rsp)
+ vpxor %ymm0,%ymm15,%ymm15
+ vpshufb %ymm2,%ymm15,%ymm15
+ # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+ vpaddd 0x20(%rsp),%ymm6,%ymm0
+ vmovdqa %ymm0,0x20(%rsp)
+ vpxor %ymm0,%ymm12,%ymm12
+ vpshufb %ymm2,%ymm12,%ymm12
+ # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+ vpaddd 0x40(%rsp),%ymm7,%ymm0
+ vmovdqa %ymm0,0x40(%rsp)
+ vpxor %ymm0,%ymm13,%ymm13
+ vpshufb %ymm2,%ymm13,%ymm13
+ # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+ vpaddd 0x60(%rsp),%ymm4,%ymm0
+ vmovdqa %ymm0,0x60(%rsp)
+ vpxor %ymm0,%ymm14,%ymm14
+ vpshufb %ymm2,%ymm14,%ymm14
+
+ # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+ vpaddd %ymm15,%ymm10,%ymm10
+ vpxor %ymm10,%ymm5,%ymm5
+ vpslld $7,%ymm5,%ymm0
+ vpsrld $25,%ymm5,%ymm5
+ vpor %ymm0,%ymm5,%ymm5
+ # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+ vpaddd %ymm12,%ymm11,%ymm11
+ vpxor %ymm11,%ymm6,%ymm6
+ vpslld $7,%ymm6,%ymm0
+ vpsrld $25,%ymm6,%ymm6
+ vpor %ymm0,%ymm6,%ymm6
+ # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+ vpaddd %ymm13,%ymm8,%ymm8
+ vpxor %ymm8,%ymm7,%ymm7
+ vpslld $7,%ymm7,%ymm0
+ vpsrld $25,%ymm7,%ymm7
+ vpor %ymm0,%ymm7,%ymm7
+ # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+ vpaddd %ymm14,%ymm9,%ymm9
+ vpxor %ymm9,%ymm4,%ymm4
+ vpslld $7,%ymm4,%ymm0
+ vpsrld $25,%ymm4,%ymm4
+ vpor %ymm0,%ymm4,%ymm4
+
+ dec %ecx
+ jnz .Ldoubleround8
+
+	# x0..15[0-7] += s[0..15]
+ vpbroadcastd 0x00(%rdi),%ymm0
+ vpaddd 0x00(%rsp),%ymm0,%ymm0
+ vmovdqa %ymm0,0x00(%rsp)
+ vpbroadcastd 0x04(%rdi),%ymm0
+ vpaddd 0x20(%rsp),%ymm0,%ymm0
+ vmovdqa %ymm0,0x20(%rsp)
+ vpbroadcastd 0x08(%rdi),%ymm0
+ vpaddd 0x40(%rsp),%ymm0,%ymm0
+ vmovdqa %ymm0,0x40(%rsp)
+ vpbroadcastd 0x0c(%rdi),%ymm0
+ vpaddd 0x60(%rsp),%ymm0,%ymm0
+ vmovdqa %ymm0,0x60(%rsp)
+ vpbroadcastd 0x10(%rdi),%ymm0
+ vpaddd %ymm0,%ymm4,%ymm4
+ vpbroadcastd 0x14(%rdi),%ymm0
+ vpaddd %ymm0,%ymm5,%ymm5
+ vpbroadcastd 0x18(%rdi),%ymm0
+ vpaddd %ymm0,%ymm6,%ymm6
+ vpbroadcastd 0x1c(%rdi),%ymm0
+ vpaddd %ymm0,%ymm7,%ymm7
+ vpbroadcastd 0x20(%rdi),%ymm0
+ vpaddd %ymm0,%ymm8,%ymm8
+ vpbroadcastd 0x24(%rdi),%ymm0
+ vpaddd %ymm0,%ymm9,%ymm9
+ vpbroadcastd 0x28(%rdi),%ymm0
+ vpaddd %ymm0,%ymm10,%ymm10
+ vpbroadcastd 0x2c(%rdi),%ymm0
+ vpaddd %ymm0,%ymm11,%ymm11
+ vpbroadcastd 0x30(%rdi),%ymm0
+ vpaddd %ymm0,%ymm12,%ymm12
+ vpbroadcastd 0x34(%rdi),%ymm0
+ vpaddd %ymm0,%ymm13,%ymm13
+ vpbroadcastd 0x38(%rdi),%ymm0
+ vpaddd %ymm0,%ymm14,%ymm14
+ vpbroadcastd 0x3c(%rdi),%ymm0
+ vpaddd %ymm0,%ymm15,%ymm15
+
+	# x12 += counter values 0-7
+ vpaddd %ymm1,%ymm12,%ymm12
+
+ # interleave 32-bit words in state n, n+1
+ vmovdqa 0x00(%rsp),%ymm0
+ vmovdqa 0x20(%rsp),%ymm1
+ vpunpckldq %ymm1,%ymm0,%ymm2
+ vpunpckhdq %ymm1,%ymm0,%ymm1
+ vmovdqa %ymm2,0x00(%rsp)
+ vmovdqa %ymm1,0x20(%rsp)
+ vmovdqa 0x40(%rsp),%ymm0
+ vmovdqa 0x60(%rsp),%ymm1
+ vpunpckldq %ymm1,%ymm0,%ymm2
+ vpunpckhdq %ymm1,%ymm0,%ymm1
+ vmovdqa %ymm2,0x40(%rsp)
+ vmovdqa %ymm1,0x60(%rsp)
+ vmovdqa %ymm4,%ymm0
+ vpunpckldq %ymm5,%ymm0,%ymm4
+ vpunpckhdq %ymm5,%ymm0,%ymm5
+ vmovdqa %ymm6,%ymm0
+ vpunpckldq %ymm7,%ymm0,%ymm6
+ vpunpckhdq %ymm7,%ymm0,%ymm7
+ vmovdqa %ymm8,%ymm0
+ vpunpckldq %ymm9,%ymm0,%ymm8
+ vpunpckhdq %ymm9,%ymm0,%ymm9
+ vmovdqa %ymm10,%ymm0
+ vpunpckldq %ymm11,%ymm0,%ymm10
+ vpunpckhdq %ymm11,%ymm0,%ymm11
+ vmovdqa %ymm12,%ymm0
+ vpunpckldq %ymm13,%ymm0,%ymm12
+ vpunpckhdq %ymm13,%ymm0,%ymm13
+ vmovdqa %ymm14,%ymm0
+ vpunpckldq %ymm15,%ymm0,%ymm14
+ vpunpckhdq %ymm15,%ymm0,%ymm15
+
+ # interleave 64-bit words in state n, n+2
+ vmovdqa 0x00(%rsp),%ymm0
+ vmovdqa 0x40(%rsp),%ymm2
+ vpunpcklqdq %ymm2,%ymm0,%ymm1
+ vpunpckhqdq %ymm2,%ymm0,%ymm2
+ vmovdqa %ymm1,0x00(%rsp)
+ vmovdqa %ymm2,0x40(%rsp)
+ vmovdqa 0x20(%rsp),%ymm0
+ vmovdqa 0x60(%rsp),%ymm2
+ vpunpcklqdq %ymm2,%ymm0,%ymm1
+ vpunpckhqdq %ymm2,%ymm0,%ymm2
+ vmovdqa %ymm1,0x20(%rsp)
+ vmovdqa %ymm2,0x60(%rsp)
+ vmovdqa %ymm4,%ymm0
+ vpunpcklqdq %ymm6,%ymm0,%ymm4
+ vpunpckhqdq %ymm6,%ymm0,%ymm6
+ vmovdqa %ymm5,%ymm0
+ vpunpcklqdq %ymm7,%ymm0,%ymm5
+ vpunpckhqdq %ymm7,%ymm0,%ymm7
+ vmovdqa %ymm8,%ymm0
+ vpunpcklqdq %ymm10,%ymm0,%ymm8
+ vpunpckhqdq %ymm10,%ymm0,%ymm10
+ vmovdqa %ymm9,%ymm0
+ vpunpcklqdq %ymm11,%ymm0,%ymm9
+ vpunpckhqdq %ymm11,%ymm0,%ymm11
+ vmovdqa %ymm12,%ymm0
+ vpunpcklqdq %ymm14,%ymm0,%ymm12
+ vpunpckhqdq %ymm14,%ymm0,%ymm14
+ vmovdqa %ymm13,%ymm0
+ vpunpcklqdq %ymm15,%ymm0,%ymm13
+ vpunpckhqdq %ymm15,%ymm0,%ymm15
+
+ # interleave 128-bit words in state n, n+4
+ vmovdqa 0x00(%rsp),%ymm0
+ vperm2i128 $0x20,%ymm4,%ymm0,%ymm1
+ vperm2i128 $0x31,%ymm4,%ymm0,%ymm4
+ vmovdqa %ymm1,0x00(%rsp)
+ vmovdqa 0x20(%rsp),%ymm0
+ vperm2i128 $0x20,%ymm5,%ymm0,%ymm1
+ vperm2i128 $0x31,%ymm5,%ymm0,%ymm5
+ vmovdqa %ymm1,0x20(%rsp)
+ vmovdqa 0x40(%rsp),%ymm0
+ vperm2i128 $0x20,%ymm6,%ymm0,%ymm1
+ vperm2i128 $0x31,%ymm6,%ymm0,%ymm6
+ vmovdqa %ymm1,0x40(%rsp)
+ vmovdqa 0x60(%rsp),%ymm0
+ vperm2i128 $0x20,%ymm7,%ymm0,%ymm1
+ vperm2i128 $0x31,%ymm7,%ymm0,%ymm7
+ vmovdqa %ymm1,0x60(%rsp)
+ vperm2i128 $0x20,%ymm12,%ymm8,%ymm0
+ vperm2i128 $0x31,%ymm12,%ymm8,%ymm12
+ vmovdqa %ymm0,%ymm8
+ vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
+ vperm2i128 $0x31,%ymm13,%ymm9,%ymm13
+ vmovdqa %ymm0,%ymm9
+ vperm2i128 $0x20,%ymm14,%ymm10,%ymm0
+ vperm2i128 $0x31,%ymm14,%ymm10,%ymm14
+ vmovdqa %ymm0,%ymm10
+ vperm2i128 $0x20,%ymm15,%ymm11,%ymm0
+ vperm2i128 $0x31,%ymm15,%ymm11,%ymm15
+ vmovdqa %ymm0,%ymm11
+
+ # xor with corresponding input, write to output
+ vmovdqa 0x00(%rsp),%ymm0
+ vpxor 0x0000(%rdx),%ymm0,%ymm0
+ vmovdqu %ymm0,0x0000(%rsi)
+ vmovdqa 0x20(%rsp),%ymm0
+ vpxor 0x0080(%rdx),%ymm0,%ymm0
+ vmovdqu %ymm0,0x0080(%rsi)
+ vmovdqa 0x40(%rsp),%ymm0
+ vpxor 0x0040(%rdx),%ymm0,%ymm0
+ vmovdqu %ymm0,0x0040(%rsi)
+ vmovdqa 0x60(%rsp),%ymm0
+ vpxor 0x00c0(%rdx),%ymm0,%ymm0
+ vmovdqu %ymm0,0x00c0(%rsi)
+ vpxor 0x0100(%rdx),%ymm4,%ymm4
+ vmovdqu %ymm4,0x0100(%rsi)
+ vpxor 0x0180(%rdx),%ymm5,%ymm5
+	vmovdqu		%ymm5,0x0180(%rsi)
+ vpxor 0x0140(%rdx),%ymm6,%ymm6
+ vmovdqu %ymm6,0x0140(%rsi)
+ vpxor 0x01c0(%rdx),%ymm7,%ymm7
+ vmovdqu %ymm7,0x01c0(%rsi)
+ vpxor 0x0020(%rdx),%ymm8,%ymm8
+ vmovdqu %ymm8,0x0020(%rsi)
+ vpxor 0x00a0(%rdx),%ymm9,%ymm9
+ vmovdqu %ymm9,0x00a0(%rsi)
+ vpxor 0x0060(%rdx),%ymm10,%ymm10
+ vmovdqu %ymm10,0x0060(%rsi)
+ vpxor 0x00e0(%rdx),%ymm11,%ymm11
+ vmovdqu %ymm11,0x00e0(%rsi)
+ vpxor 0x0120(%rdx),%ymm12,%ymm12
+ vmovdqu %ymm12,0x0120(%rsi)
+ vpxor 0x01a0(%rdx),%ymm13,%ymm13
+ vmovdqu %ymm13,0x01a0(%rsi)
+ vpxor 0x0160(%rdx),%ymm14,%ymm14
+ vmovdqu %ymm14,0x0160(%rsi)
+ vpxor 0x01e0(%rdx),%ymm15,%ymm15
+ vmovdqu %ymm15,0x01e0(%rsi)
+
+ vzeroupper
+ mov %r8,%rsp
+ ret
+ENDPROC(chacha20_8block_xor_avx2)
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 4d677c3..effe216 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -21,12 +21,27 @@

asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
+#ifdef CONFIG_AS_AVX2
+asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
+static bool chacha20_use_avx2;
+#endif

static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
u8 buf[CHACHA20_BLOCK_SIZE];

+#ifdef CONFIG_AS_AVX2
+ if (chacha20_use_avx2) {
+ while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
+ chacha20_8block_xor_avx2(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE * 8;
+ src += CHACHA20_BLOCK_SIZE * 8;
+ dst += CHACHA20_BLOCK_SIZE * 8;
+ state[12] += 8;
+ }
+ }
+#endif
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
chacha20_4block_xor_ssse3(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE * 4;
@@ -113,6 +128,10 @@ static int __init chacha20_simd_mod_init(void)
if (!cpu_has_ssse3)
return -ENODEV;

+#ifdef CONFIG_AS_AVX2
+ chacha20_use_avx2 = cpu_has_avx && cpu_has_avx2 &&
+ cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, NULL);
+#endif
return crypto_register_alg(&alg);
}

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 8f24185..82caab0 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1214,7 +1214,7 @@ config CRYPTO_CHACHA20
<http://cr.yp.to/chacha/chacha-20080128.pdf>

config CRYPTO_CHACHA20_X86_64
- tristate "ChaCha20 cipher algorithm (x86_64/SSSE3)"
+ tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
depends on X86 && 64BIT
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
--
1.9.1

2015-07-07 19:37:24

by Martin Willi

[permalink] [raw]
Subject: [PATCH 10/10] crypto: poly1305 - Add a four block AVX2 variant for x86_64

Extends the x86_64 Poly1305 authenticator by a function processing four
consecutive Poly1305 blocks in parallel using AVX2 instructions.
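
As in the two block SSE2 variant, the unrolling relies on powers of the
key: applying h = (h + m) * r four times is equivalent to
h = (h + m1) * r^4 + m2 * r^3 + m3 * r^2 + m4 * r (mod 2^130 - 5), so the
glue code derives and caches r^2, r^3 and r^4 on first use. Roughly (a
sketch with separate arrays; the real code keeps them contiguous in
sctx->u, and r is the 5x26-bit key from the base context):

    u32 u[5], w[5], y[5];

    memcpy(u, r, sizeof(u));        /* u = r   */
    poly1305_simd_mult(u, r);       /* u = r^2 */
    memcpy(w, u, sizeof(w));
    poly1305_simd_mult(w, r);       /* w = r^3 */
    memcpy(y, w, sizeof(y));
    poly1305_simd_mult(y, r);       /* y = r^4 */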

For large messages, throughput increases by ~15-45% compared to two
block SSE2:

testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3623907 opers/sec, 347895072 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5915244 opers/sec, 567863424 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9489525 opers/sec, 910994400 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1369657 opers/sec, 394461216 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2016000 opers/sec, 580608000 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3716146 opers/sec, 1070250048 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 561913 opers/sec, 593380128 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1635554 opers/sec, 1727145024 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 286540 opers/sec, 596003200 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 935339 opers/sec, 1945505120 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 491810 opers/sec, 2030191680 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 252857 opers/sec, 2079495968 bytes/sec

testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3572538 opers/sec, 342963648 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5868291 opers/sec, 563355936 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9448599 opers/sec, 907065504 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1352271 opers/sec, 389454048 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 1993268 opers/sec, 574061184 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3693984 opers/sec, 1063867392 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 555095 opers/sec, 586180320 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1909861 opers/sec, 2016813216 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 282260 opers/sec, 587100800 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 1221476 opers/sec, 2540670080 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 677578 opers/sec, 2797041984 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 364094 opers/sec, 2994309056 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <[email protected]>
---
arch/x86/crypto/Makefile | 1 +
arch/x86/crypto/poly1305-avx2-x86_64.S | 386 +++++++++++++++++++++++++++++++++
arch/x86/crypto/poly1305_glue.c | 40 ++++
crypto/Kconfig | 2 +-
4 files changed, 428 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/crypto/poly1305-avx2-x86_64.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 5cf405c..9a2838c 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -89,6 +89,7 @@ sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
poly1305-x86_64-y := poly1305-sse2-x86_64.o poly1305_glue.o
ifeq ($(avx2_supported),yes)
sha1-ssse3-y += sha1_avx2_x86_64_asm.o
+poly1305-x86_64-y += poly1305-avx2-x86_64.o
endif
crc32c-intel-y := crc32c-intel_glue.o
crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
diff --git a/arch/x86/crypto/poly1305-avx2-x86_64.S b/arch/x86/crypto/poly1305-avx2-x86_64.S
new file mode 100644
index 0000000..eff2f41
--- /dev/null
+++ b/arch/x86/crypto/poly1305-avx2-x86_64.S
@@ -0,0 +1,386 @@
+/*
+ * Poly1305 authenticator algorithm, RFC7539, x64 AVX2 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+.data
+.align 32
+
+ANMASK: .octa 0x0000000003ffffff0000000003ffffff
+ .octa 0x0000000003ffffff0000000003ffffff
+ORMASK: .octa 0x00000000010000000000000001000000
+ .octa 0x00000000010000000000000001000000
+
+.text
+
+#define h0 0x00(%rdi)
+#define h1 0x04(%rdi)
+#define h2 0x08(%rdi)
+#define h3 0x0c(%rdi)
+#define h4 0x10(%rdi)
+#define r0 0x00(%rdx)
+#define r1 0x04(%rdx)
+#define r2 0x08(%rdx)
+#define r3 0x0c(%rdx)
+#define r4 0x10(%rdx)
+#define u0 0x00(%r8)
+#define u1 0x04(%r8)
+#define u2 0x08(%r8)
+#define u3 0x0c(%r8)
+#define u4 0x10(%r8)
+#define w0 0x14(%r8)
+#define w1 0x18(%r8)
+#define w2 0x1c(%r8)
+#define w3 0x20(%r8)
+#define w4 0x24(%r8)
+#define y0 0x28(%r8)
+#define y1 0x2c(%r8)
+#define y2 0x30(%r8)
+#define y3 0x34(%r8)
+#define y4 0x38(%r8)
+#define m %rsi
+#define hc0 %ymm0
+#define hc1 %ymm1
+#define hc2 %ymm2
+#define hc3 %ymm3
+#define hc4 %ymm4
+#define hc0x %xmm0
+#define hc1x %xmm1
+#define hc2x %xmm2
+#define hc3x %xmm3
+#define hc4x %xmm4
+#define t1 %ymm5
+#define t2 %ymm6
+#define t1x %xmm5
+#define t2x %xmm6
+#define ruwy0 %ymm7
+#define ruwy1 %ymm8
+#define ruwy2 %ymm9
+#define ruwy3 %ymm10
+#define ruwy4 %ymm11
+#define ruwy0x %xmm7
+#define ruwy1x %xmm8
+#define ruwy2x %xmm9
+#define ruwy3x %xmm10
+#define ruwy4x %xmm11
+#define svxz1 %ymm12
+#define svxz2 %ymm13
+#define svxz3 %ymm14
+#define svxz4 %ymm15
+#define d0 %r9
+#define d1 %r10
+#define d2 %r11
+#define d3 %r12
+#define d4 %r13
+
+ENTRY(poly1305_4block_avx2)
+ # %rdi: Accumulator h[5]
+ # %rsi: 64 byte input block m
+ # %rdx: Poly1305 key r[5]
+ # %rcx: Quadblock count
+	# %r8: Poly1305 derived key r^2 u[5], r^3 w[5], r^4 y[5]
+
+ # This four-block variant uses loop unrolled block processing. It
+ # requires 4 Poly1305 keys: r, r^2, r^3 and r^4:
+ # h = (h + m) * r => h = (h + m1) * r^4 + m2 * r^3 + m3 * r^2 + m4 * r
+
+ vzeroupper
+ push %rbx
+ push %r12
+ push %r13
+
+ # combine r0,u0,w0,y0
+ vmovd y0,ruwy0x
+ vmovd w0,t1x
+ vpunpcklqdq t1,ruwy0,ruwy0
+ vmovd u0,t1x
+ vmovd r0,t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,ruwy0,ruwy0
+
+ # combine r1,u1,w1,y1 and s1=r1*5,v1=u1*5,x1=w1*5,z1=y1*5
+ vmovd y1,ruwy1x
+ vmovd w1,t1x
+ vpunpcklqdq t1,ruwy1,ruwy1
+ vmovd u1,t1x
+ vmovd r1,t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,ruwy1,ruwy1
+ vpslld $2,ruwy1,svxz1
+ vpaddd ruwy1,svxz1,svxz1
+
+ # combine r2,u2,w2,y2 and s2=r2*5,v2=u2*5,x2=w2*5,z2=y2*5
+ vmovd y2,ruwy2x
+ vmovd w2,t1x
+ vpunpcklqdq t1,ruwy2,ruwy2
+ vmovd u2,t1x
+ vmovd r2,t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,ruwy2,ruwy2
+ vpslld $2,ruwy2,svxz2
+ vpaddd ruwy2,svxz2,svxz2
+
+ # combine r3,u3,w3,y3 and s3=r3*5,v3=u3*5,x3=w3*5,z3=y3*5
+ vmovd y3,ruwy3x
+ vmovd w3,t1x
+ vpunpcklqdq t1,ruwy3,ruwy3
+ vmovd u3,t1x
+ vmovd r3,t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,ruwy3,ruwy3
+ vpslld $2,ruwy3,svxz3
+ vpaddd ruwy3,svxz3,svxz3
+
+ # combine r4,u4,w4,y4 and s4=r4*5,v4=u4*5,x4=w4*5,z4=y4*5
+ vmovd y4,ruwy4x
+ vmovd w4,t1x
+ vpunpcklqdq t1,ruwy4,ruwy4
+ vmovd u4,t1x
+ vmovd r4,t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,ruwy4,ruwy4
+ vpslld $2,ruwy4,svxz4
+ vpaddd ruwy4,svxz4,svxz4
+
+.Ldoblock4:
+ # hc0 = [m[48-51] & 0x3ffffff, m[32-35] & 0x3ffffff,
+ # m[16-19] & 0x3ffffff, m[ 0- 3] & 0x3ffffff + h0]
+ vmovd 0x00(m),hc0x
+ vmovd 0x10(m),t1x
+ vpunpcklqdq t1,hc0,hc0
+ vmovd 0x20(m),t1x
+ vmovd 0x30(m),t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,hc0,hc0
+ vpand ANMASK(%rip),hc0,hc0
+ vmovd h0,t1x
+ vpaddd t1,hc0,hc0
+ # hc1 = [(m[51-54] >> 2) & 0x3ffffff, (m[35-38] >> 2) & 0x3ffffff,
+ # (m[19-22] >> 2) & 0x3ffffff, (m[ 3- 6] >> 2) & 0x3ffffff + h1]
+ vmovd 0x03(m),hc1x
+ vmovd 0x13(m),t1x
+ vpunpcklqdq t1,hc1,hc1
+ vmovd 0x23(m),t1x
+ vmovd 0x33(m),t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,hc1,hc1
+ vpsrld $2,hc1,hc1
+ vpand ANMASK(%rip),hc1,hc1
+ vmovd h1,t1x
+ vpaddd t1,hc1,hc1
+ # hc2 = [(m[54-57] >> 4) & 0x3ffffff, (m[38-41] >> 4) & 0x3ffffff,
+ # (m[22-25] >> 4) & 0x3ffffff, (m[ 6- 9] >> 4) & 0x3ffffff + h2]
+ vmovd 0x06(m),hc2x
+ vmovd 0x16(m),t1x
+ vpunpcklqdq t1,hc2,hc2
+ vmovd 0x26(m),t1x
+ vmovd 0x36(m),t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,hc2,hc2
+ vpsrld $4,hc2,hc2
+ vpand ANMASK(%rip),hc2,hc2
+ vmovd h2,t1x
+ vpaddd t1,hc2,hc2
+ # hc3 = [(m[57-60] >> 6) & 0x3ffffff, (m[41-44] >> 6) & 0x3ffffff,
+ # (m[25-28] >> 6) & 0x3ffffff, (m[ 9-12] >> 6) & 0x3ffffff + h3]
+ vmovd 0x09(m),hc3x
+ vmovd 0x19(m),t1x
+ vpunpcklqdq t1,hc3,hc3
+ vmovd 0x29(m),t1x
+ vmovd 0x39(m),t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,hc3,hc3
+ vpsrld $6,hc3,hc3
+ vpand ANMASK(%rip),hc3,hc3
+ vmovd h3,t1x
+ vpaddd t1,hc3,hc3
+ # hc4 = [(m[60-63] >> 8) | (1<<24), (m[44-47] >> 8) | (1<<24),
+ # (m[28-31] >> 8) | (1<<24), (m[12-15] >> 8) | (1<<24) + h4]
+ vmovd 0x0c(m),hc4x
+ vmovd 0x1c(m),t1x
+ vpunpcklqdq t1,hc4,hc4
+ vmovd 0x2c(m),t1x
+ vmovd 0x3c(m),t2x
+ vpunpcklqdq t2,t1,t1
+ vperm2i128 $0x20,t1,hc4,hc4
+ vpsrld $8,hc4,hc4
+ vpor ORMASK(%rip),hc4,hc4
+ vmovd h4,t1x
+ vpaddd t1,hc4,hc4
+
+ # t1 = [ hc0[3] * r0, hc0[2] * u0, hc0[1] * w0, hc0[0] * y0 ]
+ vpmuludq hc0,ruwy0,t1
+ # t1 += [ hc1[3] * s4, hc1[2] * v4, hc1[1] * x4, hc1[0] * z4 ]
+ vpmuludq hc1,svxz4,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc2[3] * s3, hc2[2] * v3, hc2[1] * x3, hc2[0] * z3 ]
+ vpmuludq hc2,svxz3,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc3[3] * s2, hc3[2] * v2, hc3[1] * x2, hc3[0] * z2 ]
+ vpmuludq hc3,svxz2,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc4[3] * s1, hc4[2] * v1, hc4[1] * x1, hc4[0] * z1 ]
+ vpmuludq hc4,svxz1,t2
+ vpaddq t2,t1,t1
+	# d0 = t1[0] + t1[1] + t1[2] + t1[3]
+ vpermq $0xee,t1,t2
+ vpaddq t2,t1,t1
+ vpsrldq $8,t1,t2
+ vpaddq t2,t1,t1
+ vmovq t1x,d0
+
+ # t1 = [ hc0[3] * r1, hc0[2] * u1,hc0[1] * w1, hc0[0] * y1 ]
+ vpmuludq hc0,ruwy1,t1
+ # t1 += [ hc1[3] * r0, hc1[2] * u0, hc1[1] * w0, hc1[0] * y0 ]
+ vpmuludq hc1,ruwy0,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc2[3] * s4, hc2[2] * v4, hc2[1] * x4, hc2[0] * z4 ]
+ vpmuludq hc2,svxz4,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc3[3] * s3, hc3[2] * v3, hc3[1] * x3, hc3[0] * z3 ]
+ vpmuludq hc3,svxz3,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc4[3] * s2, hc4[2] * v2, hc4[1] * x2, hc4[0] * z2 ]
+ vpmuludq hc4,svxz2,t2
+ vpaddq t2,t1,t1
+	# d1 = t1[0] + t1[1] + t1[2] + t1[3]
+ vpermq $0xee,t1,t2
+ vpaddq t2,t1,t1
+ vpsrldq $8,t1,t2
+ vpaddq t2,t1,t1
+ vmovq t1x,d1
+
+ # t1 = [ hc0[3] * r2, hc0[2] * u2, hc0[1] * w2, hc0[0] * y2 ]
+ vpmuludq hc0,ruwy2,t1
+ # t1 += [ hc1[3] * r1, hc1[2] * u1, hc1[1] * w1, hc1[0] * y1 ]
+ vpmuludq hc1,ruwy1,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc2[3] * r0, hc2[2] * u0, hc2[1] * w0, hc2[0] * y0 ]
+ vpmuludq hc2,ruwy0,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc3[3] * s4, hc3[2] * v4, hc3[1] * x4, hc3[0] * z4 ]
+ vpmuludq hc3,svxz4,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc4[3] * s3, hc4[2] * v3, hc4[1] * x3, hc4[0] * z3 ]
+ vpmuludq hc4,svxz3,t2
+ vpaddq t2,t1,t1
+ # d2 = t1[0] + t1[1] + t1[2] + t1[3]
+ vpermq $0xee,t1,t2
+ vpaddq t2,t1,t1
+ vpsrldq $8,t1,t2
+ vpaddq t2,t1,t1
+ vmovq t1x,d2
+
+ # t1 = [ hc0[3] * r3, hc0[2] * u3, hc0[1] * w3, hc0[0] * y3 ]
+ vpmuludq hc0,ruwy3,t1
+ # t1 += [ hc1[3] * r2, hc1[2] * u2, hc1[1] * w2, hc1[0] * y2 ]
+ vpmuludq hc1,ruwy2,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc2[3] * r1, hc2[2] * u1, hc2[1] * w1, hc2[0] * y1 ]
+ vpmuludq hc2,ruwy1,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc3[3] * r0, hc3[2] * u0, hc3[1] * w0, hc3[0] * y0 ]
+ vpmuludq hc3,ruwy0,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc4[3] * s4, hc4[2] * v4, hc4[1] * x4, hc4[0] * z4 ]
+ vpmuludq hc4,svxz4,t2
+ vpaddq t2,t1,t1
+ # d3 = t1[0] + t1[1] + t1[2] + t1[3]
+ vpermq $0xee,t1,t2
+ vpaddq t2,t1,t1
+ vpsrldq $8,t1,t2
+ vpaddq t2,t1,t1
+ vmovq t1x,d3
+
+ # t1 = [ hc0[3] * r4, hc0[2] * u4, hc0[1] * w4, hc0[0] * y4 ]
+ vpmuludq hc0,ruwy4,t1
+ # t1 += [ hc1[3] * r3, hc1[2] * u3, hc1[1] * w3, hc1[0] * y3 ]
+ vpmuludq hc1,ruwy3,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc2[3] * r2, hc2[2] * u2, hc2[1] * w2, hc2[0] * y2 ]
+ vpmuludq hc2,ruwy2,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc3[3] * r1, hc3[2] * u1, hc3[1] * w1, hc3[0] * y1 ]
+ vpmuludq hc3,ruwy1,t2
+ vpaddq t2,t1,t1
+ # t1 += [ hc4[3] * r0, hc4[2] * u0, hc4[1] * w0, hc4[0] * y0 ]
+ vpmuludq hc4,ruwy0,t2
+ vpaddq t2,t1,t1
+ # d4 = t1[0] + t1[1] + t1[2] + t1[3]
+ vpermq $0xee,t1,t2
+ vpaddq t2,t1,t1
+ vpsrldq $8,t1,t2
+ vpaddq t2,t1,t1
+ vmovq t1x,d4
+
+ # d1 += d0 >> 26
+ mov d0,%rax
+ shr $26,%rax
+ add %rax,d1
+ # h0 = d0 & 0x3ffffff
+ mov d0,%rbx
+ and $0x3ffffff,%ebx
+
+ # d2 += d1 >> 26
+ mov d1,%rax
+ shr $26,%rax
+ add %rax,d2
+ # h1 = d1 & 0x3ffffff
+ mov d1,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h1
+
+ # d3 += d2 >> 26
+ mov d2,%rax
+ shr $26,%rax
+ add %rax,d3
+ # h2 = d2 & 0x3ffffff
+ mov d2,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h2
+
+ # d4 += d3 >> 26
+ mov d3,%rax
+ shr $26,%rax
+ add %rax,d4
+ # h3 = d3 & 0x3ffffff
+ mov d3,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h3
+
+ # h0 += (d4 >> 26) * 5
+ mov d4,%rax
+ shr $26,%rax
+ lea (%eax,%eax,4),%eax
+ add %eax,%ebx
+ # h4 = d4 & 0x3ffffff
+ mov d4,%rax
+ and $0x3ffffff,%eax
+ mov %eax,h4
+
+ # h1 += h0 >> 26
+ mov %ebx,%eax
+ shr $26,%eax
+ add %eax,h1
+ # h0 = h0 & 0x3ffffff
+ andl $0x3ffffff,%ebx
+ mov %ebx,h0
+
+ add $0x40,m
+ dec %rcx
+ jnz .Ldoblock4
+
+ vzeroupper
+ pop %r13
+ pop %r12
+ pop %rbx
+ ret
+ENDPROC(poly1305_4block_avx2)
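
The multiply-and-reduce above is the usual radix-2^26 Poly1305
arithmetic: each of the four lanes multiplies its message limbs (with
the accumulator h folded into lane 0) by one of the key powers -- r, u,
w, y, where u is r^2 and w, y are presumably r^3 and r^4 -- while the
svxz tables apparently hold the same values pre-multiplied by 5 for the
2^130 == 5 (mod 2^130 - 5) wrap-around; the per-lane products are then
summed horizontally into d0..d4 and carried exactly as in the scalar
tail of the loop. A single-lane C sketch of that multiply/carry step
(editorial names, not code from the patch):

#include <stdint.h>

/* h = h * r (mod 2^130 - 5), both in five 26-bit limbs */
static void poly1305_mul_mod(uint32_t h[5], const uint32_t r[5])
{
	uint32_t s[5];	/* s[k] = 5 * r[k], the wrap-around factors */
	uint64_t d[5];
	int i;

	for (i = 1; i < 5; i++)
		s[i] = r[i] * 5;

	/* schoolbook products; terms past 2^130 fold back in via s[] */
	d[0] = (uint64_t)h[0] * r[0] + (uint64_t)h[1] * s[4] +
	       (uint64_t)h[2] * s[3] + (uint64_t)h[3] * s[2] +
	       (uint64_t)h[4] * s[1];
	d[1] = (uint64_t)h[0] * r[1] + (uint64_t)h[1] * r[0] +
	       (uint64_t)h[2] * s[4] + (uint64_t)h[3] * s[3] +
	       (uint64_t)h[4] * s[2];
	d[2] = (uint64_t)h[0] * r[2] + (uint64_t)h[1] * r[1] +
	       (uint64_t)h[2] * r[0] + (uint64_t)h[3] * s[4] +
	       (uint64_t)h[4] * s[3];
	d[3] = (uint64_t)h[0] * r[3] + (uint64_t)h[1] * r[2] +
	       (uint64_t)h[2] * r[1] + (uint64_t)h[3] * r[0] +
	       (uint64_t)h[4] * s[4];
	d[4] = (uint64_t)h[0] * r[4] + (uint64_t)h[1] * r[3] +
	       (uint64_t)h[2] * r[2] + (uint64_t)h[3] * r[1] +
	       (uint64_t)h[4] * r[0];

	/* carry propagation, mirroring the scalar d0..d4 code above */
	d[1] += d[0] >> 26;  h[0] = d[0] & 0x3ffffff;
	d[2] += d[1] >> 26;  h[1] = d[1] & 0x3ffffff;
	d[3] += d[2] >> 26;  h[2] = d[2] & 0x3ffffff;
	d[4] += d[3] >> 26;  h[3] = d[3] & 0x3ffffff;
	h[0] += (uint32_t)(d[4] >> 26) * 5;	/* 2^130 folds back as *5 */
	h[4]  = d[4] & 0x3ffffff;
	h[1] += h[0] >> 26;
	h[0] &= 0x3ffffff;
}
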
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index b7c33d0..f7170d7 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -22,20 +22,33 @@ struct poly1305_simd_desc_ctx {
struct poly1305_desc_ctx base;
/* derived key u set? */
bool uset;
+#ifdef CONFIG_AS_AVX2
+ /* derived keys r^3, r^4 set? */
+ bool wset;
+#endif
/* derived Poly1305 key r^2 */
u32 u[5];
+ /* ... silently appended r^3 and r^4 when using AVX2 */
};

asmlinkage void poly1305_block_sse2(u32 *h, const u8 *src,
const u32 *r, unsigned int blocks);
asmlinkage void poly1305_2block_sse2(u32 *h, const u8 *src, const u32 *r,
unsigned int blocks, const u32 *u);
+#ifdef CONFIG_AS_AVX2
+asmlinkage void poly1305_4block_avx2(u32 *h, const u8 *src, const u32 *r,
+ unsigned int blocks, const u32 *u);
+static bool poly1305_use_avx2;
+#endif

static int poly1305_simd_init(struct shash_desc *desc)
{
struct poly1305_simd_desc_ctx *sctx = shash_desc_ctx(desc);

sctx->uset = false;
+#ifdef CONFIG_AS_AVX2
+ sctx->wset = false;
+#endif

return crypto_poly1305_init(desc);
}
@@ -66,6 +79,26 @@ static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
srclen = datalen;
}

+#ifdef CONFIG_AS_AVX2
+ if (poly1305_use_avx2 && srclen >= POLY1305_BLOCK_SIZE * 4) {
+ if (unlikely(!sctx->wset)) {
+ if (!sctx->uset) {
+ memcpy(sctx->u, dctx->r, sizeof(sctx->u));
+ poly1305_simd_mult(sctx->u, dctx->r);
+ sctx->uset = true;
+ }
+ memcpy(sctx->u + 5, sctx->u, sizeof(sctx->u));
+ poly1305_simd_mult(sctx->u + 5, dctx->r);
+ memcpy(sctx->u + 10, sctx->u + 5, sizeof(sctx->u));
+ poly1305_simd_mult(sctx->u + 10, dctx->r);
+ sctx->wset = true;
+ }
+ blocks = srclen / (POLY1305_BLOCK_SIZE * 4);
+ poly1305_4block_avx2(dctx->h, src, dctx->r, blocks, sctx->u);
+ src += POLY1305_BLOCK_SIZE * 4 * blocks;
+ srclen -= POLY1305_BLOCK_SIZE * 4 * blocks;
+ }
+#endif
if (likely(srclen >= POLY1305_BLOCK_SIZE * 2)) {
if (unlikely(!sctx->uset)) {
memcpy(sctx->u, dctx->r, sizeof(sctx->u));
@@ -149,6 +182,13 @@ static int __init poly1305_simd_mod_init(void)
if (!cpu_has_xmm2)
return -ENODEV;

+#ifdef CONFIG_AS_AVX2
+ poly1305_use_avx2 = cpu_has_avx && cpu_has_avx2 &&
+ cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, NULL);
+ alg.descsize = sizeof(struct poly1305_simd_desc_ctx);
+ if (poly1305_use_avx2)
+ alg.descsize += 10 * sizeof(u32);
+#endif
return crypto_register_shash(&alg);
}
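
In terms of the single-lane sketch after the assembly, the lazy
derivation in poly1305_simd_blocks() above just keeps multiplying the
clamped r back in, so the 4-block path can hand the assembly one
contiguous table of powers. Roughly (reusing the editorial
poly1305_mul_mod() from that sketch in place of the patch's
poly1305_simd_mult(); add <string.h> for memcpy):

	/* u[0..4] = r^2, u[5..9] = r^3, u[10..14] = r^4,
	 * matching the layout of sctx->u with AVX2 enabled */
	uint32_t u[15];

	memcpy(u, r, 5 * sizeof(u[0]));
	poly1305_mul_mod(u, r);			/* r^2 = r * r   */
	memcpy(u + 5, u, 5 * sizeof(u[0]));
	poly1305_mul_mod(u + 5, r);		/* r^3 = r^2 * r */
	memcpy(u + 10, u + 5, 5 * sizeof(u[0]));
	poly1305_mul_mod(u + 10, r);		/* r^4 = r^3 * r */
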

diff --git a/crypto/Kconfig b/crypto/Kconfig
index c57478c..354bb69 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -471,7 +471,7 @@ config CRYPTO_POLY1305
in IETF protocols. This is the portable C implementation of Poly1305.

config CRYPTO_POLY1305_X86_64
- tristate "Poly1305 authenticator algorithm (x86_64/SSE2)"
+ tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
depends on X86 && 64BIT
select CRYPTO_POLY1305
help
--
1.9.1

2015-07-07 22:13:05

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

On Tue, Jul 07, 2015 at 09:36:46PM +0200, Martin Willi wrote:
>
> poly1305-generic:
> testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-generic,poly1305-generic)) encryption
> test 0 (288 bit key, 16 byte blocks): 902007 operations in 1 seconds (14432112 bytes)
> test 1 (288 bit key, 64 byte blocks): 945302 operations in 1 seconds (60499328 bytes)
> test 2 (288 bit key, 256 byte blocks): 559910 operations in 1 seconds (143336960 bytes)
> test 3 (288 bit key, 512 byte blocks): 365334 operations in 1 seconds (187051008 bytes)
> test 4 (288 bit key, 1024 byte blocks): 213663 operations in 1 seconds (218790912 bytes)
> test 5 (288 bit key, 2048 byte blocks): 117263 operations in 1 seconds (240154624 bytes)
> test 6 (288 bit key, 4096 byte blocks): 61915 operations in 1 seconds (253603840 bytes)
> test 7 (288 bit key, 8192 byte blocks): 31662 operations in 1 seconds (259375104 bytes)

Running the speed test with sec=1 makes no sense because it's
too short. Please use sec=0 and count cycles instead.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-07-08 20:36:26

by Martin Willi

[permalink] [raw]
Subject: Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

Herbert,

> Running the speed test with sec=1 makes no sense because it's
> too short. Please use sec=0 and count cycles instead.

I get less consistent numbers between different runs when using sec=0,
hence I've used sec=1. Below are the cycle numbers of "average" runs
for the AEAD; I'll use cycles in the individual patch notes in a v2.

Kind regards
Martin

--

generic:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-generic,poly1305-generic)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 9444 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 10692 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 18299 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 26952 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 48493 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 83766 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 150899 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 296779 cycles (8192 bytes)

SSE2/SSSE3:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 9814 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 9998 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 12442 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 20321 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 21098 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 33423 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 55183 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 102514 cycles (8192 bytes)

AVX2:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 9883 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 10891 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 12467 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 13538 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 16783 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 23161 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 37359 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 64670 cycles (8192 bytes)
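
Dividing the cycle counts by the block size gives a rough
cycles-per-byte figure for comparing the three runs; e.g. for the
8192-byte tests (a quick stand-alone helper fed with the numbers from
the tables above):

#include <stdio.h>

int main(void)
{
	/* 8192-byte results copied from the three runs above */
	const struct { const char *name; unsigned int cycles; } runs[] = {
		{ "generic",    296779 },
		{ "SSE2/SSSE3", 102514 },
		{ "AVX2",        64670 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-10s %6.2f cycles/byte\n",
		       runs[i].name, runs[i].cycles / 8192.0);
	return 0;
}

For the runs above this works out to roughly 36, 12.5 and 8 cycles per
byte respectively.
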

2015-07-08 20:41:51

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

On Wed, Jul 08, 2015 at 10:36:23PM +0200, Martin Willi wrote:
>
> I get less constant numbers between different runs when using sec=0,
> hence I've used sec=1. Below are the numbers of "average" runs for the
> AEAD measuring cycles; I'll use cycles in the individual patch notes in
> a v2.

If you're going to use sec you need to use at least 10 in order
for it to be meaningful, as shorter values often result in bogus
numbers. Of course sec=10 takes a long time, which is why counting
cycles is the preferred method.

What sort of variance do you see with cycles? Do you get the
same variance for other algorithms, e.g., cbc/aes?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-07-11 08:15:55

by Martin Willi

[permalink] [raw]
Subject: Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers


> If you're going to use sec you need to use at least 10 in order
> for it to be meaningful as shorter values often result in bogus
> numbers.

Ok, I'll use sec=10 in v2. There is no fundamental difference compared
to sec=1 (except for very short blocks):

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 9498006 operations in 10 seconds (151968096 bytes)
test 1 (288 bit key, 64 byte blocks): 9423516 operations in 10 seconds (603105024 bytes)
test 2 (288 bit key, 256 byte blocks): 7597253 operations in 10 seconds (1944896768 bytes)
test 3 (288 bit key, 512 byte blocks): 6979753 operations in 10 seconds (3573633536 bytes)
test 4 (288 bit key, 1024 byte blocks): 5629328 operations in 10 seconds (5764431872 bytes)
test 5 (288 bit key, 2048 byte blocks): 4071284 operations in 10 seconds (8337989632 bytes)
test 6 (288 bit key, 4096 byte blocks): 2627325 operations in 10 seconds (10761523200 bytes)
test 7 (288 bit key, 8192 byte blocks): 1492531 operations in 10 seconds (12226813952 bytes)

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 896305 operations in 1 seconds (14340880 bytes)
test 1 (288 bit key, 64 byte blocks): 929638 operations in 1 seconds (59496832 bytes)
test 2 (288 bit key, 256 byte blocks): 750673 operations in 1 seconds (192172288 bytes)
test 3 (288 bit key, 512 byte blocks): 687636 operations in 1 seconds (352069632 bytes)
test 4 (288 bit key, 1024 byte blocks): 555209 operations in 1 seconds (568534016 bytes)
test 5 (288 bit key, 2048 byte blocks): 402049 operations in 1 seconds (823396352 bytes)
test 6 (288 bit key, 4096 byte blocks): 259861 operations in 1 seconds (1064390656 bytes)
test 7 (288 bit key, 8192 byte blocks): 147283 operations in 1 seconds (1206542336 bytes)

> What sort of variance do you see with cycles?

Here are a very fast and a very slow run (these are extremes, though):

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 3765 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 3823 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 4728 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 5135 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 7026 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 8804 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 14674 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 24616 cycles (8192 bytes)

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 15031 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 15670 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 13034 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 14045 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 20944 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 26445 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 31912 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 61366 cycles (8192 bytes)

> Do you get the same variance for other algorithms, e.g., cbc/aes?

Yes, another extreme:

testing speed of cbc(aes) (cbc(aes-aesni)) decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 593 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 1589 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 5311 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 20666 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 161483 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 593 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 1659 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 5609 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 21568 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 172484 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 612 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 1687 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 5836 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 22400 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 177799 cycles (8192 bytes)

testing speed of cbc(aes) (cbc(aes-aesni)) decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 1130 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 3002 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 10135 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 39911 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 308130 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 1151 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 3096 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 11486 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 42155 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 328148 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 1186 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 3260 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 11909 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 44794 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 325349 cycles (8192 bytes)

I see that with all algorithms, both KVM-virtualized and on bare metal.

Regards
Martin

2015-07-13 07:01:54

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

On Sat, Jul 11, 2015 at 10:15:52AM +0200, Martin Willi wrote:
>
> > If you're going to use sec you need to use at least 10 in order
> > for it to be meaningful as shorter values often result in bogus
> > numbers.
>
> Ok, I'll use sec=10 in v2. There is no fundamental difference compared
> to sec=1 (except for very short blocks):

Thanks.

> Yes, another extreme:
>
> testing speed of cbc(aes) (cbc(aes-aesni)) decryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 593 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 1589 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 5311 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 20666 cycles (1024 bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 161483 cycles (8192 bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 593 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 1659 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 5609 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 21568 cycles (1024 bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 172484 cycles (8192 bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 612 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 1687 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 5836 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 22400 cycles (1024 bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 177799 cycles (8192 bytes)
>
> testing speed of cbc(aes) (cbc(aes-aesni)) decryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 1130 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 3002 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 10135 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 39911 cycles (1024 bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 308130 cycles (8192 bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 1151 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 3096 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 11486 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 42155 cycles (1024 bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 328148 cycles (8192 bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 1186 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 3260 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 11909 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 44794 cycles (1024 bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 325349 cycles (8192 bytes)

Weird. I don't see this variance with tcrypt mode=200 at all.
I do however see it with mode=500 but that's probably just telling
us that the async tcrypt code path is buggy.

It'd be good if we can figure out what's going on here.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt