On Tue, Sep 28, 2021 at 11:04:03PM +0200, Ard Biesheuvel wrote:
> On Tue, 28 Sept 2021 at 08:27, Eric Biggers <[email protected]> wrote:
> >
> > On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
> > > To improve performance on cores with deep pipelines such as A72 and N1,
> > > implement gcm(aes) using a 4-way interleave of AES and GHASH (8 blocks
> > > in parallel in total), which makes fuller use of the pipelines than the
> > > 4-way interleave we currently use. It gains about 20% for big data
> > > sizes such as 8k.
> > >
> > > This is a completely new version of the GCM part of the combined GCM/GHASH
> > > driver. It will co-exist with the old driver and only serve big data
> > > sizes. Instead of interleaving four invocations of AES where each chunk
> > > of 64 bytes is encrypted first and then ghashed, the new version uses a
> > > more coarse-grained approach where a chunk of 64 bytes is encrypted and,
> > > at the same time, one chunk of 64 bytes is ghashed (or ghashed and
> > > decrypted in the converse case).
> > >
> > > The table below compares the performance of the old driver and the new
> > > one on various micro-architectures and running in various modes with
> > > various data sizes.
> > >
> > >        |        AES-128        |        AES-192        |        AES-256        |
> > > #bytes | 1024  | 1420  |  8k   | 1024  | 1420  |  8k   | 1024  | 1420  |  8k   |
> > > -------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
> > > A72    | 5.5%  | 12%   | 25%   | 2.2%  | 9.5%  | 23%   | -1%   | 6.7%  | 19%   |
> > > A57    | -0.5% | 9.3%  | 32%   | -3%   | 6.3%  | 26%   | -6%   | 3.3%  | 21%   |
> > > N1     | 0.4%  | 7.6%  | 24.5% | -2%   | 5%    | 22%   | -4%   | 2.7%  | 20%   |
> > >
> > > Signed-off-by: XiaokangQian <[email protected]>
> >
> > Does this pass the self-tests, including the fuzz tests which are enabled by
> > CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
> >
>
> Please test both little-endian and big-endian. (Note that you don't
> need a big-endian user space for this - the self tests are executed
> before the rootfs is mounted)
>
> Also, you will have to rebase this onto the latest cryptodev tree,
> which carries some changes I made recently to this driver.

XiaokangQian -- did you post an updated version of this? It would end up
going via Herbert, but I was keeping half an eye on it and it all seems
to have gone quiet.

Thanks,

Will
Hi Will:
I will post the updated version 2 of this patch today or tomorrow.
Sorry for the delay.
> -----Original Message-----
> From: Will Deacon <[email protected]>
> Sent: Tuesday, December 14, 2021 2:29 AM
> To: Ard Biesheuvel <[email protected]>
> Cc: Eric Biggers <[email protected]>; Xiaokang Qian
> <[email protected]>; Herbert Xu <[email protected]>;
> David S. Miller <[email protected]>; Catalin Marinas
> <[email protected]>; nd <[email protected]>; Linux Crypto Mailing List
> <[email protected]>; Linux ARM <linux-arm-
> [email protected]>; Linux Kernel Mailing List <linux-
> [email protected]>
> Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way
> interleave of aes and ghash
>
On Tue, 14 Dec 2021 at 02:40, Xiaokang Qian <[email protected]> wrote:
>
> Hi Will:
> I will post the updated version 2 of this patch today or tomorrow.
> Sorry for the delay.
>
Great, but please make sure you run the extended test suite.

I applied this version of the patch to test the performance delta
between the old and the new version on TX2, but it hit a failure in
the self test:

[ 0.592203] alg: aead: gcm-aes-ce decryption unexpectedly succeeded on test vector "random: alen=91 plen=5326 authsize=16 klen=32 novrfy=1"; expected_error=-EBADMSG, cfg="random: inplace use_finup src_divs=[100.0%@+3779] key_offset=43"

It's non-deterministic, though, so it may take a few attempts to reproduce it.

As for the performance delta, your code is 18% slower on TX2 for 1420
byte packets using AES-256 (and 9% slower on AES-192). In your
results, AES-256 does not outperform the old code as much as it does
with smaller key sizes either.

Is this something that can be solved? If not, the numbers are not as
appealing, to be honest, given the substantial performance regressions
on the other micro-architecture.

--
Ard.

Tcrypt output follows

OLD CODE

testing speed of gcm(aes) (gcm-aes-ce) encryption
test 0 (128 bit key, 16 byte blocks): 2023626 operations in 1 seconds (32378016 bytes)
test 1 (128 bit key, 64 byte blocks): 2005175 operations in 1 seconds (128331200 bytes)
test 2 (128 bit key, 256 byte blocks): 1408367 operations in 1 seconds (360541952 bytes)
test 3 (128 bit key, 512 byte blocks): 1011877 operations in 1 seconds (518081024 bytes)
test 4 (128 bit key, 1024 byte blocks): 646552 operations in 1 seconds (662069248 bytes)
test 5 (128 bit key, 1420 byte blocks): 490188 operations in 1 seconds (696066960 bytes)
test 6 (128 bit key, 4096 byte blocks): 204423 operations in 1 seconds (837316608 bytes)
test 7 (128 bit key, 8192 byte blocks): 105149 operations in 1 seconds (861380608 bytes)
test 8 (192 bit key, 16 byte blocks): 1924506 operations in 1 seconds (30792096 bytes)
test 9 (192 bit key, 64 byte blocks): 1944413 operations in 1 seconds (124442432 bytes)
test 10 (192 bit key, 256 byte blocks): 1337001 operations in 1 seconds (342272256 bytes)
test 11 (192 bit key, 512 byte blocks): 941146 operations in 1 seconds (481866752 bytes)
test 12 (192 bit key, 1024 byte blocks): 590614 operations in 1 seconds (604788736 bytes)
test 13 (192 bit key, 1420 byte blocks): 443363 operations in 1 seconds (629575460 bytes)
test 14 (192 bit key, 4096 byte blocks): 182890 operations in 1 seconds (749117440 bytes)
test 15 (192 bit key, 8192 byte blocks): 93813 operations in 1 seconds (768516096 bytes)
test 16 (256 bit key, 16 byte blocks): 1886970 operations in 1 seconds (30191520 bytes)
test 17 (256 bit key, 64 byte blocks): 1893574 operations in 1 seconds (121188736 bytes)
test 18 (256 bit key, 256 byte blocks): 1245478 operations in 1 seconds (318842368 bytes)
test 19 (256 bit key, 512 byte blocks): 865507 operations in 1 seconds (443139584 bytes)
test 20 (256 bit key, 1024 byte blocks): 537822 operations in 1 seconds (550729728 bytes)
test 21 (256 bit key, 1420 byte blocks): 401451 operations in 1 seconds (570060420 bytes)
test 22 (256 bit key, 4096 byte blocks): 164378 operations in 1 seconds (673292288 bytes)
test 23 (256 bit key, 8192 byte blocks): 84205 operations in 1 seconds (689807360 bytes)

NEW CODE

testing speed of gcm(aes) (gcm-aes-ce) encryption
test 0 (128 bit key, 16 byte blocks): 1894587 operations in 1 seconds (30313392 bytes)
test 1 (128 bit key, 64 byte blocks): 1910971 operations in 1 seconds (122302144 bytes)
test 2 (128 bit key, 256 byte blocks): 1360037 operations in 1 seconds (348169472 bytes)
test 3 (128 bit key, 512 byte blocks): 985577 operations in 1 seconds (504615424 bytes)
test 4 (128 bit key, 1024 byte blocks): 569656 operations in 1 seconds (583327744 bytes)
test 5 (128 bit key, 1420 byte blocks): 462129 operations in 1 seconds (656223180 bytes)
test 6 (128 bit key, 4096 byte blocks): 215284 operations in 1 seconds (881803264 bytes)
test 7 (128 bit key, 8192 byte blocks): 115459 operations in 1 seconds (945840128 bytes)
test 8 (192 bit key, 16 byte blocks): 1825915 operations in 1 seconds (29214640 bytes)
test 9 (192 bit key, 64 byte blocks): 1836850 operations in 1 seconds (117558400 bytes)
test 10 (192 bit key, 256 byte blocks): 1281626 operations in 1 seconds (328096256 bytes)
test 11 (192 bit key, 512 byte blocks): 913114 operations in 1 seconds (467514368 bytes)
test 12 (192 bit key, 1024 byte blocks): 504804 operations in 1 seconds (516919296 bytes)
test 13 (192 bit key, 1420 byte blocks): 405749 operations in 1 seconds (576163580 bytes)
test 14 (192 bit key, 4096 byte blocks): 183999 operations in 1 seconds (753659904 bytes)
test 15 (192 bit key, 8192 byte blocks): 97914 operations in 1 seconds (802111488 bytes)
test 16 (256 bit key, 16 byte blocks): 1776659 operations in 1 seconds (28426544 bytes)
test 17 (256 bit key, 64 byte blocks): 1781110 operations in 1 seconds (113991040 bytes)
test 18 (256 bit key, 256 byte blocks): 1206511 operations in 1 seconds (308866816 bytes)
test 19 (256 bit key, 512 byte blocks): 846284 operations in 1 seconds (433297408 bytes)
test 20 (256 bit key, 1024 byte blocks): 424405 operations in 1 seconds (434590720 bytes)
test 21 (256 bit key, 1420 byte blocks): 331558 operations in 1 seconds (470812360 bytes)
test 22 (256 bit key, 4096 byte blocks): 143821 operations in 1 seconds (589090816 bytes)
test 23 (256 bit key, 8192 byte blocks): 75641 operations in 1 seconds (619651072 bytes)
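
For readers who want to check the deltas themselves, the following is a minimal sketch (not part of the original message) that computes the relative change in operations per second from a few rows of the tcrypt output above. The names `TCRYPT_OPS`, `throughput_delta` and `summarize` are illustrative only, and the sketch assumes the old code is taken as the baseline; percentages computed with a different baseline or rounding may differ slightly from the figures quoted in the message.

```python
# (old_ops, new_ops) per second, copied from the tcrypt encryption output
# above. Keys are (AES key length in bits, block size in bytes).
TCRYPT_OPS = {
    (128, 1420): (490188, 462129),
    (128, 8192): (105149, 115459),
    (192, 1420): (443363, 405749),
    (192, 8192): (93813, 97914),
    (256, 1420): (401451, 331558),
    (256, 8192): (84205, 75641),
}


def throughput_delta(old_ops, new_ops):
    """Relative change in operations per second, in percent.

    Negative values mean the new code is slower than the old code.
    """
    return (new_ops - old_ops) / old_ops * 100.0


def summarize(table):
    """Map each (key length, block size) pair to its delta, rounded to 0.1%."""
    return {
        key: round(throughput_delta(old, new), 1)
        for key, (old, new) in table.items()
    }


if __name__ == "__main__":
    for (klen, size), delta in summarize(TCRYPT_OPS).items():
        print(f"AES-{klen}, {size:>4} byte blocks: {delta:+.1f}%")
```

Running the script prints one signed percentage per row, which makes it easy to see where the new code gains (large blocks with AES-128 and AES-192) and where it loses (1420-byte blocks, and AES-256 in general) on this machine.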
Hi Ard:
I have posted the updated patch as version 2. It has passed the extended test suite and the extra tests.
As for the performance data, it's weird that TX2 shows regressions. Locally we find that the performance data on TX2 is not stable: two runs with the same patch (whether old or new) give different results, and we have hit the same issue with OpenSSL. We will investigate further.
In the meantime, could you first check whether the updated patch performs well? Thanks.
> -----Original Message-----
> From: Ard Biesheuvel <[email protected]>
> Sent: Tuesday, December 14, 2021 11:59 PM
> To: Xiaokang Qian <[email protected]>
> Cc: Will Deacon <[email protected]>; Eric Biggers <[email protected]>;
> Herbert Xu <[email protected]>; David S. Miller
> <[email protected]>; Catalin Marinas <[email protected]>; nd
> <[email protected]>; Linux Crypto Mailing List <[email protected]>;
> Linux ARM <[email protected]>; Linux Kernel Mailing List
> <[email protected]>
> Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way
> interleave of aes and ghash
>
On Wed, 15 Dec 2021 at 06:48, Xiaokang Qian <[email protected]> wrote:
>
> Hi Ard:
>
> I have posted the updated patch as version 2. It has passed the extended test suite and the extra tests.
>
> As for the performance data, it's weird that TX2 shows regressions. Locally we find that the performance data on TX2 is not stable: two runs with the same patch (whether old or new) give different results, and we have hit the same issue with OpenSSL. We will investigate further.
> In the meantime, could you first check whether the updated patch performs well? Thanks.
>

I get the same results with this version of the patch, and the results
are highly consistent between runs.

So as it stands, I don't think we should merge this, to be honest. For
the block sizes that matter, this version performs roughly the same on
some micro-architectures, but substantially slower on others (4k and
8k are also slower on TX2 for AES-256). And the larger block sizes
only matter for kTLS anyway, and I don't see the point of kernel TLS
with pure software algorithms - user space can just issue the
instructions directly if TLS is not hardware accelerated.

I do have some minor review comments on the patch itself, but please
only post a v3 if you manage to fix the performance regression:

- push_stack/pop_stack don't need to preserve the D8-15 registers
- karatsuba not karasuba