2014-08-17 15:55:55

by Stephan Müller

Subject: Kernel crypto API: cryptoperf performance measurement

Hi,

While playing around with the kernel crypto API, I implemented a performance
measurement toolkit for the various kernel crypto API cipher types. The
cryptoperf toolkit is available at [1].

Comments are welcome.

In general, the results are as expected, i.e. the assembler implementations
are faster than the pure C implementations. However, there are curious results
which probably should be checked by the maintainers of the respective ciphers
(hoping that my tool works correctly ;-) ):

ablkcipher
----------

- cryptd is slower by a factor of 10 across the board

blkcipher
---------

- Blowfish x86_64 assembler together with the generic C block chaining modes
is significantly slower than Blowfish implemented in generic C

- Blowfish x86_64 assembler in ECB is significantly slower than generic C
Blowfish ECB

- Serpent assembler implementations are not significantly faster than generic
C implementations

- AES-NI in ECB, LRW and CTR modes is significantly slower than the AES i586
assembler implementation

- AES-NI in ECB, LRW and CTR modes is not significantly faster than AES in
generic C

rng
---

- The ANSI X9.31 RNG seems to work massively faster than the underlying AES
cipher (by about a factor of 5). I am unsure about the cause of this.


Caveat
------

Please note that there is one small error that I am unsure how to fix. It is
documented in the TODO file.

[1] http://www.chronox.de/cryptoperf.html

--
Ciao
Stephan


2014-08-19 07:17:48

by Jussi Kivilinna

Subject: Re: Kernel crypto API: cryptoperf performance measurement

Hello,

On 2014-08-17 18:55, Stephan Mueller wrote:
> Hi,
>
> during playing around with the kernel crypto API, I implemented a performance
> measurement tool kit for the various kernel crypto API cipher types. The
> cryptoperf tool kit is provided in [1].
>
> Comments are welcome.

Your results are quite slow compared to, for example "cryptsetup
benchmark", which uses kernel crypto from userspace.

With Intel i5-2450M (turbo enabled), I get:

# Algorithm | Key  |   Encryption |   Decryption
aes-cbc       128b    524,0 MiB/s  11909,1 MiB/s
serpent-cbc   128b     60,9 MiB/s    219,4 MiB/s
twofish-cbc   128b    143,4 MiB/s    240,3 MiB/s
aes-cbc       256b    330,4 MiB/s   1242,8 MiB/s
serpent-cbc   256b     66,1 MiB/s    220,3 MiB/s
twofish-cbc   256b    143,5 MiB/s    221,8 MiB/s
aes-xts       256b   1268,7 MiB/s   4193,0 MiB/s
serpent-xts   256b    234,8 MiB/s    224,6 MiB/s
twofish-xts   256b    253,5 MiB/s    254,7 MiB/s
aes-xts       512b   2535,0 MiB/s   2945,0 MiB/s
serpent-xts   512b    274,2 MiB/s    242,3 MiB/s
twofish-xts   512b    250,0 MiB/s    245,8 MiB/s

>
> In general, the results are as expected, i.e. the assembler implementations
> are faster than the pure C implementations. However, there are curious results
> which probably should be checked by the maintainers of the respective ciphers
> (hoping that my tool works correctly ;-) ):
>
> ablkcipher
> ----------
>
> - cryptd is slower by factor 10 across the board
>
> blkcipher
> ---------
>
> - Blowfish x86_64 assembler together with the generic C block chaining modes
> is significantly slower than Blowfish implemented in generic C
>
> - Blowfish x86_64 assembler in ECB is significantly slower than generic C
> Blowfish ECB
>
> - Serpent assembler implementations are not significantly faster than generic
> C implementations
>
> - AES-NI ECB, LRW, CTR is significantly slower than AES i586 assembler.
>
> - AES-NI ECB, LRW, CTR is not significantly faster than AES generic C
>

Quite a few assembly implementations get their speed-up from processing
multiple cipher blocks in parallel, which is only possible with modes of
operation that allow it (CTR, XTS, LRW, CBC decryption). For small buffer
sizes, these implementations fall back to the non-parallel implementation of
the cipher.
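
To illustrate why some modes can use parallel block processing and others
cannot, here is a minimal Python sketch. It is not kernel code: the block
transform is a toy stand-in for a real cipher, the function names are made up
for this example, and the thread pool only stands in for the SIMD lanes that
an assembler implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per block, as for AES


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy invertible block transform (NOT a real cipher, illustration only)."""
    return bytes(((b ^ k) + 1) % 256 for b, k in zip(block, key))


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    return bytes(((b - 1) % 256) ^ k for b, k in zip(block, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each ciphertext block is an input to the next block's encryption,
    # so the loop is inherently serial.
    prev, out = iv, []
    for p in split_blocks(plaintext):
        prev = toy_encrypt_block(key, xor(p, prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_parallel(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # All ciphertext blocks are known up front, so the block decryptions
    # are independent of each other and can run side by side.
    blocks = split_blocks(ciphertext)
    prevs = [iv] + blocks[:-1]
    with ThreadPoolExecutor() as pool:
        decrypted = list(pool.map(lambda c: toy_decrypt_block(key, c), blocks))
    return b"".join(xor(d, p) for d, p in zip(decrypted, prevs))
```

With a buffer of only one or two blocks there is nothing to run side by side,
which is consistent with the parallel code paths showing no benefit for small
buffers.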

-Jussi

> rng
> ---
>
> - The ANSI X9.31 RNG seems to work massively faster than the underlying AES
> cipher (by about a factor of 5). I am unsure about the cause of this.
>
>
> Caveat
> ------
>
> Please note that there is one small error which I am unsure how to fix it as
> documented in the TODO file.
>
> [1] http://www.chronox.de/cryptoperf.html
>

2014-08-19 18:23:42

by Stephan Müller

Subject: Re: Kernel crypto API: cryptoperf performance measurement

On Tuesday, 19 August 2014, 10:17:36, Jussi Kivilinna wrote:

Hi Jussi,

> Hello,
>
> On 2014-08-17 18:55, Stephan Mueller wrote:
> > Hi,
> >
> > during playing around with the kernel crypto API, I implemented a
> > performance measurement tool kit for the various kernel crypto API cipher
> > types. The cryptoperf tool kit is provided in [1].
> >
> > Comments are welcome.
>
> Your results are quite slow compared to, for example "cryptsetup
> benchmark", which uses kernel crypto from userspace.
>
> With Intel i5-2450M (turbo enabled), I get:
>
> # Algorithm | Key  |   Encryption |   Decryption
> aes-cbc       128b    524,0 MiB/s  11909,1 MiB/s
> serpent-cbc   128b     60,9 MiB/s    219,4 MiB/s
> twofish-cbc   128b    143,4 MiB/s    240,3 MiB/s
> aes-cbc       256b    330,4 MiB/s   1242,8 MiB/s
> serpent-cbc   256b     66,1 MiB/s    220,3 MiB/s
> twofish-cbc   256b    143,5 MiB/s    221,8 MiB/s
> aes-xts       256b   1268,7 MiB/s   4193,0 MiB/s
> serpent-xts   256b    234,8 MiB/s    224,6 MiB/s
> twofish-xts   256b    253,5 MiB/s    254,7 MiB/s
> aes-xts       512b   2535,0 MiB/s   2945,0 MiB/s
> serpent-xts   512b    274,2 MiB/s    242,3 MiB/s
> twofish-xts   512b    250,0 MiB/s    245,8 MiB/s

One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow that
does not sound right.
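
For reference, the unit conversion behind those figures can be sketched in
Python as follows. The helper names are made up for this example, and the
parser assumes the row layout and comma decimal separator shown in the quoted
table.

```python
def parse_row(line: str):
    """Parse one row such as 'aes-cbc 128b 524,0 MiB/s 11909,1 MiB/s'."""
    algo, key, enc, _unit_enc, dec, _unit_dec = line.split()
    # The quoted output uses a comma as the decimal separator.
    to_float = lambda s: float(s.replace(",", "."))
    return algo, key, to_float(enc), to_float(dec)


def mib_to_gb_per_s(mib_per_s: float) -> float:
    """Convert MiB/s (2**20 bytes) to GB/s (10**9 bytes)."""
    return mib_per_s * 1024 ** 2 / 1e9


algo, key, enc, dec = parse_row("aes-cbc 128b 524,0 MiB/s 11909,1 MiB/s")
print(f"{algo} {key}: decryption at about {mib_to_gb_per_s(dec):.1f} GB/s")
```

For the aes-cbc 128b decryption value this comes out at roughly 12.5 GB/s.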
>
> > In general, the results are as expected, i.e. the assembler
> > implementations
> > are faster than the pure C implementations. However, there are curious
> > results which probably should be checked by the maintainers of the
> > respective ciphers (hoping that my tool works correctly ;-) ):
> >
> > ablkcipher
> > ----------
> >
> > - cryptd is slower by factor 10 across the board
> >
> > blkcipher
> > ---------
> >
> > - Blowfish x86_64 assembler together with the generic C block chaining
> > modes is significantly slower than Blowfish implemented in generic C
> >
> > - Blowfish x86_64 assembler in ECB is significantly slower than generic C
> > Blowfish ECB
> >
> > - Serpent assembler implementations are not significantly faster than
> > generic C implementations
> >
> > - AES-NI ECB, LRW, CTR is significantly slower than AES i586 assembler.
> >
> > - AES-NI ECB, LRW, CTR is not significantly faster than AES generic C
>
> Quite a few assembly implementations get their speed-up from processing
> multiple cipher blocks in parallel, which is only possible with modes of
> operation that allow it (CTR, XTS, LRW, CBC decryption). For small buffer
> sizes, these implementations fall back to the non-parallel implementation of
> the cipher.

Thanks for the pointer. I will rerun my tests with buffers that are a multiple
of the block size (e.g. 1024 blocks).
--
Ciao
Stephan

2014-08-20 13:25:59

by Jussi Kivilinna

Subject: Re: Kernel crypto API: cryptoperf performance measurement

Hello,

On 2014-08-19 21:23, Stephan Mueller wrote:
> On Tuesday, 19 August 2014, 10:17:36, Jussi Kivilinna wrote:
>
> Hi Jussi,
>
>> Hello,
>>
>> On 2014-08-17 18:55, Stephan Mueller wrote:
>>> Hi,
>>>
>>> during playing around with the kernel crypto API, I implemented a
>>> performance measurement tool kit for the various kernel crypto API cipher
>>> types. The cryptoperf tool kit is provided in [1].
>>>
>>> Comments are welcome.
>>
>> Your results are quite slow compared to, for example "cryptsetup
>> benchmark", which uses kernel crypto from userspace.
>>
>> With Intel i5-2450M (turbo enabled), I get:
>>
>> # Algorithm | Key  |   Encryption |   Decryption
>> aes-cbc       128b    524,0 MiB/s  11909,1 MiB/s
>> serpent-cbc   128b     60,9 MiB/s    219,4 MiB/s
>> twofish-cbc   128b    143,4 MiB/s    240,3 MiB/s
>> aes-cbc       256b    330,4 MiB/s   1242,8 MiB/s
>> serpent-cbc   256b     66,1 MiB/s    220,3 MiB/s
>> twofish-cbc   256b    143,5 MiB/s    221,8 MiB/s
>> aes-xts       256b   1268,7 MiB/s   4193,0 MiB/s
>> serpent-xts   256b    234,8 MiB/s    224,6 MiB/s
>> twofish-xts   256b    253,5 MiB/s    254,7 MiB/s
>> aes-xts       512b   2535,0 MiB/s   2945,0 MiB/s
>> serpent-xts   512b    274,2 MiB/s    242,3 MiB/s
>> twofish-xts   512b    250,0 MiB/s    245,8 MiB/s
>
> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow that
> does not sound right.

Agreed, those do not look correct... I wonder what happened there. On a
new run, I got saner results:

# Algorithm | Key  |   Encryption |   Decryption
aes-cbc       128b    139,1 MiB/s   1713,6 MiB/s
serpent-cbc   128b     62,2 MiB/s    232,9 MiB/s
twofish-cbc   128b    116,3 MiB/s    243,7 MiB/s
aes-cbc       256b    375,1 MiB/s   1159,4 MiB/s
serpent-cbc   256b     62,1 MiB/s    214,9 MiB/s
twofish-cbc   256b    139,3 MiB/s    217,5 MiB/s
aes-xts       256b   1296,4 MiB/s   1272,5 MiB/s
serpent-xts   256b    283,3 MiB/s    275,6 MiB/s
twofish-xts   256b    294,8 MiB/s    299,3 MiB/s
aes-xts       512b    984,3 MiB/s    991,1 MiB/s
serpent-xts   512b    227,7 MiB/s    220,6 MiB/s
twofish-xts   512b    220,6 MiB/s    220,2 MiB/s

-Jussi

2014-08-20 18:14:14

by Milan Broz

Subject: Re: Kernel crypto API: cryptoperf performance measurement

On 08/20/2014 03:25 PM, Jussi Kivilinna wrote:
>> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow that
>> does not sound right.
>
> Agreed, those do not look correct... I wonder what happened there. On
> new run, I got more sane results:

Which cryptsetup version are you using?

There was a bug in that test on fast machines (fixed in 1.6.3, I hope :)

But anyway, it is not intended as a rigorous speed test;
it is intended for comparing cipher speeds on a particular machine.

The test basically tries to encrypt a 1 MB block (or a multiple of this
if the machine is too fast). It all runs through the kernel's userspace
crypto API interface.
(Real FDE is always slower because it operates on 512-byte blocks.)
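
The general idea can be sketched in Python as follows. This is not the actual
cryptsetup code: the function name, the time threshold, the size cap and the
doubling strategy are all assumptions made for illustration, and `encrypt`
stands for any callable that processes a buffer.

```python
import time

MIB = 1024 * 1024


def benchmark(encrypt, min_seconds=0.1, start_size=MIB, max_size=64 * MIB):
    """Return throughput in MiB/s for a callable encrypt(buffer).

    Starts with a 1 MiB buffer and doubles it while a single run finishes
    too quickly to be timed reliably (the "machine is too fast" case).
    """
    size = start_size
    while True:
        buf = bytes(size)
        t0 = time.perf_counter()
        encrypt(buf)
        elapsed = time.perf_counter() - t0
        # Accept the measurement once it is long enough, or once the
        # (hypothetical) size cap is reached, to keep memory use bounded.
        if elapsed >= min_seconds or size >= max_size:
            return (size / MIB) / max(elapsed, 1e-9)
        size *= 2


# Usage with a dummy transform standing in for a real cipher call:
rate = benchmark(lambda b: bytes(x ^ 0x5A for x in b), min_seconds=0.01)
print(f"{rate:.1f} MiB/s")
```

A single timed run like this is only good for relative comparisons on one
machine, and a run that is too short for the timer resolution can yield
inflated numbers.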

Milan


>
> # Algorithm | Key  |   Encryption |   Decryption
> aes-cbc       128b    139,1 MiB/s   1713,6 MiB/s
> serpent-cbc   128b     62,2 MiB/s    232,9 MiB/s
> twofish-cbc   128b    116,3 MiB/s    243,7 MiB/s
> aes-cbc       256b    375,1 MiB/s   1159,4 MiB/s
> serpent-cbc   256b     62,1 MiB/s    214,9 MiB/s
> twofish-cbc   256b    139,3 MiB/s    217,5 MiB/s
> aes-xts       256b   1296,4 MiB/s   1272,5 MiB/s
> serpent-xts   256b    283,3 MiB/s    275,6 MiB/s
> twofish-xts   256b    294,8 MiB/s    299,3 MiB/s
> aes-xts       512b    984,3 MiB/s    991,1 MiB/s
> serpent-xts   512b    227,7 MiB/s    220,6 MiB/s
> twofish-xts   512b    220,6 MiB/s    220,2 MiB/s
>
> -Jussi
>

2014-08-21 07:38:13

by Jussi Kivilinna

Subject: Re: Kernel crypto API: cryptoperf performance measurement


On 2014-08-20 21:14, Milan Broz wrote:
> On 08/20/2014 03:25 PM, Jussi Kivilinna wrote:
>>> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow that
>>> does not sound right.
>>
>> Agreed, those do not look correct... I wonder what happened there. On
>> new run, I got more sane results:
>
> Which cryptsetup version are you using?
>
> There was a bug in that test on fast machines (fixed in 1.6.3, I hope :)

I had version 1.6.1 at hand.

>
> But anyway, it is not intended as rigorous speed test,
> it was intended for comparison of ciphers speed on particular machine.
>

True, but it's a nice, easy test compared to parsing results from
tcrypt speed tests.

-Jussi

> Test basically tries to encrypt 1MB block (or multiple of this
> if machine is too fast). All it runs through kernel userspace crypto API
> interface.
> (Real FDE is always slower because it runs over 512bytes blocks.)
>
> Milan
>