Hello,

the VIA Padlock engine comes without native XTS-AES support. Compared
to CBC-AES or ECB-AES, XTS-AES therefore performs quite badly on VIA
CPUs, because it calls the Padlock ACE for every single AESenc()
operation. Using the Padlock's ECB-AES instead saves calls to the
Padlock ACE and improves XTS-AES performance by 30% and more, even in a
naive proof-of-concept implementation.

The idea comes from DiskCryptor, which has done this since v0.9.583.106.
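
To illustrate the idea, here is a minimal user-space sketch in Python,
not the patch code. It assumes the initial tweak has already been
encrypted with the second key. CountingEcb is a made-up stand-in for the
engine: its block function is a toy byte permutation and NOT AES. It
exists only so the sketch runs and so the number of engine calls can be
counted.

```python
BLOCK = 16  # cipher block size in bytes


class CountingEcb:
    """Hypothetical stand-in for a hardware ECB engine (NOT real AES)."""

    def __init__(self, key):
        self.key = key
        self.calls = 0  # number of times the "engine" was invoked

    def encrypt(self, data):
        # One engine call processes any whole number of blocks.
        assert len(data) % BLOCK == 0
        self.calls += 1
        out = bytearray()
        for off in range(0, len(data), BLOCK):
            blk = data[off:off + BLOCK]
            mixed = bytes(((b ^ k) + 1) & 0xFF for b, k in zip(blk, self.key))
            out += mixed[1:] + mixed[:1]  # toy mixing step, not AES
        return bytes(out)


def gf_double(tweak):
    """Multiply a 16-byte little-endian tweak by x in GF(2^128)."""
    n = int.from_bytes(tweak, "little") << 1
    if n >> 128:
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(16, "little")


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def xts_per_block(engine, tweak, data):
    """Current scheme: one engine call per 16-byte block."""
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        blk = data[off:off + BLOCK]
        out += xor(engine.encrypt(xor(blk, tweak)), tweak)
        tweak = gf_double(tweak)
    return bytes(out)


def xts_batched(engine, tweak, data):
    """ECB-based scheme: XOR all tweaks in, one engine call, XOR them out."""
    tweaks = bytearray()
    for _ in range(len(data) // BLOCK):
        tweaks += tweak
        tweak = gf_double(tweak)
    return xor(engine.encrypt(xor(data, tweaks)), tweaks)
```

For a 512-byte sector the per-block variant invokes the engine 32 times
and the batched variant once, while both produce the same output.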

Here are some performance measurements for my VIA Nano U2250, done with
dm-crypt aes-xts-plain on top of a 1GB tmpfs-backed loop device. The
table shows MB/s as measured by dd. The first column shows the creation
of the loop image in tmpfs, i.e. memory bandwidth. The next 10 columns
show 10 write runs on top of the dm-crypt device. The last column shows
a read run on a 10GB dm-crypt'ed disk partition (read speed on the plain
partition is ~94MB/s). For the last column I also measured DiskCryptor's
read performance (row "DC").

          create |                 write runs 1-10                   | read
xts orig     325 | 38.1 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 | 34.2
xts PoC      322 | 48.6 49.1 49.1 49.1 49.1 49.2 49.2 49.2 49.2 49.2 | 49.2
DC               |                                                   | 65.1

My proof-of-concept does not come even close to DiskCryptor at the
moment, but it already improves dm-crypt performance significantly.
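
For reference, the relative gains implied by the table can be worked out
as below. The inputs are the steady-state values from the table; the
helper name gain_percent is made up for this note.

```python
def gain_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


write_gain = gain_percent(38.4, 49.2)  # steady-state write runs, orig vs PoC
read_gain = gain_percent(34.2, 49.2)   # disk read run, orig vs PoC
dc_gain = gain_percent(34.2, 65.1)     # disk read run, orig vs DiskCryptor

print(f"write: +{write_gain:.0f}%  read: +{read_gain:.0f}%  "
      f"DiskCryptor read: +{dc_gain:.0f}%")
```

That works out to roughly 28% for writes and 44% for reads with the
PoC, and roughly 90% for DiskCryptor's read.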

I have attached 4 patches with the proof-of-concept code. They need to
be applied one after the other. The code is really just an ugly-hacked
proof-of-concept (except maybe the first patch), with incomplete error
handling and hardcoded ECB-AES usage. Even though it seems to encrypt
and decrypt correctly, I strongly recommend not using it on real data.

Using ECB-AES required unfolding and duplicating the scatterlist walk.
This also duplicates the GF multiplications, which could probably be
avoided by using an internal buffer.
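
A small Python sketch of the buffer idea, under the assumption that the
duplication means each of the two walks recomputes the whole tweak
chain. TweakGen, two_walks and buffered are made-up names for this
illustration, not functions from the patches.

```python
MASK128 = (1 << 128) - 1


class TweakGen:
    """Generates successive XTS tweaks and counts GF(2^128) multiplications."""

    def __init__(self):
        self.mults = 0

    def mul_x(self, tweak):
        # Multiply a 16-byte little-endian tweak by x; on carry out of
        # bit 127, reduce by XORing 0x87 into the low byte.
        self.mults += 1
        n = int.from_bytes(tweak, "little") << 1
        if n >> 128:
            n = (n & MASK128) ^ 0x87
        return n.to_bytes(16, "little")

    def sequence(self, first, count):
        """Return the tweaks T, T*x, T*x^2, ... for `count` blocks."""
        out = [first]
        for _ in range(count - 1):
            out.append(self.mul_x(out[-1]))
        return out


def two_walks(first, count):
    """Assumed PoC-like structure: both walks recompute the tweaks."""
    gen = TweakGen()
    pre = gen.sequence(first, count)   # walk 1: XOR before the ECB call
    post = gen.sequence(first, count)  # walk 2: XOR after the ECB call
    return pre, post, gen.mults


def buffered(first, count):
    """Compute the tweaks once and reuse the buffer for both walks."""
    gen = TweakGen()
    buf = gen.sequence(first, count)
    return buf, buf, gen.mults
```

For a 512-byte sector (32 blocks) the two-walk variant performs 62
multiplications and the buffered variant 31, at the cost of a 512-byte
tweak buffer.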

I have no idea where this should finally be implemented, since it slows
down XTS on non-accelerated CPUs. Maybe a separate xts-aes-padlock
driver would make sense, depending on how specific this is to VIA
Padlock, i.e. how it performs on other non-XTS-capable accelerators.
Please CC: me in replies, I'm not a member of the list. Mail-F'up2
should be set correctly.
regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"
On 23.04.2010 17:44, Mario 'BitKoenig' Holbe wrote:
> Hello,
Hi Mario,
> the VIA Padlock engine comes without native XTS-AES support, thus
> compared to CBC-AES or ECB-AES, XTS-AES performs quite badly on VIA CPUs
> because it calls the Padlock ACE for each single AESenc() operation.
> Using the Padlock's ECB-AES saves calls to the Padlock ACE and improves
> the XTS-AES performance by 30% and more even in a naive proof-of-concept
> implementation.
> The idea comes from DiskCryptor which does this since v0.9.583.106.
The same problem exists with the mv_cesa driver. I would really like to
have a solution for this, as the performance of the system is mainly
crippled by the hard disk encryption.
> I have no idea where this should finally be implemented, since it slows
> down XTS on non-accelerated CPUs. Maybe a separate xts-aes-padlock
> driver would make sense depending on how specific this is to VIA
> Padlock, i.e. how it performs on other non-XTS-capable accelerators.
I'm currently trying to get an understanding of the kernel crypto code,
but my impression is that, rather than breaking support for the
synchronous API, the better way would be to introduce asynchronous
ciphers (e.g. acipher) that crypto_template implementations can use
transparently to form an ablkcipher. That way all cipher modes would be
supported and accelerated by the hardware, even if the hardware does
not implement the mode itself.
As I mentioned, I'm new to the kernel API, so what is the opinion of
experienced developers on this? Is it within reach of the current API
architecture?
> Please CC: me in replies, I'm not a member of the list. Mail-F'up2
> should be set correctly.
Best regards,
André Bubel
Mario 'BitKoenig' Holbe <[email protected]> wrote:
>
> I have no idea where this should finally be implemented, since it slows
> down XTS on non-accelerated CPUs. Maybe a separate xts-aes-padlock
> driver would make sense depending on how specific this is to VIA
> Padlock, i.e. how it performs on other non-XTS-capable accelerators.
We can solve that by instantiating both the ECB-based XTS and the
current XTS, while increasing the current XTS's priority value.
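
As a rough illustration of priority-based selection, here is a toy
model in Python. It is not the kernel's data structure or API; the
driver names and priority numbers are hypothetical. It assumes that a
request by algorithm name resolves to the registered implementation
with the highest priority, while a request by driver name selects that
implementation directly.

```python
class Registry:
    """Toy model of name/priority lookup; not the kernel's structures."""

    def __init__(self):
        self.algs = []  # entries: (algorithm name, driver name, priority)

    def register(self, name, driver, priority):
        self.algs.append((name, driver, priority))

    def lookup(self, wanted):
        # An exact driver-name match wins outright.
        for name, driver, prio in self.algs:
            if driver == wanted:
                return driver
        # Otherwise pick the highest-priority provider of that algorithm.
        matches = [a for a in self.algs if a[0] == wanted]
        if not matches:
            raise LookupError(wanted)
        return max(matches, key=lambda a: a[2])[1]


reg = Registry()
reg.register("xts(aes)", "xts-current", 200)  # hypothetical name and priority
reg.register("xts(aes)", "xts-via-ecb", 100)  # hypothetical name and priority

print(reg.lookup("xts(aes)"))     # resolved by priority
print(reg.lookup("xts-via-ecb"))  # resolved by driver name
```

With these made-up values, a lookup of "xts(aes)" yields the current
XTS, and the ECB-based one stays reachable under its own driver name.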
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt