On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> Questions:
>
> If you are optimizing it,
>
> 1) why don't you optimize it in such a way that if one CPU submits
> requests, the crypto work is spread among all the CPUs? Currently it
> spreads the work only if different CPUs submit it.
Because the crypto layer already provides that functionality,
through pcrypt. By instantiating pcrypt for a given algorithm,
you can parallelise that algorithm across CPUs.
This would be inappropriate for upper layer code as they do not
know whether the underlying algorithm should be parallelised,
e.g., a PCI offload board certainly should not be parallelised.
> 2) why not optimize software async crypto daemon (crypt/cryptd.c) instead
> of dm-crypt, so that all kernel subsystems can actually take advantage of
> those multi-CPU optimizations, not just dm-crypt?
Because you cannot do what Andi is doing here in the crypto layer.
What dm-crypt does today (which hasn't always been the case BTW)
hides information away (the original submitting CPU) that we cannot
recreate.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Tue, 1 Jun 2010, Herbert Xu wrote:
> On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> > Questions:
> >
> > If you are optimizing it,
> >
> > 1) why don't you optimize it in such a way that if one CPU submits
> > requests, the crypto work is spread among all the CPUs? Currently it
> > spreads the work only if different CPUs submit it.
>
> Because the crypto layer already provides that functionality,
> through pcrypt. By instantiating pcrypt for a given algorithm,
> you can parallelise that algorithm across CPUs.
And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
sources, it seems to be dependent on aead and not useable for general
encryption algorithms at all.
I tried cryptd --- in theory it should work by requesting the algorithm
like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
in dm-crypt sources it locks up and doesn't work.
> This would be inappropriate for upper layer code as they do not
> know whether the underlying algorithm should be parallelised,
> e.g., a PCI offload board certainly should not be parallelised.
The upper layer should ideally request "cbc(aes)" and the crypto routine
should select the most efficient implementation --- sync on single-core
system, async with cryptd on multi-core system and async with hardware
implementation if you have HIFN crypto card.
> > 2) why not optimize software async crypto daemon (crypt/cryptd.c) instead
> > of dm-crypt, so that all kernel subsystems can actually take advantage of
> > those multi-CPU optimizations, not just dm-crypt?
>
> Because you cannot do what Andi is doing here in the crypto layer.
> What dm-crypt does today (which hasn't always been the case BTW)
> hides information away (the original submitting CPU) that we cannot
> recreate.
It is pointless to track the submitting CPU.
Majority of time is consumed by raw encyption/decryption. And you must
optimize that --- i.e. on SMP system make sure that cryptd distributes the
work across all available cores.
When you get this right --- i.e. when reading encrypted disk, you get
either read speed equivalent to non-encrypted disk or all the cores are
saturated, then you can start thinking about other optimizations.
Mikulas
On Wed, Jun 02, 2010 at 01:10:00AM -0400, Mikulas Patocka wrote:
>
> And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
> sources, it seems to be dependent on aead and not useable for general
> encryption algorithms at all.
You instantiate a pcrypt variant of whatever algorithm that you're
using. For example, if you're using XTS then you should instantiate
pcrypt(xts(aes)). Currently you must use tcrypt to instantiate.
> I tried cryptd --- in theory it should work by requesting the algorithm
> like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
> in dm-crypt sources it locks up and doesn't work.
cryptd is something else altogether. However, it certainly should
not lock up. What kernel version is this?
> > This would be inappropriate for upper layer code as they do not
> > know whether the underlying algorithm should be parallelised,
> > e.g., a PCI offload board certainly should not be parallelised.
>
> The upper layer should ideally request "cbc(aes)" and the crypto routine
> should select the most efficient implementation --- sync on single-core
> system, async with cryptd on multi-core system and async with hardware
> implementation if you have HIFN crypto card.
That's exactly what will happen when the admin instantiates pcrypt.
dm-crypt simply needs to specify cbc(aes) and it will get pcrypt
automatically.
The point is that on a modern processor like Nehalem you don't need
pcrypt.
> It is pointless to track the submitting CPU.
No you are wrong.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Wed, 2 Jun 2010, Herbert Xu wrote:
> On Wed, Jun 02, 2010 at 01:10:00AM -0400, Mikulas Patocka wrote:
> >
> > And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
> > sources, it seems to be dependent on aead and not useable for general
> > encryption algorithms at all.
>
> You instantiate a pcrypt variant of whatever algorithm that you're
> using. For example, if you're using XTS then you should instantiate
> pcrypt(xts(aes)). Currently you must use tcrypt to instantiate.
I tried "pcrypt(%s(%s))" in dm-crypt and I get "table: 253:0: crypt: Error
allocating crypto tfm"
Are you sure that you know what you're talking about? pcrypt_alloc
contains this:
switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) {
case CRYPTO_ALG_TYPE_AEAD:
return pcrypt_alloc_aead(tb);
}
return ERR_PTR(-EINVAL);
--- so for anything other byt AEAD it returns -EINVAL.
> > I tried cryptd --- in theory it should work by requesting the algorithm
> > like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
> > in dm-crypt sources it locks up and doesn't work.
>
> cryptd is something else altogether. However, it certainly should
> not lock up. What kernel version is this?
2.6.34 and cryptsetup 1.1.2. It is a soft lockup interruptible with
Ctrl-C.
> > > This would be inappropriate for upper layer code as they do not
> > > know whether the underlying algorithm should be parallelised,
> > > e.g., a PCI offload board certainly should not be parallelised.
> >
> > The upper layer should ideally request "cbc(aes)" and the crypto routine
> > should select the most efficient implementation --- sync on single-core
> > system, async with cryptd on multi-core system and async with hardware
> > implementation if you have HIFN crypto card.
>
> That's exactly what will happen when the admin instantiates pcrypt.
> dm-crypt simply needs to specify cbc(aes) and it will get pcrypt
> automatically.
>
> The point is that on a modern processor like Nehalem you don't need
> pcrypt.
>
> > It is pointless to track the submitting CPU.
>
> No you are wrong.
For what? For avoiding cache bounces? But the encrypting is
order-of-magnitude slower than memory speed.
Mikulas
On Wed, Jun 02, 2010 at 01:25:11AM -0400, Mikulas Patocka wrote:
>
> Are you sure that you know what you're talking about? pcrypt_alloc
> contains this:
> switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) {
> case CRYPTO_ALG_TYPE_AEAD:
> return pcrypt_alloc_aead(tb);
> }
>
> return ERR_PTR(-EINVAL);
> --- so for anything other byt AEAD it returns -EINVAL.
So someone needs to write the ablkcipher plugins for pcrypt.
This is the whole point of pcrypt, to implement parallelisation
exactly once, in the crypto layer.
> For what? For avoiding cache bounces? But the encrypting is
> order-of-magnitude slower than memory speed.
Have you benchmarked the Nehalem AES performance lately?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Wed, Jun 02, 2010 at 04:07:02PM +1000, Herbert Xu wrote:
> On Wed, Jun 02, 2010 at 01:25:11AM -0400, Mikulas Patocka wrote:
> >
> > Are you sure that you know what you're talking about? pcrypt_alloc
> > contains this:
> > switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) {
> > case CRYPTO_ALG_TYPE_AEAD:
> > return pcrypt_alloc_aead(tb);
> > }
> >
> > return ERR_PTR(-EINVAL);
> > --- so for anything other byt AEAD it returns -EINVAL.
>
> So someone needs to write the ablkcipher plugins for pcrypt.
>
I did that already for one of the older (remote softirq based)
version of pcrypt. I just never pushed it out because I was focused
on IPsec and the softirq based version I was not that usefull for
dm-crypt. Now, that we use workqueues internally, dm-crypt could
benefit from pcrypt too. Let me see whether I can respin the old
ablkcipher plugin.
Steffen
On Wed, Jun 02, 2010 at 08:30:33AM +0200, Steffen Klassert wrote:
>
> I did that already for one of the older (remote softirq based)
> version of pcrypt. I just never pushed it out because I was focused
> on IPsec and the softirq based version I was not that usefull for
> dm-crypt. Now, that we use workqueues internally, dm-crypt could
> benefit from pcrypt too. Let me see whether I can respin the old
> ablkcipher plugin.
Thanks Steffen!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>>
>>> It is pointless to track the submitting CPU.
>>
>> No you are wrong.
>
> For what? For avoiding cache bounces? But the encrypting is
> order-of-magnitude slower than memory speed.
On a system with reasonably fast CPUs it's fastest to just stay
on your current CPU, don't try to talk to other CPUs, avoid
communication, just get the work done ASAP. But make
sure you can do this on multiple CPUs at the same time.
With AES-NI this becomes even more pronounced because it effectively
makes the CPU faster for encryption.
-andi