2009-09-15 13:26:10

by Dimitrios Siganos

Subject: ESP hardware acceleration

Hi,

We are using linux-2.6.28 and we would like to hardware accelerate the
NETKEY IPsec traffic. We are using strongswan for the upper layers.

I understand that strongswan uses the Linux/NETKEY IPsec implementation,
which in turn, uses the Linux Scatterlist Crypto API for all its
cryptographic work. To hardware accelerate IPsec, I need to write a
"Linux Scatterlist Crypto API" driver for my hardware accelerator and
register it with the linux kernel.

What I would like to know is:
1) does the xfrm/ESP implementation support asynchronous/parallel packet
operation?
2) If yes, does it support it in both directions (tx/rx)?

Our hardware supports a queue of packets for processing and we would like
to utilise that, to keep the hardware as busy as possible i.e. we would
like to be able to send multiple packets to the hardware engine for
encryption/hashing and then receive multiple acknowledgements that the
packets are ready.

Regards,
Dimitrios Siganos
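
To illustrate the submit-many, complete-many pattern described in this message, here is a minimal Python sketch of a queued engine. It is a toy model, not kernel code: `ToyEngine`, `submit`, `interrupt`, and the queue depth are hypothetical names, the status strings are only loosely modeled on the kernel's -EINPROGRESS and -EBUSY return codes, and the byte-flipping transform is a placeholder rather than a cipher.

```python
from collections import deque

# Stand-in status values (assumption: loosely modeled on -EINPROGRESS / -EBUSY).
EINPROGRESS = "EINPROGRESS"
EBUSY = "EBUSY"


class ToyEngine:
    """Hypothetical hardware engine with a fixed-depth request queue."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def submit(self, packet, on_done):
        """Queue a packet without waiting; report EBUSY when the queue is full."""
        if len(self.pending) >= self.depth:
            return EBUSY
        self.pending.append((packet, on_done))
        return EINPROGRESS

    def interrupt(self):
        """Model one completion interrupt that acknowledges every queued packet."""
        completed = 0
        while self.pending:
            packet, on_done = self.pending.popleft()
            # Placeholder transform (flip every bit); a real engine would encrypt/hash.
            on_done(bytes(b ^ 0xFF for b in packet))
            completed += 1
        return completed


def run_demo():
    """Submit six packets to a four-deep queue, then take one completion interrupt."""
    engine = ToyEngine(depth=4)
    results = []
    packets = [bytes([i]) * 4 for i in range(6)]
    statuses = [engine.submit(p, results.append) for p in packets]
    acked = engine.interrupt()
    return statuses, acked, results
```

The caller never blocks per packet in this model: it keeps submitting until the queue reports that it is full, and results arrive later through the completion callbacks.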



2009-09-15 14:56:45

by Octavian Purdila

Subject: Re: ESP hardware acceleration

On Tuesday 15 September 2009 16:19:27 you wrote:
> Hi,
>
> We are using linux-2.6.28 and we would like to hardware accelerate the
> NETKEY IPsec traffic. We are using strongswan for the upper layers.
>
> I understand that strongswan uses the Linux/NETKEY IPsec implementation,
> which in turn, uses the Linux Scatterlist Crypto API for all its
> cryptographic work. To hardware accelerate IPsec, I need to write a
> "Linux Scatterlist Crypto API" driver for my hardware accelerator and
> register it with the linux kernel.
>
> What I would like to know is:
> 1) does the xfrm/ESP implementation support asynchronous/parallel packet
> operation?
> 2) If yes, does it support it in both directions (tx/rx)?
>
> Our hardware supports a queue of packets for processing and we would like
> to utilise that, to keep the hardware as busy as possible i.e. we would
> like to be able to send multiple packets to the hardware engine for
> encryption/hashing and then receive multiple acknowledgements that the
> packets are ready.
>

Hi Dimitrios,

AFAIK, the crypto interface is asynchronous but the hashing interface (as used
in IPSec) is synchronous.

There are two patches I've recently seen on the list, one for converting to
async hashing and one for parallel crypto/ipsec which will probably get in
2.6.32.

However, I think that the best results for hw accel will be obtained if you
accelerate the AEAD interface.

Speaking of hw accel, we are also playing with it and we got moderately good
results. We are now running into two major software bottlenecks: memcpy
(because of the copy required by TCP traffic) and CRC computation.

To solve the first issue we were thinking of extending the ESP implementation
to create two scatter-gather lists / skbs, one which will be used as the
source and one which will be used as the destination. This will allow us to
offload the memcpy operation to hardware.

We will start working on this soon, after a bit of stabilization work we need
to do, and we will then start pestering you crypto wizards with questions /
patches. In the meantime, if you have any advice on this topic we will greatly
appreciate it :)

Thanks!
tavi

2009-09-15 17:09:47

by Herbert Xu

Subject: Re: ESP hardware acceleration

Dimitrios Siganos <[email protected]> wrote:
>
> What I would like to know is:
> 1) does the xfrm/ESP implementation support asynchronous/parallel packet
> operation?
> 2) If yes, does it support it in both directions (tx/rx)?

Yes on both counts.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-09-15 17:12:51

by Herbert Xu

Subject: Re: ESP hardware acceleration

Octavian Purdila <[email protected]> wrote:
>
> AFAIK, the crypto interface is asynchronous but the hashing interface (as used
> in IPSec) is synchronous.
>
> There are two patches I've recently seen on the list, one for converting to
> async hashing and one for parallel crypto/ipsec which will probably get in
> 2.6.32.

Yes, they're now in Linus's tree so both hashing and ciphers are
now async.

> However, I think that the best results for hw accel will be obtained if you
> accelerate the AEAD interface.

If your driver benefits from seeing both the hashing request and the
cipher request at the same time then by all means go for the AEAD
interface. But don't feel compelled to use it just because it's
there :)

> Speaking of hw accel, we are also playing with it and we got moderately good
> results. We are now running into two major software bottlenecks: memcpy
> (because of the copy required by TCP traffic) and CRC computation.

What platform is this? And where does CRC come into this?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-09-15 18:00:25

by Octavian Purdila

Subject: Re: ESP hardware acceleration

On Tuesday 15 September 2009 20:12:52 you wrote:

> > However, I think that the best results for hw accel will be obtained if
> > you accelerate the AEAD interface.
>
> If your driver benefits from seeing both the hashing request and the
> cipher request at the same time then by all means go for the AEAD
> interface. But don't feel compelled to use it just because it's
> there :)

I think this interface has the advantage of doing only one DMA transfer per
ESP packet instead of the two transfers required when using separate
encryption + authentication (of course, this may not matter at all on some
architectures).
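
The transfer-count argument above can be sketched with a toy device model in Python. Everything here is an assumption made for illustration: `ToyDevice`, `separate_path`, and `combined_path` are hypothetical names, each call into the device is assumed to cost exactly one DMA transfer, the "cipher" is a repeating-key XOR placeholder, and the ICV is a truncated HMAC-SHA1 over the ciphertext only (a real ESP ICV covers more of the packet).

```python
import hashlib
import hmac


def _xor(key: bytes, data: bytes) -> bytes:
    # Placeholder "cipher": repeating-key XOR, used only to keep the sketch short.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class ToyDevice:
    """Hypothetical accelerator that counts one DMA transfer per request."""

    def __init__(self):
        self.transfers = 0

    def _dma(self, data: bytes) -> bytes:
        # Assumption: every request moves the packet to the device exactly once.
        self.transfers += 1
        return data

    def encrypt(self, enc_key: bytes, data: bytes) -> bytes:
        return _xor(enc_key, self._dma(data))

    def authenticate(self, auth_key: bytes, data: bytes) -> bytes:
        return hmac.new(auth_key, self._dma(data), hashlib.sha1).digest()[:12]

    def encrypt_and_authenticate(self, enc_key: bytes, auth_key: bytes, data: bytes):
        # Combined (AEAD-style) request: one transfer, both operations on-device.
        buf = self._dma(data)
        ciphertext = _xor(enc_key, buf)
        icv = hmac.new(auth_key, ciphertext, hashlib.sha1).digest()[:12]
        return ciphertext, icv


def separate_path(dev, enc_key, auth_key, payload):
    """Cipher request followed by a separate hash request: two transfers."""
    ciphertext = dev.encrypt(enc_key, payload)
    return ciphertext, dev.authenticate(auth_key, ciphertext)


def combined_path(dev, enc_key, auth_key, payload):
    """Single combined request: one transfer."""
    return dev.encrypt_and_authenticate(enc_key, auth_key, payload)
```

Both paths produce the same ciphertext and ICV in this model; only the number of times the packet crosses to the device differs, which is the cost the paragraph above is concerned with.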

> > Speaking of hw accel, we are also playing with it and we got moderately
> > good results. We are now running into two major software bottlenecks:
> > memcpy (because of the copy required by TCP traffic) and CRC computation.
>
> What platform is this?
>

It's a ppc750 CPU clocked at 1GHz - pretty low end compared with today's
hardware. We were able to get about 360 Mbit/s of L2 throughput (TCP traffic)
with our hw accel engine, although theoretically the hw engine can go much
higher (and profiling the hw engine itself shows that it is significantly idle).

> And where does CRC come into this?

Sorry, what I meant was TCP checksum.

tavi
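
For reference, the TCP checksum mentioned in the last message is the 16-bit one's-complement Internet checksum (RFC 1071). A minimal Python sketch follows; the function name `internet_checksum` is chosen here for illustration, and the sketch omits the TCP pseudo-header that a real TCP checksum also covers. It shows why the computation is a per-byte pass over the payload when done in software.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data, in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add each big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold carries back into the low 16 bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a block by checksumming the data together with the transmitted checksum; the result is zero when nothing was corrupted.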