2015-12-01 17:59:53

by Sowmini Varadhan

Subject: ipsec impact on performance


I instrumented iperf with and without ipsec, just using esp-null,
and 1 thread, to keep things simple. I'm seeing some pretty dismal
performance numbers with ipsec, and trying to think of ways to
improve this. Here are my findings, please share feedback.

I suspect that a big part of the problem is the implicit loss of GSO,
and this is made worse by some inefficiencies in the xfrm code:
for single stream iperf (to avoid effects of rx-hash), I see the
following on a 10G p2p ethernet link.
8.5-9.5 Gbps clear traffic, with TSO disabled, so GSO/GRO is in effect
3-4 Gbps clear traffic, with both TSO/GSO disabled
1.8-2 Gbps for esp-null.
So the above numbers suggest that losing TSO/GSO results in one
big drop in performance, and then there's another cliff for the
clear -> esp-null transition. And those cliffs apply even if you are
merely doing TCP-MD5 or AO for basic protection of the TCP connection.
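
For reference, the three cases above can be set up roughly as follows
(a sketch only: the addresses, the SPI and the NIC name eth0 are
placeholders, the cipher_null/digest_null algorithm names are assumed to
be available, and the reverse-direction SA/policy plus the receiver side
are analogous):

# case 1: TSO off, GSO/GRO still in effect
ethtool -K eth0 tso off

# case 2: both TSO and GSO off (and GRO off on the receiver)
ethtool -K eth0 tso off gso off

# case 3: esp-null, transport mode
ip xfrm state add src 10.0.0.1 dst 10.0.0.2 proto esp spi 0x100 \
      mode transport enc 'ecb(cipher_null)' '' auth 'digest_null' ''
ip xfrm policy add src 10.0.0.1 dst 10.0.0.2 dir out \
      tmpl proto esp mode transport

# single-stream, 1-thread iperf in all cases
iperf -c 10.0.0.2 -t 60 -P 1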

I tried moving things about a bit to defer the ipsec after GSO - I'll
share my experimental patch as an RFC in a separate thread. (Disclaimer:
the patch is just an experiment at this point).

In that patch, I'm only focussing on esp-null and transp-mode ipsec
for now, just to get some basic performance numbers to see if this is
at all interesting. Essentially, my hack involves the following:

- don't disable TSO in sk_setup_caps() if a dst->header_len is found
- in xfrm4_output, if GSO is applicable, bail out without esp header
addition - that will get done after skb_segment()
- at the end of tcp_gso_segment() (when the tcp segment is available),
set things up for xfrm_output_one and trigger the esp_output.
I have to be very careful about setting up skb pointers here, since
it looks like esp_output overloads the mac_header pointer, e.g., for
setting up the ip protocol field (see the tracing sketch below).
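
(One crude way to sanity-check that ordering - a sketch, assuming the NIC
is eth0 and that tcp_gso_segment/esp_output are visible and probe-able in
kallsyms on the running kernel - is to confirm the offload state and count
how often the two paths fire during a run:)

ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
perf probe --add tcp_gso_segment
perf probe --add esp_output
perf stat -e probe:tcp_gso_segment -e probe:esp_output -a -- sleep 10
perf probe --del 'probe:*'    # clean up the kprobes afterwards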

If I do all these things, the ipsec+iperf numbers improve slightly - for
esp-null, I move from approx 1.8 Gbps to about 3 Gbps, but clearly,
this is still quite far from the 8 - 9 Gbps that I can get with just
GSO+GRO for non-ipsec traffic.

There are some inefficiencies that I can see in the xfrm code,
that I am inheriting in my patch, e.g.:
memory management in the xfrm code has room for improvement. Every
pass through xfrm_transport_output ends up doing an (avoidable?) memmove,
and each pass through esp_output ends up doing a kmalloc/free of the
"tmp" buffer.
But these are all still relatively small things - tweaking them
doesn't get me significantly past the 3 Gbps limit. Any suggestions
on how to make this budge (or design criticism of the patch) would
be welcome.

--Sowmini


2015-12-01 18:17:45

by Rick Jones

Subject: Re: ipsec impact on performance

On 12/01/2015 09:59 AM, Sowmini Varadhan wrote:
> But these are all still relatively small things - tweaking them
> doesnt get me significantly past the 3 Gbps limit. Any suggestions
> on how to make this budge (or design criticism of the patch) would
> be welcome.

What do the perf profiles show? Presumably, loss of TSO/GSO means an
increase in the per-packet costs, but if the ipsec path significantly
increases the per-byte costs...

Short of a perf profile, I suppose one way to probe for per-packet
versus per-byte would be to up the MTU. That should reduce the
per-packet costs while keeping the per-byte roughly the same.

You could also compare the likes of a single-byte netperf TCP_RR test
between ipsec enabled and not to get an idea of the basic path length
differences without TSO/GSO/whatnot muddying the waters.
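
Something along these lines, say (a sketch; eth0 and the peer address are
placeholders):

# probe per-packet vs per-byte cost by raising the MTU on both ends
ip link set dev eth0 mtu 9000

# single-byte request/response, with CPU utilization measured on both sides
netperf -H <peer> -t TCP_RR -l 30 -c -C -- -r 1,1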

happy benchmarking,

rick jones

2015-12-01 18:18:12

by Tom Herbert

Subject: Re: ipsec impact on performance

On Tue, Dec 1, 2015 at 9:59 AM, Sowmini Varadhan
<[email protected]> wrote:
>
> I instrumented iperf with and without ipsec, just using esp-null,
> and 1 thread, to keep things simple. I'm seeing some pretty dismal
> performance numbers with ipsec, and trying to think of ways to
> improve this. Here are my findings, please share feedback.
>
> I suspect that a big part of the problem is the implicit loss of GSO,
> and this is made worse by some inefficiencies in the xfrm code:
> for single stream iperf (to avoid effects of rx-hash), I see the
> following on a 10G p2p ethernet link.
> 8.5-9.5 Gbps clear traffic, TSO disabled, so GSO, GRO is in effect
> 3-4 Gbps clear traffic, with both TSO/GSO disabled
> 1.8-2 Gbps for esp-null.

Are you losing checksum offload also?

> So the above numbers suggest that losing TSO/GSO results in one
> big drop in performance, and then there's another cliff for the
> clear -> esp-null transition. And those cliffs apply even if you are
> merely doing TCP-MD5 or AO for basic protection of the TCP connection.
>
> I tried moving things about a bit to defer the ipsec after GSO - I'll
> share my experimental patch as an RFC in a separate thread. (Disclaimer:
> the patch is just an experiment at this point).
>
> In that patch, I'm only focussing on esp-null and transp-mode ipsec
> for now, just to get some basic performance numbers to see if this is
> at all interesting. Essentially my hack mainly involves the following
>
> - don't disable TSO in sk_setup_caps() if a dst->header_len is found
> - in xfrm4_output, if GSO is applicable, bail out without esp header
> addition - that will get done after skb_segment()
> - at the end of tcp_gso_segment() (when tcp segment is available),
> set things up for xfrm_output_one and trigger the esp_output..
> I have to be very careful about setting up skb pointers here, since
> it looks like esp_output overloads the mac_header pointer e.g., for
> setting up the ip protocol field
>
> If I do all these things, the ipsec+iperf improves slightly- for
> esp-null, I move from approx 1.8 Gbps to about 3 Gbps, but clearly,
> this is still quite far from the 8 - 9 Gbps that I can get with just
> GSO+GRO for non-ipsec traffic.
>
> There are some inefficiencies that I can see in the xfrm code,
> that I am inheriting in my patch, e.g.,:
> memory management in the xfrm code has room for improvement. Every
> pass through xfrm_transport_output ends up doing a (avoidable?) memmove,
> and each pass through esp_output ends up doing a kmalloc/free of the
> "tmp" buffer.
> But these are all still relatively small things - tweaking them
> doesnt get me significantly past the 3 Gbps limit. Any suggestions
> on how to make this budge (or design criticism of the patch) would
> be welcome.
>
Thanks for the nice data! We could certainly implement GRO/GSO for
esp-null to get your numbers up but I don't think that would be very
useful to anyone. Do you have the performance numbers using real
encryption?

> --Sowmini
>

2015-12-01 18:37:20

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/01/15 10:18), Tom Herbert wrote:
> > 8.5-9.5 Gbps clear traffic, TSO disabled, so GSO, GRO is in effect
> > 3-4 Gbps clear traffic, with both TSO/GSO disabled
> > 1.8-2 Gbps for esp-null.
>
> Are you losing checksum offload also?

I tried with both checksum offload on and off.
For the GSO case, it doesn't make a huge difference to perf.
For my patch, I disable h/w cksum offload, so that I can leverage
the existing cksum calculations in the GSO code. That helps
a bit (goes from 3 Gbps -> 3.2 Gbps, but I need a 2x jump here).


> Thanks for the nice data! We could certainly implement GRO/GSO for
> esp-null to get your numbers up but I don't think that would be very
> useful to anyone. Do you have the performance numbers using real
> encryption?

I was using esp-null merely to not have the crypto itself perturb
the numbers (i.e., just focus on the s/w overhead for now), but here
are the numbers for the stock linux kernel stack
                 Gbps   peak cpu util
esp-null         1.8    71%
aes-gcm-c-256    1.6    79%
aes-ccm-a-128    0.7    96%

That trend made me think that if we can get esp-null to be as close
as possible to GSO/GRO, the rest will follow closely behind.

So is my patch in the right direction? Anything obvious I am missing?
I'd like to budge that number beyond 3 Gbps :-)

--Sowmini

2015-12-01 18:45:09

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/01/15 10:17), Rick Jones wrote:
>
> What do the perf profiles show? Presumably, loss of TSO/GSO means
> an increase in the per-packet costs, but if the ipsec path
> significantly increases the per-byte costs...

For ESP-null, there's actually very little work to do - we just
need to add the 8-byte ESP header with an SPI and a seq#... no
crypto work to do... so the overhead *should* be minimal, else
we've painted ourselves into a corner where we can't touch anything,
including TCP options like md5.

perf profiles: I used perf tracepoints to instrument latency.
Yes, there is function call overhead for the xfrm path. So, for example,
the stack ends up being like this:
:
e5d2f2 ip_finish_output ([kernel.kallsyms])
75d6d0 ip_output ([kernel.kallsyms])
    7c08ad xfrm_output_resume ([kernel.kallsyms])
    7c0aae xfrm_output ([kernel.kallsyms])
    7b1bdd xfrm4_output_finish ([kernel.kallsyms])
    7b1c7e __xfrm4_output ([kernel.kallsyms])
    7b1dbe xfrm4_output ([kernel.kallsyms])
75bac4 ip_local_out ([kernel.kallsyms])
75c012 ip_queue_xmit ([kernel.kallsyms])
7736a3 tcp_transmit_skb ([kernel.kallsyms])
:
where the detour into xfrm has been indented out, and esp_output
gets called out of xfrm_output_resume(). And as I said, there are
some nickel-and-dime perf gains to be squeezed out from
better memory management in xfrm, but the fact that it doesn't move
beyond 3 Gbps strikes me as pointing to some other bottleneck/serialization.
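
(The profile/stack above came from something along these lines - a sketch,
not the exact invocation I used:)

perf record -a -g -- sleep 10      # while the iperf run is in progress
perf report --stdio | head -60     # hot paths, with call chains
perf script                        # per-sample call stacks, as quoted above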

> Short of a perf profile, I suppose one way to probe for per-packet
> versus per-byte would be to up the MTU. That should reduce the
> per-packet costs while keeping the per-byte roughly the same.

actually the hack/rfc I sent out does help (in that it almost
doubles the existing 1.8 Gbps). Problem is that this cliff is much
steeper than that, and there's more hidden somewhere.

--Sowmini

2015-12-01 18:50:33

by Rick Jones

Subject: Re: ipsec impact on performance

On 12/01/2015 10:45 AM, Sowmini Varadhan wrote:
> On (12/01/15 10:17), Rick Jones wrote:
>>
>> What do the perf profiles show? Presumably, loss of TSO/GSO means
>> an increase in the per-packet costs, but if the ipsec path
>> significantly increases the per-byte costs...
>
> For ESP-null, there's actually very little work to do - we just
> need to add the 8 byte ESP header with an spi and a seq#.. no
> crypto work to do.. so the overhead *should* be minimal, else
> we've painted ourself into a corner where we can't touch anything
> including TCP options like md5.

Something of a longshot, but are you sure you are still getting
effective CKO/GRO on the receiver?

rick jones

2015-12-01 19:01:34

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/01/15 10:50), Rick Jones wrote:
>
> Something of a longshot, but are you sure you are still getting
> effective CKO/GRO on the receiver?

Good question. With ipsec, GRO (like GSO) gets implicitly disabled.

But when I explicitly disable GRO on the receiver, leaving only GSO
on the sender, I can still get about 6 Gbps for clear traffic.

Thus if I could get closer to 6 Gbps for my patch, I'd at least
know that it was just GRO that was missing... but I'm only getting
3 Gbps, which makes me think I missed something else on the sender
itself.

--Sowmini

2015-12-02 00:56:24

by David Ahern

Subject: Re: ipsec impact on performance

On 12/1/15 10:17 AM, Rick Jones wrote:
> On 12/01/2015 09:59 AM, Sowmini Varadhan wrote:
>> But these are all still relatively small things - tweaking them
>> doesnt get me significantly past the 3 Gbps limit. Any suggestions
>> on how to make this budge (or design criticism of the patch) would
>> be welcome.
>
> What do the perf profiles show? Presumably, loss of TSO/GSO means an
> increase in the per-packet costs, but if the ipsec path significantly
> increases the per-byte costs...
>
> Short of a perf profile, I suppose one way to probe for per-packet
> versus per-byte would be to up the MTU. That should reduce the
> per-packet costs while keeping the per-byte roughly the same.

Using iperf3 and AH with NULL algorithm between 2 peers connected by a
10G link.

Without AH configured I get a steady 9.9 Gbps with iperf3 consuming
about 55% cpu.

With AH I get ~1.5 Gbps with MTU at 1500:

[ 4] 0.00-1.01 sec 160 MBytes 1.33 Gbits/sec 23 905 KBytes
[ 4] 1.01-2.00 sec 211 MBytes 1.79 Gbits/sec 0 996 KBytes

iperf3 runs about 60% CPU and ksoftirqd/2 is at 86%.


Bumping the MTU to 9000:

[ 4] 3.00-4.00 sec 914 MBytes 7.67 Gbits/sec 260 1.01 MBytes
[ 4] 4.00-5.00 sec 1012 MBytes 8.49 Gbits/sec 0 1.23 MBytes
[ 4] 5.00-6.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.23 MBytes

At this rate iperf3 was at 95% CPU and ksoftirqd was not relevant.
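
For reference, the AH-null SA can be set up along these lines (a sketch;
addresses and SPI are placeholders, the digest_null algorithm name is
assumed, and a matching SA/policy is needed in the other direction):

ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto ah spi 0x200 \
      mode transport auth 'digest_null' ''
ip xfrm policy add src 10.1.1.1 dst 10.1.1.2 dir out \
      tmpl proto ah mode transport
iperf3 -c 10.1.1.2 -t 60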

2015-12-02 01:09:36

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/01/15 16:56), David Ahern wrote:
>
> Using iperf3 and AH with NULL algorithm between 2 peers connected by
> a 10G link.
>
I'm using esp-null, not AH, and iperf2, which I understand is
quite different from, and more aggressive than, iperf3 (though I'm not
sure that it matters for this single-stream case).

> With AH I get ~1.5 Gbps with MTU at 1500:

But yes, I get approx that too.

The "good" news is that I can get about 3 Gbps with my patch. So one
could say that I've 2x-ed the perf. Except that:

The "bad" news is that even GSO/GRO can do way better, so we
need to be able to extend that perf to also be available
to some key TCP and IP extensions (like md5 and ipsec, maybe)
and beyond (i.e need to de-ossify the stack so we can extend
TCP/IP features without sacrificing perf along the way).

The not-so-great news is that I see that just adding perf tracepoints
(not even enabling them!) seems to make a small diff (3 Gbps vs 3.2 Gbps)
to my numbers. Is that mere standard-deviation, or something
one should be aware of, about perf?

> iperf3 runs about 60% CPU and ksoftirqd/2 is at 86%.

yes, not surprising. You really need to compare this to GSO/GRO
for a pure-s/w, apples-to-apples comparison.

> Bumping the MTU to 9000:

Yes that's not always an option. See also the comments from Eric/Rick
about latency [http://lists.openwall.net/netdev/2015/11/24/111].

--Sowmini

2015-12-02 01:25:24

by David Ahern

Subject: Re: ipsec impact on performance

On 12/1/15 5:09 PM, Sowmini Varadhan wrote:
> The not-so-great news is that I see that just adding perf tracepoints
> (not even enabling them!) seems to make a small diff (3 Gbps vs 3.2 Gbps)
> to my numbers. Is that mere standard-deviation, or something
> one should be aware of, about perf?

The existence of tracepoints has no overhead until they are activated (i.e.,
you launch perf or start ftrace for those tracepoints).

2015-12-02 06:53:05

by Steffen Klassert

Subject: Re: ipsec impact on performance

On Tue, Dec 01, 2015 at 12:59:53PM -0500, Sowmini Varadhan wrote:
>
> I instrumented iperf with and without ipsec, just using esp-null,
> and 1 thread, to keep things simple. I'm seeing some pretty dismal
> performance numbers with ipsec, and trying to think of ways to
> improve this. Here are my findings, please share feedback.
>
> I suspect that a big part of the problem is the implicit loss of GSO,
> and this is made worse by some inefficiencies in the xfrm code:
> for single stream iperf (to avoid effects of rx-hash), I see the
> following on a 10G p2p ethernet link.
> 8.5-9.5 Gbps clear traffic, TSO disabled, so GSO, GRO is in effect
> 3-4 Gbps clear traffic, with both TSO/GSO disabled
> 1.8-2 Gbps for esp-null.
> So the above numbers suggest that losing TSO/GSO results in one
> big drop in performance, and then there's another cliff for the
> clear -> esp-null transition. And those cliffs apply even if you are
> merely doing TCP-MD5 or AO for basic protection of the TCP connection.
>
> I tried moving things about a bit to defer the ipsec after GSO - I'll
> share my experimental patch as an RFC in a separate thread. (Disclaimer:
> the patch is just an experiment at this point).
>
> In that patch, I'm only focussing on esp-null and transp-mode ipsec
> for now, just to get some basic performance numbers to see if this is
> at all interesting. Essentially my hack mainly involves the following
>
> - don't disable TSO in sk_setup_caps() if a dst->header_len is found
> - in xfrm4_output, if GSO is applicable, bail out without esp header
> addition - that will get done after skb_segment()
> - at the end of tcp_gso_segment() (when tcp segment is available),
> set things up for xfrm_output_one and trigger the esp_output..
> I have to be very careful about setting up skb pointers here, since
> it looks like esp_output overloads the mac_header pointer e.g., for
> setting up the ip protocol field

I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part
works already. I decapsulate/decrypt the packets at layer 2 with an esp GRO
callback function and reinject them into napi_gro_receive(). So in case
the decapsulated packet is TCP, GRO can aggregate big packets.

My approach to GSO is a bit different from yours. I focused on tunnel mode,
but transport mode should work too. I encapsulate the big GSO packets
but don't do the encryption. Then I've added an esp_gso_segment() function,
so the (still not encrypted) ESP packets get segmented with GSO. Finally I
do encryption for all segments. This works well as long as I do sync crypto.
The hard part is when crypto returns async. This is what I'm working on now.
I hope to get this ready during the next weeks so that I can post an RFC version
and some numbers.

Also I tried to consider the IPsec GRO/GSO codepath as a software fallback.
So I added hooks for the encapsulation, encryption etc. If a NIC can do
IPsec, it can use these hooks to prepare the packets the way it needs them.
There are NICs that can do IPsec, it's just that our stack does not support
it.

Another thing, I thought about setting up an IPsec BoF/workshop at
netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail
to the list later this week to see if there is enough interest and
maybe some additional topics.

2015-12-02 11:56:45

by David Laight

Subject: RE: ipsec impact on performance

From: Sowmini Varadhan
> Sent: 01 December 2015 18:37
...
> I was using esp-null merely to not have the crypto itself perturb
> the numbers (i.e., just focus on the s/w overhead for now), but here
> are the numbers for the stock linux kernel stack
> Gbps peak cpu util
> esp-null 1.8 71%
> aes-gcm-c-256 1.6 79%
> aes-ccm-a-128 0.7 96%
>
> That trend made me think that if we can get esp-null to be as close
> as possible to GSO/GRO, the rest will follow closely behind.

That's not how I read those figures.
They imply to me that there is a massive cost for the actual encryption
(particularly for aes-ccm-a-128) - so whatever you do to the esp-null
case won't help.

One way to get a view of the cost of the encryption (and copies)
is to do the operation twice.

David

2015-12-02 12:05:46

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 07:53), Steffen Klassert wrote:
>
> I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part
> works already. I decapsulate/decrypt the packets on layer2 with a esp GRO
> callback function and reinject them into napi_gro_receive(). So in case
> the decapsulated packet is TCP, GRO can aggregate big packets.

Would you be able to share your patch with me? I'd like to give that a try
just to get preliminary numbers (and I could massage it as needed
for transport mode too).

> My approach to GSO is a bit different to yours. I focused on tunnel mode,
> but transport mode should work too. I encapsulate the big GSO packets
> but don't do the encryption. Then I've added a esp_gso_segment() function,
> so the (still not encrypted ESP packets) get segmented with GSO. Finally I
> do encryption for all segments. This works well as long as I do sync crypto.
> The hard part is when crypto returns async. This is what I'm working on now.
> I hope to get this ready during the next weeks that I can post a RFC version
> and some numbers.

I see. My thought for attacking tunnel mode would have been to
call out the esp code at the tail of gre_gso_segment, but I did not
yet consider this carefully - clearly you've spent more time on it,
and know more about all the gotchas there.

> Also I tried to consider the IPsec GRO/GSO codepath as a software fallback.
> So I added hooks for the encapsulation, encryption etc. If a NIC can do
> IPsec, it can use this hooks to prepare the packets the way it needs it.
> There are NICs that can do IPsec, it's just that our stack does not support
> it.

yes, this is one of the things I wanted to bring up at netdev 1.1.
Evidently many of the 10G NICs (Niantic, Twinville, Sageville) already
support ipsec offload, but that feature is not enabled for BSD or linux
because the stack does not support it (though Microsoft does; the Intel
folks pointed me at this doc:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff556996%28v=vs.85%29.aspx)

But quite independent of h/w offload, the s/w stack can already do
a very good job for 10G with just GSO and GRO, so being able to extend
that path to do encryption after segmentation should at least bridge
the huge gap between the ipsec and non-ipsec mechanisms.

And that gap should be as small as possible for esp-null, so that
the only big hit we take is for the complexity of encryption itself!

> Another thing, I thought about setting up an IPsec BoF/workshop at
> netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail
> to the list later this week to see if there is enough interest and
> maybe some additional topics.

Sounds like an excellent idea. I'm certainly interested.

--Sowmini

2015-12-02 12:12:08

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 11:56), David Laight wrote:
> > Gbps peak cpu util
> > esp-null 1.8 71%
> > aes-gcm-c-256 1.6 79%
> > aes-ccm-a-128 0.7 96%
> >
> > That trend made me think that if we can get esp-null to be as close
> > as possible to GSO/GRO, the rest will follow closely behind.
>
> That's not how I read those figures.
> They imply to me that there is a massive cost for the actual encryption
> (particularly for aes-ccm-a-128) - so whatever you do to the esp-null
> case won't help.

I'm not a crypto expert, but my understanding is that the CCM mode
is the "older" encryption algorithm, and GCM is the way of the future.
Plus, I think the GCM mode has some type of h/w support (hence the
lower cpu util).
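
(One way to check that - a sketch, relying on /proc: see whether the CPU
advertises AES-NI/PCLMUL, and which driver is backing the gcm/ccm
algorithms:)

egrep -wo 'aes|pclmulqdq' /proc/cpuinfo | sort -u
grep -A3 'gcm(aes)' /proc/crypto
grep -A3 'ccm(aes)' /proc/crypto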

I'm sure that crypto has a cost, not disputing that, but my point
was that 1.8 -> 1.6 -> 0.7 is a curve with a much gentler slope than
the 9 Gbps (clear traffic, GSO, GRO)
-> 4 Gbps (clear, no GRO/GSO)
-> 1.8 Gbps (esp-null) curve.
That steeper slope smells of s/w perf issues that we need to resolve first,
before getting into the work of faster crypto.

> One way to get a view of the cost of the encryption (and copies)
> is to do the operation twice.

I could also just instrument it with perf tracepoints, if that
data is interesting

--Sowmini

2015-12-02 12:41:40

by David Laight

Subject: RE: ipsec impact on performance

From: Sowmini Varadhan
> Sent: 02 December 2015 12:12
> On (12/02/15 11:56), David Laight wrote:
> > > Gbps peak cpu util
> > > esp-null 1.8 71%
> > > aes-gcm-c-256 1.6 79%
> > > aes-ccm-a-128 0.7 96%
> > >
> > > That trend made me think that if we can get esp-null to be as close
> > > as possible to GSO/GRO, the rest will follow closely behind.
> >
> > That's not how I read those figures.
> > They imply to me that there is a massive cost for the actual encryption
> > (particularly for aes-ccm-a-128) - so whatever you do to the esp-null
> > case won't help.
>
> I'm not a crypto expert, but my understanding is that the CCM mode
> is the "older" encryption algorithm, and GCM is the way of the future.
> Plus, I think the GCM mode has some type of h/w support (hence the
> lower cpu util)
>
> I'm sure that crypto has a cost, not disputing that, but my point
> was that 1.8 -> 1.6 -> 0.7 is a curve with a much gentler slope than
> the 9 Gbps (clear traffic, GSO, GRO)
> -> 4 Gbps (clear, no gro, gso)
> -> 1.8 (esp-null)
> That steeper slope smells of s/w perf that we need to resolve first,
> before getting into the work of faster crypto?

That isn't the way cpu cost works.
You are getting 0.7 Gbps with aes-ccm-a-128; scale the esp-null back to
that rate and it would use 0.7/1.8 * 71 = 27% of the cpu.
So 69% of the cpu in the a-128 case is probably caused by the
encryption itself.
Even if the rest of the code cost nothing you'd not increase
above 1 Gbps.

The sums for aes-gcm-c-256 are slightly better, about 15%.

Ok, things aren't quite that simple since you are probably changing
the way data flows through the system as well.

Also, what/how are you measuring cpu use?
I'm not sure anything on Linux gives you a truly accurate value
when processes are running for very short periods.

On an SMP system you also get big effects when work is switched
between cpus. I've got some tests that run a lot faster if I
put all but one of the cpus into a busy-loop in userspace
(eg: while :; do :; done)!

David

2015-12-02 13:25:26

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 12:41), David Laight wrote:
>
> Also what/how are you measuring cpu use.
> I'm not sure anything on Linux gives you a truly accurate value
> when processes are running for very short periods.

I was using mpstat while running iperf. Should I be using
something else, or running it for longer intervals?

but I hope we are not doomed at 1 Gbps, or else security itself would
come at a very unattractive cost. Anyway, even aside from crypto,
we need to have some way to add TCP options (that depend on the
contents of the tcp header) etc. post-GSO, in the interest of not
ossifying the stack.

> On an SMP system you also get big effects when work is switched
> between cpus. I've got some tests that run a lot faster if I
> put all but one of the cpus into a busy-loop in userspace
> (eg: while :; do :; done)!

yes, Rick Jones also pointed out the same thing to me, and one of the
things I was going to try out later today is to measure the
effects of pinning irqs and iperf threads to a specific cpu.
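
(Roughly along these lines - a sketch; the IRQ number, cpu choice and
addresses are placeholders:)

grep eth0 /proc/interrupts            # find the NIC's IRQ number(s)
echo 4 > /proc/irq/123/smp_affinity   # pin that IRQ to cpu 2 (mask 0x4); 123 is made up
taskset -c 2 iperf -c 10.0.0.2 -t 60 -P 1   # pin the iperf client as well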

--Sowmini

2015-12-02 20:50:48

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 12:41), David Laight wrote:
> You are getting 0.7 Gbps with aes-ccm-a-128, scale the esp-null back to
> that and it would use 7/18*71 = 27% of the cpu.
> So 69% of the cpu in the a-128 case is probably caused by the
> encryption itself.
> Even if the rest of the code cost nothing you'd not increase
> above 1Gbps.

Fortunately, the situation is not quite hopeless yet.

Thanks to Rick Jones for supplying the hints for this: with
some careful manual pinning of irqs and iperf processes to cpus,
I can get to 4.5 Gbps for the esp-null case.

Given that the [clear traffic + GSO without GRO] gets me about 5-7 Gbps,
the 4.5 Gbps is not that far off (and at that point, the nickel-and-dime
tweaks may help even more).

For AES-GCM, I'm able to go from 1.8 Gbps (no GSO) to 2.8 Gbps.
Still not great, but it proves that we haven't hit any upper bounds
yet.

I think a lot of the manual tweaking of irq/process placement
is needed because the existing rps/rfs flow steering is looking
for TCP/UDP flow information (port numbers) to do the steering. It can just as easily
use the IPsec SPI numbers to do this, and that's another place where
we can make this more ipsec-friendly.
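
(For context, the knobs the existing steering relies on look like this -
a sketch; the interface/queue names and masks are placeholders:)

echo e > /sys/class/net/eth0/queues/rx-0/rps_cpus        # RPS cpumask for an rx queue
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries    # enable RFS globally
echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt # per-queue RFS flow table size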

--Sowmini

2015-12-02 21:07:09

by Tom Herbert

Subject: Re: ipsec impact on performance

On Wed, Dec 2, 2015 at 12:50 PM, Sowmini Varadhan
<[email protected]> wrote:
> On (12/02/15 12:41), David Laight wrote:
>> You are getting 0.7 Gbps with aes-ccm-a-128, scale the esp-null back to
>> that and it would use 7/18*71 = 27% of the cpu.
>> So 69% of the cpu in the a-128 case is probably caused by the
>> encryption itself.
>> Even if the rest of the code cost nothing you'd not increase
>> above 1Gbps.
>
> Fortunately, the situation is not quite hopeless yet.
>
> Thanks to Rick Jones for supplying the hints for this, but with
> some careful manual pinning of irqs and iperf processes to cpus,
> I can get to 4.5 Gbps for the esp-null case.
>
> Given that the [clear traffic + GSO without GRO] gets me about 5-7 Gbps,
> the 4.5 Gbps is not that far off (and at that point, the nickel-and-dime
> tweaks may help even more).
>
> For AES-GCM, I'm able to go from 1.8 Gbps (no GSO) to 2.8 Gbps.
> Still not great, but proves that we haven't yet hit any upper bounds
> yet.
>
> I think a lot of the manual tweaking of irq/process placement
> is needed because the existing rps/rfs flow steering is looking
> for TCP/UDP flow numbers to do the steering. It can just as easily
> use the IPsec SPI numbers to do this, and that's another place where
> we can make this more ipsec-friendly.
>
That's easy enough to add to flow dissector, but is SPI really
intended to be used as an L4 entropy value? We would need to consider the
effects of running multiple TCP connections over a single IPsec SA. Also, you
might want to try IPv6; the flow label should provide a good L4 hash
for RPS/RFS, and it would be interesting to see what the effects are with
IPsec processing. (ESP over UDP could also help if RSS/ECMP is critical.)

Tom

2015-12-02 21:12:01

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 13:07), Tom Herbert wrote:
> That's easy enough to add to flow dissector, but is SPI really
> intended to be used an L4 entropy value? We would need to consider the

yes. To quote https://en.wikipedia.org/wiki/Security_Parameter_Index
"This works like port numbers in TCP and UDP connections. What it means
is that there could be different SAs used to provide security to one
connection. An SA could therefore act as a set of rules."

> effects of running multiple TCP connections over an IPsec. Also, you
> might want to try IPv6, the flow label should provide a good L4 hash
> for RPS/RFS, it would be interesting to see what the effects are with
> IPsec processing. (ESP/UDP could also if RSS/ECMP is critical)

IPv6 would be an interesting academic exercise, but it's going
to be a while before we get RDS-TCP to go over IPv6.

--Sowmini

2015-12-02 21:44:20

by Tom Herbert

Subject: Re: ipsec impact on performance

On Wed, Dec 2, 2015 at 1:12 PM, Sowmini Varadhan
<[email protected]> wrote:
> On (12/02/15 13:07), Tom Herbert wrote:
>> That's easy enough to add to flow dissector, but is SPI really
>> intended to be used an L4 entropy value? We would need to consider the
>
> yes. To quote https://en.wikipedia.org/wiki/Security_Parameter_Index
> "This works like port numbers in TCP and UDP connections. What it means
> is that there could be different SAs used to provide security to one
> connection. An SA could therefore act as a set of rules."
>
>> effects of running multiple TCP connections over an IPsec. Also, you
>> might want to try IPv6, the flow label should provide a good L4 hash
>> for RPS/RFS, it would be interesting to see what the effects are with
>> IPsec processing. (ESP/UDP could also if RSS/ECMP is critical)
>
> IPv6 would be an interesting academic exercise, but it's going
> to be a while before we get RDS-TCP to go over IPv6.
>
Huh? Who said anything about RDS-TCP? I thought you were trying to
improve IPsec performance...

2015-12-02 21:47:22

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 13:44), Tom Herbert wrote:
> > IPv6 would be an interesting academic exercise, but it's going
> > to be a while before we get RDS-TCP to go over IPv6.
> >
> Huh? Who said anything about RDS-TCP? I thought you were trying to
> improve IPsec performance...

yes, and it would be nice to find out that IPsec for IPv6 is
fast, but I'm afraid there are a lot of IPv4 use cases out there that
need the same thing for IPv4 too (first?).

--Sowmini

2015-12-02 22:01:43

by Tom Herbert

Subject: Re: ipsec impact on performance

On Wed, Dec 2, 2015 at 1:47 PM, Sowmini Varadhan
<[email protected]> wrote:
> On (12/02/15 13:44), Tom Herbert wrote:
>> > IPv6 would be an interesting academic exercise, but it's going
>> > to be a while before we get RDS-TCP to go over IPv6.
>> >
>> Huh? Who said anything about RDS-TCP? I thought you were trying to
>> improve IPsec performance...
>
> yes, and it would be nice to find out that IPsec for IPv6 is
> fast, but I'm afraid there are a lot of IPv4 use cases out there that
> need the same thing for IPv4 too (first?).
>
No, please don't persist in this myopic "we'll get to IPv6 later"
model! IPv6 is a real protocol, it has significant deployment on the
Internet, and there are now whole data centers that are IPv6 only
(e.g. FB), and there are plenty of use cases of IPSEC/IPv6 that could
benefit from performance improvements just as much as IPv4. This vendor
mentality that IPv6 is still not important simply doesn't help
matters. :-(

Tom

2015-12-02 22:08:03

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/02/15 14:01), Tom Herbert wrote:
> No, please don't persist is this myopic "we'll get to IPv6 later"
> model! IPv6 is a real protocol, it has significant deployment of the
> Internet, and there are now whole data centers that are IPv6 only
> (e.g. FB), and there are plenty of use cases of IPSEC/IPv6 that could
> benefit for performance improvements just as much IPv4. This vendor
> mentality that IPv6 is still not important simply doesn't help
> matters. :-(

Ok, I'll get you the numbers for this later, and sure, if we do
this, we should solve the ipv6 problem too.

BTW, the ipv6 nov3 paths have severe alignment issues. I flagged
this a long time ago http://www.spinics.net/lists/netdev/msg336257.html

I think all of it is triggered by mld. Someone needs to do
something about that too. I don't think those paths are using
NET_ALIGN very well, and I don't think this is the most wholesome
thing for perf.

--Sowmini

2015-12-02 22:25:20

by Eric Dumazet

Subject: Re: ipsec impact on performance

On Wed, 2015-12-02 at 16:12 -0500, Sowmini Varadhan wrote:

> IPv6 would be an interesting academic exercise

Really, you made my day !

2015-12-03 01:31:39

by Rick Jones

Subject: Re: ipsec impact on performance

On 12/02/2015 03:56 AM, David Laight wrote:
> From: Sowmini Varadhan
>> Sent: 01 December 2015 18:37
> ...
>> I was using esp-null merely to not have the crypto itself perturb
>> the numbers (i.e., just focus on the s/w overhead for now), but here
>> are the numbers for the stock linux kernel stack
>> Gbps peak cpu util
>> esp-null 1.8 71%
>> aes-gcm-c-256 1.6 79%
>> aes-ccm-a-128 0.7 96%
>>
>> That trend made me think that if we can get esp-null to be as close
>> as possible to GSO/GRO, the rest will follow closely behind.
>
> That's not how I read those figures.
> They imply to me that there is a massive cost for the actual encryption
> (particularly for aes-ccm-a-128) - so whatever you do to the esp-null
> case won't help.
>

To build on the whole "importance of normalizing throughput and CPU
utilization in some way" theme, the following are some non-IPSec netperf
TCP_STREAM runs between a pair of 2xIntel E5-2603 v3 systems using
Broadcom BCM57810-based NICs, 4.2.0-19 kernel, 7.10.72 firmware and
bnx2x driver version 1.710.51-0:


root@htx-scale300-258:~# ./take_numbers.sh
Baseline
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.12.49.1 () port 0 AF_INET : +/-2.500% @ 99% conf. : demo : cpu bind
Throughput  Local   Local    Local    Remote  Remote   Remote   Throughput  Local       Remote
            CPU     Service  Peak     CPU     Service  Peak     Confidence  CPU         CPU
            Util    Demand   Per CPU  Util    Demand   Per CPU  Width (%)   Confidence  Confidence
            %                Util %   %                Util %               Width (%)   Width (%)
9414.11     1.87    0.195    26.54    3.70    0.387    45.42    0.002       7.073       1.276
Disable TSO/GSO
5651.25     8.36    1.454    100.00   2.46    0.428    30.35    1.093       1.101       4.889
Disable tx CKO
5287.69     8.46    1.573    100.00   2.34    0.435    29.66    0.428       7.710       3.518
Disable remote LRO/GRO
4148.76     8.32    1.971    99.97    5.95    1.409    71.98    3.656       0.735       3.491
Disable remote rx CKO
4204.49     8.31    1.942    100.00   6.68    1.563    82.05    2.015       0.437       4.921

You can see that as the offloads are disabled, the service demands (usec
of CPU time consumed systemwide per KB of data transferred) go up, and
until one hits a bottleneck (eg one of the CPUs pegs at 100%), go up
faster than the throughputs go down.

To aid in reproducibility those tests were with irqbalance disabled, all
the IRQs for the NICs pointed at CPU 0, netperf/netserver bound to CPU
0, and the power management set to static high performance.

Assuming I've created a "matching" ipsec.conf, here is what I see with
esp=null-null on the TCP_STREAM test - again, keeping all the binding in
place etc:

3077.37     8.01    2.560    97.78    8.21    2.625    99.41    4.869       1.876       0.955

You can see that even with the null-null, there is a rather large
increase in service demand.

And this is what I see when I run netperf TCP_RR (first is without
ipsec, second is with; I didn't ask for confidence intervals this time
around and I didn't try to tweak interrupt coalescing settings):

# HDR="-P 1";for i in 10.12.49.1 192.168.0.2; do ./netperf -H $i -t
TCP_RR -c -C -l 30 -T 0 $HDR; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 10.12.49.1 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.     CPU    CPU     S.dem   S.dem
Send   Recv   Size     Size    Time     Rate       local  remote  local   remote
bytes  bytes  bytes    bytes   secs.    per sec    % S    % S     us/Tr   us/Tr

16384  87380  1        1       30.00    30419.75   1.72   1.68    6.783   6.617
16384  87380
16384  87380  1        1       30.00    20711.39   2.15   2.05    12.450  11.882
16384  87380

The service demand increases ~83% on the netperf side and almost 80% on
the netserver side. That is pure "effective" path-length increase.

happy benchmarking,

rick jones

PS - the netperf commands were variations on this theme:
./netperf -P 0 -T 0 -H 10.12.49.1 -c -C -l 30 -i 30,3 -- -O
throughput,local_cpu_util,local_sd,local_cpu_peak_util,remote_cpu_util,remote_sd,remote_cpu_peak_util,throughput_confid,local_cpu_confid,remote_cpu_confid
altering IP address or test as appropriate. -P 0 disables printing the
test banner/headers. -T 0 binds netperf and netserver to CPU0 on their
respective systems. -H sets the destination, -c and -C ask for local
and remote CPU measurements respectively. -l 30 says each test
iteration should be 30 seconds long and -i 30,3 says to run at least
three iterations and no more than 30 when trying to hit the confidence
interval - by default 99% confident the average reported is within +/-
2.5% of the "actual" average. The -O stuff is selecting specific values
to be emitted.

2015-12-03 08:45:12

by Steffen Klassert

Subject: Re: ipsec impact on performance

On Wed, Dec 02, 2015 at 07:05:38AM -0500, Sowmini Varadhan wrote:
> On (12/02/15 07:53), Steffen Klassert wrote:
> >
> > I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part
> > works already. I decapsulate/decrypt the packets on layer2 with a esp GRO
> > callback function and reinject them into napi_gro_receive(). So in case
> > the decapsulated packet is TCP, GRO can aggregate big packets.
>
> Would you be able to share your patch with me? I'd like to give that a try
> just to get preliminary numbers (and I could massage it as needed
> for transport mode too).

I've got the final bits to work today; I can do async crypto now.
I can push the patches to a public tree after some polishing.
But I have to warn, it still has bugs and no useful commit messages.

I did a first test with forwarding esp in tunnel mode. The crypto
algorithm I used was:

pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))

Result:

iperf -c 10.0.0.12 -t 60
------------------------------------------------------------
Client connecting to 10.0.0.12, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec

I'll provide more information as soon as the code is available.

>
> > Another thing, I thought about setting up an IPsec BoF/workshop at
> > netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail
> > to the list later this week to see if there is enough interest and
> > maybe some additional topics.
>
> Sounds like an excellent idea. I'm certainly interested.

Great, then we are at least two :)

2015-12-03 11:38:26

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/03/15 09:45), Steffen Klassert wrote:
> pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))
>
> Result:
>
> iperf -c 10.0.0.12 -t 60
> ------------------------------------------------------------
> Client connecting to 10.0.0.12, TCP port 5001
> TCP window size: 45.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec
>
> I provide more informatios as soon as the code is available.

that's pretty good compared to the baseline.
I'd like to try out your patches when they are ready.

I think you may get some more improvement if you manually pin the irq
and iperf to specific cpus (at least that was my observation for transp
mode)

--Sowmini

2015-12-03 11:47:23

by Steffen Klassert

Subject: Re: ipsec impact on performance

On Thu, Dec 03, 2015 at 06:38:20AM -0500, Sowmini Varadhan wrote:
> On (12/03/15 09:45), Steffen Klassert wrote:
> > pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))
> >
> > Result:
> >
> > iperf -c 10.0.0.12 -t 60
> > ------------------------------------------------------------
> > Client connecting to 10.0.0.12, TCP port 5001
> > TCP window size: 45.0 KByte (default)
> > ------------------------------------------------------------
> > [ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
> > [ ID] Interval Transfer Bandwidth
> > [ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec
> >
> > I provide more informatios as soon as the code is available.
>
> that's pretty good compared to the baseline.
> I'd like to try out our patches, when they are ready.
>
> I think you may get some more improvement if you manually pin the irq
> and iperf to specific cpus (at least that was my observation for transp
> mode)

I do that already. I have dedicated crypto and IO cpus, 2 cpus
do networking IO and 4 cpus do crypto (parallelized with pcrypt).

The bottleneck is now the cpu that does the TX path (checksumming
of the GSO segments).
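
The cpu split is the usual pcrypt/padata cpumask setup - roughly like this
(a sketch; the masks are examples and the sysfs paths are as described in
the padata documentation):

echo 3c > /sys/kernel/pcrypt/pencrypt/parallel_cpumask   # cpus 2-5 do parallel crypto
echo 03 > /sys/kernel/pcrypt/pencrypt/serial_cpumask     # cpus 0-1 do serialization/IO
echo 3c > /sys/kernel/pcrypt/pdecrypt/parallel_cpumask
echo 03 > /sys/kernel/pcrypt/pdecrypt/serial_cpumask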

2015-12-03 19:33:30

by David Miller

Subject: Re: ipsec impact on performance

From: Sowmini Varadhan <[email protected]>
Date: Tue, 1 Dec 2015 12:59:53 -0500

> I instrumented iperf with and without ipsec, just using esp-null,
> and 1 thread, to keep things simple. I'm seeing some pretty dismal
> performance numbers with ipsec, and trying to think of ways to
> improve this. Here are my findings, please share feedback.

Doesn't skb_cow_data() contribute significantly to the ESP base cost,
especially for TCP packets?

I mean, we're copying every TCP data frame.

If this is the case, even with GSO/whatever offloads, I expect that
performance will be roughly halved.

2015-12-03 20:08:42

by Eric Dumazet

Subject: Re: ipsec impact on performance

On Thu, 2015-12-03 at 14:33 -0500, David Miller wrote:
> From: Sowmini Varadhan <[email protected]>
> Date: Tue, 1 Dec 2015 12:59:53 -0500
>
> > I instrumented iperf with and without ipsec, just using esp-null,
> > and 1 thread, to keep things simple. I'm seeing some pretty dismal
> > performance numbers with ipsec, and trying to think of ways to
> > improve this. Here are my findings, please share feedback.
>
> Doesn't skb_cow_data() contribute significantly to the ESP base cost,
> especially for TCP packets?
>
> I mean, we're copying every TCP data frame.
>
> If this is the case, even with GSO/whatever offloads, I expect that
> performance will be roughly halfed.

This reminds me of something I noticed: we (un)clone all xmit GRE
GSO packets because of the following code in iptunnel_handle_offloads():

    if (skb_is_gso(skb)) {
        err = skb_unclone(skb, GFP_ATOMIC);
        if (unlikely(err))
            goto error;
        skb_shinfo(skb)->gso_type |= gso_type_mask;
        return skb;
    }

This is certainly something we should avoid, since we have ~1500 bytes
of payload in skb->head per TCP skb

Ideally, part of gso_type should belong to skb, not skb_shinfo(skb) :(

2015-12-03 20:32:30

by Sowmini Varadhan

[permalink] [raw]
Subject: Re: ipsec impact on performance

On (12/03/15 14:33), David Miller wrote:
>
> Doesn't skb_cow_data() contribute significantly to the ESP base cost,
> especially for TCP packets?

Indeed. For esp-null, it's about half of the total time spent
in esp_output (for one run that I just instrumented with perf
tracepoints: 2.5 ms compared to 5.8 ms).

It never goes into the slow path of skb_cow_data (the path
with comment about mincer fragments) because whether or not you
do this after GSO, TCP makes sure to fit within the MSS, so (unless you
have Jumbo enabled?) you'd send it something that does not have
a fraglist.

Was the cow_data call just intended to handle the fraglist case?
Or is there something else more generic going on here (it's hard
to tell because esp_output doesn't have too many comments to explain
what it thinks it's doing, as it juggles different len fields around).

> I mean, we're copying every TCP data frame.
>
> If this is the case, even with GSO/whatever offloads, I expect that
> performance will be roughly halfed.

The other obvious "low-hanging fruit" is to address the TODO in the
comment above esp_alloc_tmp.

--Sowmini

2015-12-04 05:18:44

by Sandy Harris

Subject: Re: ipsec impact on performance

This article is old (turn of the century) but it may have numbers
worth comparing to
http://www.freeswan.org/freeswan_trees/CURRENT-TREE/doc/performance.html

2015-12-07 08:40:50

by Steffen Klassert

Subject: Re: ipsec impact on performance

On Thu, Dec 03, 2015 at 06:38:20AM -0500, Sowmini Varadhan wrote:
> On (12/03/15 09:45), Steffen Klassert wrote:
> > pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))
> >
> > Result:
> >
> > iperf -c 10.0.0.12 -t 60
> > ------------------------------------------------------------
> > Client connecting to 10.0.0.12, TCP port 5001
> > TCP window size: 45.0 KByte (default)
> > ------------------------------------------------------------
> > [ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
> > [ ID] Interval Transfer Bandwidth
> > [ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec
> >
> > I provide more informatios as soon as the code is available.
>
> that's pretty good compared to the baseline.

This is GRO in combination with a pcrypt-parallelized
crypto algorithm; without the parallelization, GRO/GSO
does not help because crypto is then the bottleneck.

> I'd like to try out our patches, when they are ready.

I've pushed it to

https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload

It is just example code, nothing that I would show usually.
But you asked for it, so here it is :)

The GRO part seems to work well, the GSO part is just a hack at the
moment.

2015-12-07 11:27:56

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/07/15 09:40), Steffen Klassert wrote:
>
> I've pushed it to
>
> https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload
>
> It is just example code, nothing that I would show usually.
> But you asked for it, so here is it :)

that's fine, I don't expect more at this point, I just want to
test-drive it, and see how it compares to my approach.

thanks!

> The GRO part seems to work well, the GSO part is just a hack at the
> moment.

2015-12-08 11:32:48

by Steffen Klassert

Subject: Re: ipsec impact on performance

On Mon, Dec 07, 2015 at 06:27:48AM -0500, Sowmini Varadhan wrote:
> On (12/07/15 09:40), Steffen Klassert wrote:
> >
> > I've pushed it to
> >
> > https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload
> >
> > It is just example code, nothing that I would show usually.
> > But you asked for it, so here is it :)
>
> that's fine, I dont expect more at this point, just want to
> test-drive it, and see how it compares to my approach.

Would be nice if you could share the results. Comments are
welcome too, of course.

2015-12-08 11:51:21

by Sowmini Varadhan

Subject: Re: ipsec impact on performance

On (12/08/15 12:32), Steffen Klassert wrote:
>
> Would be nice if you could share the results. Comments are

Sure, not a problem. Give me some time though, I'm also looking
into the skb_cow_data and other memory-management issues that
were flagged on this thread.

I'll have all this info by netdev, at the latest.

--Sowmini