dev_gro_receive() uses indirect calls for IP GRO functions, but
it works only for the outermost headers and untagged frames.
Simple VLAN tag before an IP header restores the performance hit.
This simple series straightens the GRO calls for IP headers going
after VLAN tag or inner Ethernet header (GENEVE, NvGRE, VxLAN)
for retpolined kernels.
Alexander Lobakin (4):
gro: make net/gro.h self-contained
gro: add combined call_gro_receive() + INDIRECT_CALL_INET() helper
vlan/8021q: avoid retpoline overhead on GRO
ethernet: avoid retpoline overhead on TEB (GENEVE, NvGRE, VxLAN) GRO
include/net/gro.h | 13 +++++++++++++
net/8021q/vlan_core.c | 10 ++++++++--
net/ethernet/eth.c | 11 ++++++++---
3 files changed, 29 insertions(+), 5 deletions(-)
--
2.31.0
call_gro_receive() is used to limit GRO recursion, but it works only
with callback pointers.
There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
Add a similar new helper to cover both of these. It can and will be
used to avoid retpoline overhead when IP header lies behind another
offloaded proto.
Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/gro.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/net/gro.h b/include/net/gro.h
index 27c38b36df16..01edaf3fdda0 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -14,4 +14,12 @@ INDIRECT_CALLABLE_DECLARE(int ipv6_gro_complete(struct sk_buff *, int));
INDIRECT_CALLABLE_DECLARE(struct sk_buff *inet_gro_receive(struct list_head *,
struct sk_buff *));
INDIRECT_CALLABLE_DECLARE(int inet_gro_complete(struct sk_buff *, int));
+
+#define indirect_call_gro_receive_inet(cb, f2, f1, head, skb) \
+({ \
+ unlikely(gro_recursion_inc_test(skb)) ? \
+ NAPI_GRO_CB(skb)->flush |= 1, NULL : \
+ INDIRECT_CALL_INET(cb, f2, f1, head, skb); \
+})
+
#endif /* _NET_IPV6_GRO_H */
--
2.31.0
If some source file includes <net/gro.h>, but doesn't include
<linux/indirect_call_wrapper.h>:
In file included from net/8021q/vlan_core.c:7:
./include/net/gro.h:6:1: warning: data definition has no type or storage class
6 | INDIRECT_CALLABLE_DECLARE(struct sk_buff *ipv6_gro_receive(struct list_head *,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
./include/net/gro.h:6:1: error: type defaults to ‘int’ in declaration of ‘INDIRECT_CALLABLE_DECLARE’ [-Werror=implicit-int]
[...]
Include <linux/indirect_call_wrapper.h> directly. It's small and
won't pull lots of dependencies.
Also add some incomplete struct declarations to be fully stacked.
Fixes: 04f00ab2275f ("net/core: move gro function declarations to separate header ")
Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/gro.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/net/gro.h b/include/net/gro.h
index 8a6eb5303cc4..27c38b36df16 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -3,6 +3,11 @@
#ifndef _NET_IPV6_GRO_H
#define _NET_IPV6_GRO_H
+#include <linux/indirect_call_wrapper.h>
+
+struct list_head;
+struct sk_buff;
+
INDIRECT_CALLABLE_DECLARE(struct sk_buff *ipv6_gro_receive(struct list_head *,
struct sk_buff *));
INDIRECT_CALLABLE_DECLARE(int ipv6_gro_complete(struct sk_buff *, int));
--
2.31.0
The two most popular headers going after VLAN are IPv4 and IPv6.
Retpoline overhead for them is addressed only in dev_gro_receive(),
when they lie right after the outermost Ethernet header.
Use the indirect call wrappers in VLAN GRO receive code to reduce
the penalty on receiving tagged frames (when hardware stripping is
off or not available).
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/8021q/vlan_core.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 78ec2e1b14d1..59bc13b5f14f 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -4,6 +4,7 @@
#include <linux/if_vlan.h>
#include <linux/netpoll.h>
#include <linux/export.h>
+#include <net/gro.h>
#include "vlan.h"
bool vlan_do_receive(struct sk_buff **skbp)
@@ -495,7 +496,10 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
skb_gro_pull(skb, sizeof(*vhdr));
skb_gro_postpull_rcsum(skb, vhdr, sizeof(*vhdr));
- pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
+
+ pp = indirect_call_gro_receive_inet(ptype->callbacks.gro_receive,
+ ipv6_gro_receive, inet_gro_receive,
+ head, skb);
out_unlock:
rcu_read_unlock();
@@ -515,7 +519,9 @@ static int vlan_gro_complete(struct sk_buff *skb, int nhoff)
rcu_read_lock();
ptype = gro_find_complete_by_type(type);
if (ptype)
- err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
+ err = INDIRECT_CALL_INET(ptype->callbacks.gro_complete,
+ ipv6_gro_complete, inet_gro_complete,
+ skb, nhoff + sizeof(*vhdr));
rcu_read_unlock();
return err;
--
2.31.0
The two most popular headers going after Ethernet are IPv4 and IPv6.
Retpoline overhead for them is addressed only in dev_gro_receive(),
when they lie right after the outermost Ethernet header.
Use the indirect call wrappers in TEB (Transparent Ethernet Bridging,
such as GENEVE, NvGRE, VxLAN etc.) GRO receive code to reduce the
penalty when processing the inner headers.
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/ethernet/eth.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index e01cf766d2c5..933b427122be 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -58,6 +58,7 @@
#include <net/ip.h>
#include <net/dsa.h>
#include <net/flow_dissector.h>
+#include <net/gro.h>
#include <linux/uaccess.h>
#include <net/pkt_sched.h>
@@ -449,7 +450,10 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
skb_gro_pull(skb, sizeof(*eh));
skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
- pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
+
+ pp = indirect_call_gro_receive_inet(ptype->callbacks.gro_receive,
+ ipv6_gro_receive, inet_gro_receive,
+ head, skb);
out_unlock:
rcu_read_unlock();
@@ -473,8 +477,9 @@ int eth_gro_complete(struct sk_buff *skb, int nhoff)
rcu_read_lock();
ptype = gro_find_complete_by_type(type);
if (ptype != NULL)
- err = ptype->callbacks.gro_complete(skb, nhoff +
- sizeof(struct ethhdr));
+ err = INDIRECT_CALL_INET(ptype->callbacks.gro_complete,
+ ipv6_gro_complete, inet_gro_complete,
+ skb, nhoff + sizeof(*eh));
rcu_read_unlock();
return err;
--
2.31.0
Hello,
On Thu, 2021-03-18 at 18:42 +0000, Alexander Lobakin wrote:
> call_gro_receive() is used to limit GRO recursion, but it works only
> with callback pointers.
> There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
> in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
AFAICS, ip6_offload is builtin even when IPv6 is a module, so the above
should not be needed.
Cheers,
Paolo
From: Paolo Abeni <[email protected]>
Date: Fri, 19 Mar 2021 11:53:42 +0100
> Hello,
Hi!
> On Thu, 2021-03-18 at 18:42 +0000, Alexander Lobakin wrote:
> > call_gro_receive() is used to limit GRO recursion, but it works only
> > with callback pointers.
> > There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
> > in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
>
> AFAICS, ip6_offload is builtin even when IPv6 is a module, so the above
> should not be needed.
Aww, you are right. I overlooked that since dev_gro_receive() still
use INDIRECT_CALL_INET(), though all GRO callbacks were made
built-in.
Seems like more code can be optimized, thanks!
> Cheers,
>
> Paolo
Al
From: Alexander Lobakin <[email protected]>
Date: Fri, 19 Mar 2021 11:13:25 +0000
> From: Paolo Abeni <[email protected]>
> Date: Fri, 19 Mar 2021 11:53:42 +0100
>
> > Hello,
>
> Hi!
>
> > On Thu, 2021-03-18 at 18:42 +0000, Alexander Lobakin wrote:
> > > call_gro_receive() is used to limit GRO recursion, but it works only
> > > with callback pointers.
> > > There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
> > > in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
> >
> > AFAICS, ip6_offload is builtin even when IPv6 is a module, so the above
> > should not be needed.
>
> Aww, you are right. I overlooked that since dev_gro_receive() still
> use INDIRECT_CALL_INET(), though all GRO callbacks were made
> built-in.
I'm not sure if you did it on purpose in commit aaa5d90b395a7
("net: use indirect call wrappers at GRO network layer").
Was that intentional for the sake of more optimized path for the
kernels with moduled IPv6, or I can replace INDIRECT_CALL_INET()
with INDIRECT_CALL_2() here too? I want to keep GRO callbacks that
make use of indirect call wrappers unified.
> Seems like more code can be optimized, thanks!
>
> > Cheers,
> >
> > Paolo
>
> Al
Thanks,
Al
On Fri, 2021-03-19 at 11:43 +0000, Alexander Lobakin wrote:
> I'm not sure if you did it on purpose in commit aaa5d90b395a7
> ("net: use indirect call wrappers at GRO network layer").
> Was that intentional
I must admit that 2y+ later my own intentions are not so clear to me
too;)
> for the sake of more optimized path for the
> kernels with moduled IPv6,
Uhm... no I guess that was more an underlook on my side.
> or I can replace INDIRECT_CALL_INET()
> with INDIRECT_CALL_2() here too?
If that build with IPV6=nmy, I would say yes.
> I want to keep GRO callbacks that
> make use of indirect call wrappers unified.
L4 will still need some special handling as ipv6 udp gro callbacks are
not builtin with CONFIG_IPV6=m :(
Cheers,
Paolo
From: Paolo Abeni <[email protected]>
Date: Fri, 19 Mar 2021 13:35:41 +0100
> On Fri, 2021-03-19 at 11:43 +0000, Alexander Lobakin wrote:
> > I'm not sure if you did it on purpose in commit aaa5d90b395a7
> > ("net: use indirect call wrappers at GRO network layer").
> > Was that intentional
>
> I must admit that 2y+ later my own intentions are not so clear to me
> too;)
Heh, know that feel (=
> > for the sake of more optimized path for the
> > kernels with moduled IPv6,
>
> Uhm... no I guess that was more an underlook on my side.
>
> > or I can replace INDIRECT_CALL_INET()
> > with INDIRECT_CALL_2() here too?
>
> If that build with IPV6=nmy, I would say yes.
I think you used INDIRECT_CALL_INET() to protect from CONFIG_INET=n.
But this also hurts with retpoline when CONFIG_IPV6=m. Not so common
case, but still.
Plain INDIRECT_CALL_2() won't build without CONFIG_INET, so we either
introduce a new one (e.g. _INET_2() similarly to _INET_1()), or leave
it as it is for now (Dave's already picked this series to net-next).
> > I want to keep GRO callbacks that
> > make use of indirect call wrappers unified.
>
> L4 will still need some special handling as ipv6 udp gro callbacks are
> not builtin with CONFIG_IPV6=m :(
Yep, I remember. I meant {inet,ipv6}_gro_{complete,receive}()
callers, but didn't mention that for some reason.
> Cheers,
>
> Paolo
Thanks,
Al