2024-04-19 15:36:54

by Richard Gobert

Subject: [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp

This series fixes a bug in the GRO complete phase of UDP, in which socket
lookup fails for encapsulated packets because it relies on
skb->network_header, which at that point refers to the innermost network
header. The fix is to add network_offset and inner_network_offset to
napi_gro_cb and use these offsets for socket lookup.
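
Concretely, the lookup helpers move from ip_hdr(skb)/ipv6_hdr(skb) to the
saved offset (as in patch 2):

	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);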

In addition, p->flush/flush_id should be checked in all UDP flows, so the
same logic from tcp_gro_receive is now applied to all flows in
udp_gro_receive_segment. This prevents packets with mismatching network
headers (flush/flush_id set) from being merged in UDP GRO.
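
The check mirrors tcp_gro_receive (sketch of the logic added in patch 3):

	flush = NAPI_GRO_CB(p)->flush;

	if (NAPI_GRO_CB(p)->flush_id != 1 ||
	    NAPI_GRO_CB(p)->count != 1 ||
	    !NAPI_GRO_CB(p)->is_atomic)
		flush |= NAPI_GRO_CB(p)->flush_id;
	else
		NAPI_GRO_CB(p)->is_atomic = false;

	/* don't merge into p; flush it instead */
	if (ulen > ntohs(uh2->len) || flush)
		pp = p;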

The original series included a change to the vxlan test which adds the local
parameter to prevent similar future bugs. I plan to submit it separately to
net-next.

This series is part of a previously submitted series to net-next:
https://lore.kernel.org/all/[email protected]/

v1 -> v2:
- Use network_offsets instead of p_poff param as suggested by Willem
- Check flush before postpull, and for all UDP GRO flows (previously
reviewed by Willem, and was changed since)
- v1:
https://lore.kernel.org/netdev/[email protected]/

Richard Gobert (3):
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: fix udp bad offset in socket lookup
net: gro: add flush check in udp_gro_receive_segment

drivers/net/geneve.c | 1 +
drivers/net/vxlan/vxlan_core.c | 1 +
include/net/gro.h | 18 ++++++++++++++++--
net/8021q/vlan_core.c | 2 ++
net/core/gro.c | 1 +
net/ethernet/eth.c | 1 +
net/ipv4/af_inet.c | 5 +----
net/ipv4/gre_offload.c | 1 +
net/ipv4/udp.c | 3 ++-
net/ipv4/udp_offload.c | 15 +++++++++++++--
net/ipv6/ip6_offload.c | 8 ++++----
net/ipv6/udp.c | 3 ++-
net/ipv6/udp_offload.c | 3 ++-
13 files changed, 47 insertions(+), 15 deletions(-)

--
2.36.1



2024-04-19 15:37:18

by Richard Gobert

Subject: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly. In the common path there's only one
write (skb_gro_reset_offset, which replaces skb_set_network_header).

Signed-off-by: Richard Gobert <[email protected]>
---
drivers/net/geneve.c | 1 +
drivers/net/vxlan/vxlan_core.c | 1 +
include/net/gro.h | 18 ++++++++++++++++--
net/8021q/vlan_core.c | 2 ++
net/core/gro.c | 1 +
net/ethernet/eth.c | 1 +
net/ipv4/af_inet.c | 5 +----
net/ipv4/gre_offload.c | 1 +
net/ipv6/ip6_offload.c | 8 ++++----
9 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6c2835086b57..6549348cc24e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -542,6 +542,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
 	if (!ptype)
 		goto out;
 
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;

diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index ba319fc21957..c649a82eeca7 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -754,6 +754,7 @@ static struct sk_buff *vxlan_gpe_gro_receive(struct sock *sk,
 
 	vh = vxlan_gro_prepare_receive(sk, head, skb, &grc);
 	if (vh) {
+		NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 		if (!vxlan_parse_gpe_proto(vh, &protocol))
			goto out;
 		ptype = gro_find_receive_by_type(protocol);
diff --git a/include/net/gro.h b/include/net/gro.h
index 50f1e403dbbb..1faff23b66e8 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -87,6 +87,15 @@ struct napi_gro_cb {
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
+
+	/* L3 offsets */
+	union {
+		struct {
+			u16	network_offset;
+			u16	inner_network_offset;
+		};
+		u16	network_offsets[2];
+	};
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
@@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen,
 	return ptr;
 }
 
+static inline int skb_gro_network_offset(const struct sk_buff *skb)
+{
+	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
+}
+
 static inline void *skb_gro_network_header(const struct sk_buff *skb)
 {
 	if (skb_gro_may_pull(skb, skb_gro_offset(skb)))
-		return skb_gro_header_fast(skb, skb_network_offset(skb));
+		return skb_gro_header_fast(skb, skb_gro_network_offset(skb));
 
-	return skb_network_header(skb);
+	return skb->data + skb_gro_network_offset(skb);
 }
 
 static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index f00158234505..9404dd551dfd 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
 	if (unlikely(!vhdr))
 		goto out;
 
+	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen;
+
 	type = vhdr->h_vlan_encapsulated_proto;
 
 	ptype = gro_find_receive_by_type(type);
diff --git a/net/core/gro.c b/net/core/gro.c
index 83f35d99a682..c7901253a1a8 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff)
 	const skb_frag_t *frag0;
 	unsigned int headlen;
 
+	NAPI_GRO_CB(skb)->network_offset = 0;
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	headlen = skb_headlen(skb);
 	NAPI_GRO_CB(skb)->frag0 = skb->data;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2edc8b796a4e..ea589e8cde2a 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -441,6 +441,7 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 	skb_gro_pull(skb, sizeof(*eh));
 	skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 
 	pp = indirect_call_gro_receive_inet(ptype->callbacks.gro_receive,
					    ipv6_gro_receive, inet_gro_receive,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 55bd72997b31..7899cbd5b263 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1568,10 +1568,6 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
-	skb_set_network_header(skb, off);
-	/* The above will be needed by the transport layer if there is one
-	 * immediately following this IP hdr.
-	 */
 
 	/* Note : No need to call skb_gro_postpull_rcsum() here,
 	 * as we already checked checksum over ipv4 header was 0
@@ -1597,6 +1593,7 @@ static struct sk_buff *ipip_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return inet_gro_receive(head, skb);
 }
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 5028c72d494a..a1ff2bdf6206 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -224,6 +224,7 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
 	skb_gro_postpull_rcsum(skb, greh, grehlen);
 
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b41e35af69ea..765797ca729c 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
 		off += len;
 	}
 
-	skb_gro_pull(skb, off - skb_network_offset(skb));
+	skb_gro_pull(skb, off - skb_gro_network_offset(skb));
 	return proto;
 }
 
@@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (unlikely(!iph))
 		goto out;
 
-	skb_set_network_header(skb, off);
-
 	flush += ntohs(iph->payload_len) != skb->len - hlen;
 
 	proto = iph->nexthdr;
@@ -259,7 +257,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	NAPI_GRO_CB(skb)->proto = proto;
 
 	flush--;
-	nlen = skb_network_header_len(skb);
+	nlen = skb_gro_offset(skb) - off;
 
 	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
@@ -327,6 +325,7 @@ static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return ipv6_gro_receive(head, skb);
 }
@@ -342,6 +341,7 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return inet_gro_receive(head, skb);
 }
--
2.36.1


2024-04-19 15:37:32

by Richard Gobert

Subject: [PATCH net v2 2/3] net: gro: fix udp bad offset in socket lookup

Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
additional GRO support") introduced incorrect usage of {ip,ipv6}_hdr in the
GRO complete phase. These functions always return skb->network_header,
which, in the case of encapsulated packets at the GRO complete phase, is
set to the innermost L3 header of the packet. That means that calling
{ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
gro_list and *_gro_complete) while parsing an encapsulated packet's _outer_
L3/L4 may return an unexpected value.
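
In other words, for an encapsulated packet at complete time the layout is
(illustrative):

    +----------+-----------+----------+----+---------+
    | outer L3 | outer UDP | inner L3 | L4 | payload |
    +----------+-----------+----------+----+---------+
    ^                      ^
    network_offsets[0]     network_offsets[1] (== skb->network_header)

so ip_hdr()/ipv6_hdr() resolve to the inner L3 header even when the caller
is parsing the outer L4 header.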

This incorrect usage leads to a bug in GRO's UDP socket lookup: the
udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively, and
these *_hdr helpers return network_header, which points to the innermost
L3 header, so __udp{4,6}_lib_lookup is called with the wrong offset for
encapsulated packets.

To fix this issue, a network_offsets union is added to napi_gro_cb, in
which both the outer and the inner network offsets are saved.

Reproduction example:

Endpoint configuration example (fou + local address bind)

# ip fou add port 6666 ipproto 4
# ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
# ip link set tun1 up
# ip a add 1.1.1.2/24 dev tun1
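
The peer endpoint uses the mirrored configuration (assumed example, with
addresses and ports swapped accordingly):

# ip fou add port 5555 ipproto 4
# ip link add name tun1 type ipip remote 2.2.2.2 local 2.2.2.1 encap fou encap-dport 6666 encap-sport 5555 mode ipip
# ip link set tun1 up
# ip a add 1.1.1.1/24 dev tun1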

Netperf TCP_STREAM results on net-next, before and after the patch is applied:

net-next main, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    5.28           2.37

net-next main, GRO disabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    5.01        2745.06

patch applied, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    5.01        2877.38

Fixes: 57c67ff4bd92 ("udp: additional GRO support")
Signed-off-by: Richard Gobert <[email protected]>
---
net/ipv4/udp.c | 3 ++-
net/ipv4/udp_offload.c | 3 ++-
net/ipv6/udp.c | 3 ++-
net/ipv6/udp_offload.c | 3 ++-
4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c02bf011d4a6..1399fce82b3f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -532,7 +532,8 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
				 __be16 sport, __be16 dport)
 {
-	const struct iphdr *iph = ip_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct net *net = dev_net(skb->dev);
 	int iif, sdif;

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 3498dd1d0694..fd29d21d579c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -718,7 +718,8 @@ EXPORT_SYMBOL(udp_gro_complete);
 
 INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	const struct iphdr *iph = ip_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8b1dd7f51249..f7880e306410 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -272,7 +272,8 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
 struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
				 __be16 sport, __be16 dport)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset);
 	struct net *net = dev_net(skb->dev);
 	int iif, sdif;

diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index bbd347de00b4..b41152dd4246 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -164,7 +164,8 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct ipv6hdr *ipv6h = (struct ipv6hdr *)(skb->data + offset);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
--
2.36.1


2024-04-19 15:37:57

by Richard Gobert

Subject: [PATCH net v2 3/3] net: gro: add flush check in udp_gro_receive_segment

The GRO-GSO path is supposed to be transparent, and as such L3 flush checks
are relevant to all UDP flows merging in GRO (e.g. segments whose outer IP
headers differ in TTL or ToS set flush in inet_gro_receive and must not be
merged). This patch applies the same logic and code as tcp_gro_receive,
terminating the merge if flush is non-zero.

Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Signed-off-by: Richard Gobert <[email protected]>
---
net/ipv4/udp_offload.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index fd29d21d579c..8721fe5beca2 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -471,6 +471,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	struct sk_buff *p;
 	unsigned int ulen;
 	int ret = 0;
+	int flush;
 
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
@@ -504,13 +505,22 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
			return p;
		}
 
+		flush = NAPI_GRO_CB(p)->flush;
+
+		if (NAPI_GRO_CB(p)->flush_id != 1 ||
+		    NAPI_GRO_CB(p)->count != 1 ||
+		    !NAPI_GRO_CB(p)->is_atomic)
+			flush |= NAPI_GRO_CB(p)->flush_id;
+		else
+			NAPI_GRO_CB(p)->is_atomic = false;
+
		/* Terminate the flow on len mismatch or if it grow "too much".
		 * Under small packet flood GRO count could elsewhere grow a lot
		 * leading to excessive truesize values.
		 * On len mismatch merge the first packet shorter than gso_size,
		 * otherwise complete the GRO packet.
		 */
-		if (ulen > ntohs(uh2->len)) {
+		if (ulen > ntohs(uh2->len) || flush) {
			pp = p;
		} else {
			if (NAPI_GRO_CB(skb)->is_flist) {
--
2.36.1


2024-04-19 18:51:57

by Willem de Bruijn

Subject: Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

Richard Gobert wrote:
> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> makes sure both are set correctly. In the common path there's only one
> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>
> Signed-off-by: Richard Gobert <[email protected]>
> ---
> drivers/net/geneve.c | 1 +
> drivers/net/vxlan/vxlan_core.c | 1 +
> include/net/gro.h | 18 ++++++++++++++++--
> net/8021q/vlan_core.c | 2 ++
> net/core/gro.c | 1 +
> net/ethernet/eth.c | 1 +
> net/ipv4/af_inet.c | 5 +----
> net/ipv4/gre_offload.c | 1 +
> net/ipv6/ip6_offload.c | 8 ++++----
> 9 files changed, 28 insertions(+), 10 deletions(-)
>

> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> +{
> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> +}
> +


> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> if (unlikely(!iph))
> goto out;
>
> - skb_set_network_header(skb, off);
> -

Especially for net, this is still a large patch.

Can we avoid touching all those tunnel callbacks and just set the
offsets in inet_gro_receive and ipv6_gro_receive themselves, just
as skb_set_network_header now:

@@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (unlikely(!iph))
 		goto out;
 
-	skb_set_network_header(skb, off);
+	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;


2024-04-22 08:32:41

by Richard Gobert

Subject: Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>> makes sure both are set correctly. In the common path there's only one
>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>>
>> Signed-off-by: Richard Gobert <[email protected]>
>> ---
>> drivers/net/geneve.c | 1 +
>> drivers/net/vxlan/vxlan_core.c | 1 +
>> include/net/gro.h | 18 ++++++++++++++++--
>> net/8021q/vlan_core.c | 2 ++
>> net/core/gro.c | 1 +
>> net/ethernet/eth.c | 1 +
>> net/ipv4/af_inet.c | 5 +----
>> net/ipv4/gre_offload.c | 1 +
>> net/ipv6/ip6_offload.c | 8 ++++----
>> 9 files changed, 28 insertions(+), 10 deletions(-)
>>
>
>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
>> +{
>> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
>> +}
>> +
>
>
>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>> if (unlikely(!iph))
>> goto out;
>>
>> - skb_set_network_header(skb, off);
>> -
>
> Especially for net, this is still a large patch.
>
> Can we avoid touching all those tunnel callbacks and just set the
> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> as skb_set_network_header now:
>
> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> if (unlikely(!iph))
> goto out;
>
> - skb_set_network_header(skb, off);
> + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>

Thanks for the reply!

Setting network_offset on dev_gro_receive and inner_network_offset only
in the tunnel callbacks is the best option IMO. I agree that
we want a small patch to net that solves the problem, although I
think always using ->encap_mark in the common path is not ideal.

We can avoid changing all the tunnel callbacks by always setting
inner_network_offset in {ipv6,inet}_gro_receive and initialize
network_offset to 0 in dev_gro_receive. It will result in a small
change, without using ->encap_mark.
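
Concretely, the idea is (illustrative sketch, not a posted diff):

	/* dev_gro_receive, via skb_gro_reset_offset, as in this v2 */
	NAPI_GRO_CB(skb)->network_offset = 0;

	/* {inet,ipv6}_gro_receive, set unconditionally */
	NAPI_GRO_CB(skb)->inner_network_offset = off;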

What are your thoughts?

2024-04-22 16:56:37

by Willem de Bruijn

Subject: Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

Richard Gobert wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> >> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> >> makes sure both are set correctly. In the common path there's only one
> >> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> >>
> >> Signed-off-by: Richard Gobert <[email protected]>
> >> ---
> >> drivers/net/geneve.c | 1 +
> >> drivers/net/vxlan/vxlan_core.c | 1 +
> >> include/net/gro.h | 18 ++++++++++++++++--
> >> net/8021q/vlan_core.c | 2 ++
> >> net/core/gro.c | 1 +
> >> net/ethernet/eth.c | 1 +
> >> net/ipv4/af_inet.c | 5 +----
> >> net/ipv4/gre_offload.c | 1 +
> >> net/ipv6/ip6_offload.c | 8 ++++----
> >> 9 files changed, 28 insertions(+), 10 deletions(-)
> >>
> >
> >> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> >> +{
> >> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> >> +}
> >> +
> >
> >
> >> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >> if (unlikely(!iph))
> >> goto out;
> >>
> >> - skb_set_network_header(skb, off);
> >> -
> >
> > Especially for net, this is still a large patch.
> >
> > Can we avoid touching all those tunnel callbacks and just set the
> > offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> > as skb_set_network_header now:
> >
> > @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> > if (unlikely(!iph))
> > goto out;
> >
> > - skb_set_network_header(skb, off);
> > + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> >
>
> Thanks for the reply!
>
> Setting network_offset on dev_gro_receive and inner_network_offset only
> in the tunnel callbacks is the best option IMO. I agree that
> we want a small patch to net that solves the problem, although I
> think always using ->encap_mark in the common path is not ideal.
>
> We can avoid changing all the tunnel callbacks by always setting
> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> network_offset to 0 in dev_gro_receive. It will result in a small
> change, without using ->encap_mark.
>
> What are your thoughts?

That works. It's a bit ugly that inner_network_offset will always be
set, even if a packet only traverses inet_gro_receive once. What is
your concern with testing encap_mark?

How do you want to detect in udp[46]_lib_lookup_skb which of the two
offsets to use? That would still be encap_mark based?


2024-04-23 09:33:10

by Richard Gobert

Subject: Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> Willem de Bruijn wrote:
>>> Richard Gobert wrote:
>>>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>>>> makes sure both are set correctly. In the common path there's only one
>>>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>>>>
>>>> Signed-off-by: Richard Gobert <[email protected]>
>>>> ---
>>>> drivers/net/geneve.c | 1 +
>>>> drivers/net/vxlan/vxlan_core.c | 1 +
>>>> include/net/gro.h | 18 ++++++++++++++++--
>>>> net/8021q/vlan_core.c | 2 ++
>>>> net/core/gro.c | 1 +
>>>> net/ethernet/eth.c | 1 +
>>>> net/ipv4/af_inet.c | 5 +----
>>>> net/ipv4/gre_offload.c | 1 +
>>>> net/ipv6/ip6_offload.c | 8 ++++----
>>>> 9 files changed, 28 insertions(+), 10 deletions(-)
>>>>
>>>
>>>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
>>>> +{
>>>> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
>>>> +}
>>>> +
>>>
>>>
>>>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>>>> if (unlikely(!iph))
>>>> goto out;
>>>>
>>>> - skb_set_network_header(skb, off);
>>>> -
>>>
>>> Especially for net, this is still a large patch.
>>>
>>> Can we avoid touching all those tunnel callbacks and just set the
>>> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
>>> as skb_set_network_header now:
>>>
>>> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>>> if (unlikely(!iph))
>>> goto out;
>>>
>>> - skb_set_network_header(skb, off);
>>> + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>>>
>>
>> Thanks for the reply!
>>
>> Setting network_offset on dev_gro_receive and inner_network_offset only
>> in the tunnel callbacks is the best option IMO. I agree that
>> we want a small patch to net that solves the problem, although I
>> think always using ->encap_mark in the common path is not ideal.
>>
>> We can avoid changing all the tunnel callbacks by always setting
>> inner_network_offset in {ipv6,inet}_gro_receive and initialize
>> network_offset to 0 in dev_gro_receive. It will result in a small
>> change, without using ->encap_mark.
>>
>> What are your thoughts?
>
> That works. It's a bit ugly that inner_network_offset will always be
> set, even if a packet only traverses inet_gro_receive once. What is
> your concern with testing encap_mark?
>
> How do you want to detect in udp[46]_lib_lookup_skb which of the two
> offsets to use? That would still be encap_mark based?
>

I'd like to minimize any potential overhead, even a small one, and this way
we do not need to access encap_mark at all in the common path.

NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;

compiles to:

movzx eax, byte ptr [rbx+46h]
shr al, 1
and eax, 1
mov [rbx+rax*2+4Ch], r14w

while

NAPI_GRO_CB(skb)->inner_network_offset = off;

compiles to:

mov [rbx+4Eh], r14w

I do plan to add a patch to net-next after this to remove the access
entirely from inet gro callbacks, in the meantime, it looks to me like a
reasonable patch and small enough to not raise concerns.

For udp_lib_lookup I don't see a way around it so yes, it would still be
dependent on encap_mark. Since this runs in the complete phase it's less
concerning.

Let me know that you're ok with that and I'll post a v3.

2024-04-23 13:55:38

by Willem de Bruijn

Subject: Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb

Richard Gobert wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> >> Willem de Bruijn wrote:
> >>> Richard Gobert wrote:
> >>>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> >>>> makes sure both are set correctly. In the common path there's only one
> >>>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> >>>>
> >>>> Signed-off-by: Richard Gobert <[email protected]>
> >>>> ---
> >>>> drivers/net/geneve.c | 1 +
> >>>> drivers/net/vxlan/vxlan_core.c | 1 +
> >>>> include/net/gro.h | 18 ++++++++++++++++--
> >>>> net/8021q/vlan_core.c | 2 ++
> >>>> net/core/gro.c | 1 +
> >>>> net/ethernet/eth.c | 1 +
> >>>> net/ipv4/af_inet.c | 5 +----
> >>>> net/ipv4/gre_offload.c | 1 +
> >>>> net/ipv6/ip6_offload.c | 8 ++++----
> >>>> 9 files changed, 28 insertions(+), 10 deletions(-)
> >>>>
> >>>
> >>>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> >>>> +{
> >>>> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> >>>> +}
> >>>> +
> >>>
> >>>
> >>>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>>> if (unlikely(!iph))
> >>>> goto out;
> >>>>
> >>>> - skb_set_network_header(skb, off);
> >>>> -
> >>>
> >>> Especially for net, this is still a large patch.
> >>>
> >>> Can we avoid touching all those tunnel callbacks and just set the
> >>> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> >>> as skb_set_network_header now:
> >>>
> >>> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>> if (unlikely(!iph))
> >>> goto out;
> >>>
> >>> - skb_set_network_header(skb, off);
> >>> + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> >>>
> >>
> >> Thanks for the reply!
> >>
> >> Setting network_offset on dev_gro_receive and inner_network_offset only
> >> in the tunnel callbacks is the best option IMO. I agree that
> >> we want a small patch to net that solves the problem, although I
> >> think always using ->encap_mark in the common path is not ideal.
> >>
> >> We can avoid changing all the tunnel callbacks by always setting
> >> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> >> network_offset to 0 in dev_gro_receive. It will result in a small
> >> change, without using ->encap_mark.
> >>
> >> What are your thoughts?
> >
> > That works. It's a bit ugly that inner_network_offset will always be
> > set, even if a packet only traverses inet_gro_receive once. What is
> > your concern with testing encap_mark?
> >
> > How do you want to detect in udp[46]_lib_lookup_skb which of the two
> > offsets to use? That would still be encap_mark based?
> >
>
> I'd like to minimize any potential overhead, even a small one, and this way
> we do not need to access encap_mark at all in the common path.
>
> NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>
> compiles to:
>
> movzx eax, byte ptr [rbx+46h]
> shr al, 1
> and eax, 1
> mov [rbx+rax*2+4Ch], r14w
>
> while
>
> NAPI_GRO_CB(skb)->inner_network_offset = off;
>
> compiles to:
>
> mov [rbx+4Eh], r14w
>
> I do plan to add a patch to net-next after this to remove the access
> entirely from inet gro callbacks, in the meantime, it looks to me like a
> reasonable patch and small enough to not raise concerns.
>
> For udp_lib_lookup I don't see a way around it so yes, it would still be
> dependent on encap_mark. Since this runs in the complete phase it's less
> concerning.
>
> Let me know that you're ok with that and I'll post a v3.

Yes, looks fine.

The main cost is the memory access, and encap_mark will be accessed
soon after in udp4_lib_lookup anyway.

I don't expect two arithmetic instructions to matter. But this code
does now have one more store: the one in dev_gro_receive.

Either way, in the noise. Both approaches look fine to me: very
concise and essentially equivalent. Choose your preferred option.