2023-07-20 16:34:19

by Richard Gobert

[permalink] [raw]
Subject: [PATCH v2 0/1] net: gro: fix misuse of CB in udp socket lookup

GRO stack uses `udp_lib_lookup_skb` which relies on IP/IPv6 CB's info, and
at the GRO stage, CB holds `napi_gro_cb` info. Specifically,
`udp_lib_lookup_skb` tries to fetch `iff` and `flags` information from the
CB, to find the relevant udp tunnel socket (GENEVE/VXLAN/..). Up until a
patch I submitted recently [0], it worked merely by luck, due
to the layouts of `napi_gro_cb` and IP6CB.

AFAIU it worked because:
`IP6CB(skb)->flags` is at offset 16 inside IP6CB:
- Before the patch: `flags` was mapped to `flush`.
- After the patch: `flags` was mapped to `data_offset`.

`IP6CB(skb)->iff` is at offset 0 inside IP6CB:
- Before the patch: `iif` was mapped to `frag0`.
- After the patch: `iif` was mapped to a union of `frag0` and `last`.

After my patch, on the receive phase, while `data_offset` is 40 (since IPv6
header is 40 bytes), `inet_iif` calls `ipv6_l3mdev_skb`, which checks
whether `IP6CB(skb)->flags`'s `IP6SKB_L3SLAVE` bit is on or off (in our
case its off). If it is off, `inet_iif` returns `IP6CB(skb)->iif`, which is
mapped to `napi_gro_cb->frag0`, making `inet_iif` return 0 most of the
times. `inet_sdif` returns zero due to a similar reason caused by
`data_offset` being equal to 40 (and less than 64).

On the other hand, the complete phase behaves differently.
`data_offset` is usually greater than 64 and less than 128 so the
`IP6SKB_L3SLAVE` flag is on. Thus, `inet_sdif` returns `IP6CB(skb)->iif`,
which is mapped to `last` which contains a pointer. This causes
`udp_sk_bound_dev_eq` to fail, which leads to `udp6_lib_lookup2` failing
and not returning a socket. This leads the receive phase of GRO
to find the right socket, and on the complete phase, it fails to find it
and makes the throughput go down to nearly zero.

Before [0] `flags` was mapped to `flush`. `flush`'s possible
values were 1 and 0, making `inet6_iff` always returning `skb->skb_iif` and
`inet6_sdif` returning 0, and leading to `udp_sk_bound_dev_eq` returning
true.

A fix is to not rely on CB, and get `iff` and `sdif` using skb->dev. l3mdev
case requires special attention since it has a master and a slave device.

[0] https://lore.kernel.org/netdev/20230601160924.GA9194@debian/

Changelog:

v1 -> v2:
* make functions inline
* fix logical bug
* add a comment when we can use the new functions
* checkpatch fixes

Richard Gobert (1):
net: gro: fix misuse of CB in udp socket lookup

include/net/udp.h | 2 ++
net/ipv4/udp.c | 28 ++++++++++++++++++++++++++--
net/ipv4/udp_offload.c | 7 +++++--
net/ipv6/udp.c | 29 +++++++++++++++++++++++++++--
net/ipv6/udp_offload.c | 7 +++++--
5 files changed, 65 insertions(+), 8 deletions(-)

--
2.36.1



2023-07-20 16:35:04

by Richard Gobert

[permalink] [raw]
Subject: [PATCH v2 1/1] net: gro: fix misuse of CB in udp socket lookup

This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.

Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <[email protected]>
Signed-off-by: Richard Gobert <[email protected]>
---
include/net/udp.h | 2 ++
net/ipv4/udp.c | 28 ++++++++++++++++++++++++++--
net/ipv4/udp_offload.c | 7 +++++--
net/ipv6/udp.c | 29 +++++++++++++++++++++++++++--
net/ipv6/udp_offload.c | 7 +++++--
5 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 4d13424f8f72..48af1479882f 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -299,6 +299,7 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
int udp_lib_setsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, unsigned int optlen,
int (*push_pending_frames)(struct sock *));
+void udp4_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif);
struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
__be32 daddr, __be16 dport, int dif);
struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
@@ -310,6 +311,7 @@ struct sock *udp6_lib_lookup(struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, __be16 dport,
int dif);
+void udp6_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif);
struct sock *__udp6_lib_lookup(struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, __be16 dport,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8c3ebd95f5b9..85eb9977db2c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -550,15 +550,39 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
inet_sdif(skb), udptable, skb);
}

+/* This function is the alternative to 'inet_iif' and 'inet_sdif'
+ * functions in case we can not rely on fields of IPCB.
+ *
+ * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized.
+ * The caller must hold the RCU read lock.
+ */
+inline void udp4_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif)
+{
+ *iif = inet_iif(skb) ?: skb->dev->ifindex;
+ *sdif = 0;
+
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+ if (netif_is_l3_slave(skb->dev)) {
+ struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev);
+
+ *sdif = *iif;
+ *iif = master ? master->ifindex : 0;
+ }
+#endif
+}
+
struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
__be16 sport, __be16 dport)
{
const struct iphdr *iph = ip_hdr(skb);
struct net *net = dev_net(skb->dev);
+ int iif, sdif;
+
+ udp4_get_iif_sdif(skb, &iif, &sdif);

return __udp4_lib_lookup(net, iph->saddr, sport,
- iph->daddr, dport, inet_iif(skb),
- inet_sdif(skb), net->ipv4.udp_table, NULL);
+ iph->daddr, dport, iif,
+ sdif, net->ipv4.udp_table, NULL);
}

/* Must be called under rcu_read_lock().
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 75aa4de5b731..70d760b271db 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -603,10 +603,13 @@ static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport,
{
const struct iphdr *iph = skb_gro_network_header(skb);
struct net *net = dev_net(skb->dev);
+ int iif, sdif;
+
+ udp4_get_iif_sdif(skb, &iif, &sdif);

return __udp4_lib_lookup(net, iph->saddr, sport,
- iph->daddr, dport, inet_iif(skb),
- inet_sdif(skb), net->ipv4.udp_table, NULL);
+ iph->daddr, dport, iif,
+ sdif, net->ipv4.udp_table, NULL);
}

INDIRECT_CALLABLE_SCOPE
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b7c972aa09a7..6dbcadc3bf8f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -295,15 +295,40 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
inet6_sdif(skb), udptable, skb);
}

+/* This function is the alternative to 'inet6_iif' and 'inet6_sdif'
+ * functions in case we can not rely on fields of IP6CB.
+ *
+ * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized.
+ * The caller must hold the RCU read lock.
+ */
+inline void udp6_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif)
+{
+ /* using skb->dev->ifindex because skb_dst(skb) is not initialized */
+ *iif = skb->dev->ifindex;
+ *sdif = 0;
+
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+ if (netif_is_l3_slave(skb->dev)) {
+ struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev);
+
+ *sdif = *iif;
+ *iif = master ? master->ifindex : 0;
+ }
+#endif
+}
+
struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
__be16 sport, __be16 dport)
{
const struct ipv6hdr *iph = ipv6_hdr(skb);
struct net *net = dev_net(skb->dev);
+ int iif, sdif;
+
+ udp6_get_iif_sdif(skb, &iif, &sdif);

return __udp6_lib_lookup(net, &iph->saddr, sport,
- &iph->daddr, dport, inet6_iif(skb),
- inet6_sdif(skb), net->ipv4.udp_table, NULL);
+ &iph->daddr, dport, iif,
+ sdif, net->ipv4.udp_table, NULL);
}

/* Must be called under rcu_read_lock().
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index ad3b8726873e..88191d79002e 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -119,10 +119,13 @@ static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport,
{
const struct ipv6hdr *iph = skb_gro_network_header(skb);
struct net *net = dev_net(skb->dev);
+ int iif, sdif;
+
+ udp6_get_iif_sdif(skb, &iif, &sdif);

return __udp6_lib_lookup(net, &iph->saddr, sport,
- &iph->daddr, dport, inet6_iif(skb),
- inet6_sdif(skb), net->ipv4.udp_table, NULL);
+ &iph->daddr, dport, iif,
+ sdif, net->ipv4.udp_table, NULL);
}

INDIRECT_CALLABLE_SCOPE
--
2.36.1


2023-07-21 10:15:43

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 1/1] net: gro: fix misuse of CB in udp socket lookup

From: Richard Gobert
> Sent: 20 July 2023 17:26
>
> This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
> `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
> device from CB. The fix changes it to fetch the device from `skb->dev`.
> l3mdev case requires special attention since it has a master and a slave
> device.
>
...
> +/* This function is the alternative to 'inet_iif' and 'inet_sdif'
> + * functions in case we can not rely on fields of IPCB.
> + *
> + * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized.
> + * The caller must hold the RCU read lock.
> + */
> +inline void udp4_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif)
> +{
> + *iif = inet_iif(skb) ?: skb->dev->ifindex;
> + *sdif = 0;
> +
> +#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
> + if (netif_is_l3_slave(skb->dev)) {
> + struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev);
> +
> + *sdif = *iif;
> + *iif = master ? master->ifindex : 0;
> + }
> +#endif
> +}

You need to make that a 'static inline' in the .h file.
Otherwise the code generated will be horrid.

It would be much better to use the return value - say for 'iif'
then have:
{
iif = inet_iif(skb) ?: skb->dev->ifindex;

if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
if (netif_is_l3_slave(skb->dev)) {
struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev);

*sdif = iif;
return master ? master->ifindex : 0;
}
#endif
*sdif = 0;
return iif;
}

The compiler might generate that is inlined, not inlined
it is definitely better.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2023-07-23 12:25:25

by Gal Pressman

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] net: gro: fix misuse of CB in udp socket lookup

On 20/07/2023 19:26, Richard Gobert wrote:
> This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
> `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
> device from CB. The fix changes it to fetch the device from `skb->dev`.
> l3mdev case requires special attention since it has a master and a slave
> device.
>
> Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
> Reported-by: Gal Pressman <[email protected]>
> Signed-off-by: Richard Gobert <[email protected]>

Thanks Richard!
Tested-by: Nimrod Oren <[email protected]>

2023-07-25 07:31:24

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] net: gro: fix misuse of CB in udp socket lookup

Hi Richard,

On Thu, 2023-07-20 at 18:26 +0200, Richard Gobert wrote:
> This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
> `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
> device from CB. The fix changes it to fetch the device from `skb->dev`.
> l3mdev case requires special attention since it has a master and a slave
> device.
>
> Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
> Reported-by: Gal Pressman <[email protected]>
> Signed-off-by: Richard Gobert <[email protected]>
> ---
> include/net/udp.h | 2 ++
> net/ipv4/udp.c | 28 ++++++++++++++++++++++++++--
> net/ipv4/udp_offload.c | 7 +++++--
> net/ipv6/udp.c | 29 +++++++++++++++++++++++++++--
> net/ipv6/udp_offload.c | 7 +++++--
> 5 files changed, 65 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 4d13424f8f72..48af1479882f 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -299,6 +299,7 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
> int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> sockptr_t optval, unsigned int optlen,
> int (*push_pending_frames)(struct sock *));
> +void udp4_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif);
> struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
> __be32 daddr, __be16 dport, int dif);
> struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
> @@ -310,6 +311,7 @@ struct sock *udp6_lib_lookup(struct net *net,
> const struct in6_addr *saddr, __be16 sport,
> const struct in6_addr *daddr, __be16 dport,
> int dif);
> +void udp6_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif);
> struct sock *__udp6_lib_lookup(struct net *net,
> const struct in6_addr *saddr, __be16 sport,
> const struct in6_addr *daddr, __be16 dport,
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 8c3ebd95f5b9..85eb9977db2c 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -550,15 +550,39 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
> inet_sdif(skb), udptable, skb);
> }
>
> +/* This function is the alternative to 'inet_iif' and 'inet_sdif'
> + * functions in case we can not rely on fields of IPCB.
> + *
> + * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized.
> + * The caller must hold the RCU read lock.
> + */
> +inline void udp4_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif)

I think you misread David Ahern's suggestion on v1. The idea would be
to move this function (and udp6_get_iif_sdif) in an header file, as
'static inline'[1]. Additionally there is nothing specific about UDP
here so I would rename them inet{,6}_gro_iif_sdif and place them in
include/net/gro.h.

Otherwise LGTM.

Thanks,

Paolo

[1] the usage of the "inline" keyword is basically allowed only in
header files