From: Peilin Ye <[email protected]>
Hi all,
As pointed out [1] by Jakub Kicinski, currently using TUNNEL_SEQ in
collect_md mode is racy for [IP6]GRE[TAP] devices, since they (typically,
e.g. if created using "ip") use lockless TX.
Patch [3/3] fixes it by making o_seqno atomic_t.
As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
xmit"), making o_seqno atomic_t increases "chance for packets being out
of order at receiver" when using lockless TX.
Another way to fix it would be: users must specify "external" and "oseq"
at the same time if they want the kernel to allow using TUNNEL_SEQ (e.g.
via eBPF) in collect_md mode, but that would break userspace.
I found another issue while reading the code: patches [1,2/3] make o_seqno
start from 0 in native mode, as described in RFC 2890 [2] section 2.2.:
"The first datagram is sent with a sequence number of 0."
Now we could make [IP6]GRE[TAP] (and probably [IP6]ERSPAN ?) devices
completely NETIF_F_LLTX, but that's out of scope of this fix and will be
sent as separate [net-next] patches.
[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://datatracker.ietf.org/doc/html/rfc2890#section-2.2
Thanks,
Peilin Ye (3):
ip_gre: Make o_seqno start from 0 in native mode
ip6_gre: Make o_seqno start from 0 in native mode
ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode
include/net/ip6_tunnel.h | 2 +-
include/net/ip_tunnels.h | 2 +-
net/ipv4/ip_gre.c | 12 +++++-------
net/ipv6/ip6_gre.c | 16 ++++++++--------
4 files changed, 15 insertions(+), 17 deletions(-)
--
2.20.1
From: Peilin Ye <[email protected]>
For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
native mode. According to RFC 2890 2.2., "The first datagram is sent
with a sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
where tunnel->o_seqno is passed to gre_build_header() before getting
incremented.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Peilin Ye <[email protected]>
---
net/ipv6/ip6_gre.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 976236736146..d9e4ac94eab4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -724,6 +724,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
{
struct ip6_tnl *tunnel = netdev_priv(dev);
__be16 protocol;
+ __be16 flags;
if (dev->type == ARPHRD_ETHER)
IPCB(skb)->flags = 0;
@@ -739,7 +740,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
if (tunnel->parms.collect_md) {
struct ip_tunnel_info *tun_info;
const struct ip_tunnel_key *key;
- __be16 flags;
int tun_hlen;
tun_info = skb_tunnel_info_txcheck(skb);
@@ -770,15 +770,14 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
: 0);
} else {
- if (tunnel->parms.o_flags & TUNNEL_SEQ)
- tunnel->o_seqno++;
-
if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
return -ENOMEM;
- gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
+ flags = tunnel->parms.o_flags;
+
+ gre_build_header(skb, tunnel->tun_hlen, flags,
protocol, tunnel->parms.o_key,
- htonl(tunnel->o_seqno));
+ (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
}
return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu,
--
2.20.1
On Thu, Apr 21, 2022 at 3:08 PM Peilin Ye <[email protected]> wrote:
>
> From: Peilin Ye <[email protected]>
>
> For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
> native mode. According to RFC 2890 2.2., "The first datagram is sent
> with a sequence number of 0." Fix it.
>
> It is worth mentioning that o_seqno already starts from 0 in collect_md
> mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
> where tunnel->o_seqno is passed to gre_build_header() before getting
> incremented.
>
> Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
> Signed-off-by: Peilin Ye <[email protected]>
LGTM
Acked-by: William Tu <[email protected]>
From: Peilin Ye <[email protected]>
For GRE and GRETAP devices, currently o_seqno starts from 1 in native
mode. According to RFC 2890 2.2., "The first datagram is sent with a
sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
gre_build_header() before getting incremented.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peilin Ye <[email protected]>
---
net/ipv4/ip_gre.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 99db2e41ed10..ca70b92e80d9 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -459,14 +459,12 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
__be16 proto)
{
struct ip_tunnel *tunnel = netdev_priv(dev);
-
- if (tunnel->parms.o_flags & TUNNEL_SEQ)
- tunnel->o_seqno++;
+ __be16 flags = tunnel->parms.o_flags;
/* Push GRE header. */
gre_build_header(skb, tunnel->tun_hlen,
- tunnel->parms.o_flags, proto, tunnel->parms.o_key,
- htonl(tunnel->o_seqno));
+ flags, proto, tunnel->parms.o_key,
+ (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
}
--
2.20.1
From: Peilin Ye <[email protected]>
As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the
following sequence of events:
1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
add ... external". "ip" ignores "[o]seq" if "external" is specified,
so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
it uses lockless TX);
2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
gre_build_header(skb, tun_hlen,
flags, protocol,
tunnel_id_to_key32(tun_info->key.tun_id),
(flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
: 0); ^^^^^^^^^^^^^^^^^
Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by
making o_seqno atomic_t.
As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
xmit"), making o_seqno atomic_t increases "chance for packets being out
of order at receiver" when NETIF_F_LLTX is on.
Maybe a better fix would be:
1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if
they want the kernel to allow sequencing of outgoing packets;
2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
with "oseq".
Unfortunately, that would break userspace.
We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
do it in separate patches to keep this fix minimal.
Suggested-by: Jakub Kicinski <[email protected]>
Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
Signed-off-by: Peilin Ye <[email protected]>
---
include/net/ip6_tunnel.h | 2 +-
include/net/ip_tunnels.h | 2 +-
net/ipv4/ip_gre.c | 6 +++---
net/ipv6/ip6_gre.c | 7 ++++---
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index a38c4f1e4e5c..74b369bddf49 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -58,7 +58,7 @@ struct ip6_tnl {
/* These fields used only by GRE */
__u32 i_seqno; /* The last seen seqno */
- __u32 o_seqno; /* The last output seqno */
+ atomic_t o_seqno; /* The last output seqno */
int hlen; /* tun_hlen + encap_hlen */
int tun_hlen; /* Precalculated header length */
int encap_hlen; /* Encap header length (FOU,GUE) */
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 0219fe907b26..3ec6146f8734 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -116,7 +116,7 @@ struct ip_tunnel {
/* These four fields used only by GRE */
u32 i_seqno; /* The last seen seqno */
- u32 o_seqno; /* The last output seqno */
+ atomic_t o_seqno; /* The last output seqno */
int tun_hlen; /* Precalculated header length */
/* These four fields used only by ERSPAN */
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index ca70b92e80d9..8cf86e42c1d1 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -464,7 +464,7 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
/* Push GRE header. */
gre_build_header(skb, tunnel->tun_hlen,
flags, proto, tunnel->parms.o_key,
- (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
+ (flags & TUNNEL_SEQ) ? htonl(atomic_fetch_inc(&tunnel->o_seqno)) : 0);
ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
}
@@ -502,7 +502,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
(TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
gre_build_header(skb, tunnel_hlen, flags, proto,
tunnel_id_to_key32(tun_info->key.tun_id),
- (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
+ (flags & TUNNEL_SEQ) ? htonl(atomic_fetch_inc(&tunnel->o_seqno)) : 0);
ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, tunnel_hlen);
@@ -579,7 +579,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
}
gre_build_header(skb, 8, TUNNEL_SEQ,
- proto, 0, htonl(tunnel->o_seqno++));
+ proto, 0, htonl(atomic_fetch_inc(&tunnel->o_seqno)));
ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, tunnel_hlen);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index d9e4ac94eab4..5136959b3dc5 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -766,7 +766,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
gre_build_header(skb, tun_hlen,
flags, protocol,
tunnel_id_to_key32(tun_info->key.tun_id),
- (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
+ (flags & TUNNEL_SEQ) ? htonl(atomic_fetch_inc(&tunnel->o_seqno))
: 0);
} else {
@@ -777,7 +777,8 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
gre_build_header(skb, tunnel->tun_hlen, flags,
protocol, tunnel->parms.o_key,
- (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
+ (flags & TUNNEL_SEQ) ? htonl(atomic_fetch_inc(&tunnel->o_seqno))
+ : 0);
}
return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu,
@@ -1055,7 +1056,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
/* Push GRE header. */
proto = (t->parms.erspan_ver == 1) ? htons(ETH_P_ERSPAN)
: htons(ETH_P_ERSPAN2);
- gre_build_header(skb, 8, TUNNEL_SEQ, proto, 0, htonl(t->o_seqno++));
+ gre_build_header(skb, 8, TUNNEL_SEQ, proto, 0, htonl(atomic_fetch_inc(&t->o_seqno)));
/* TooBig packet may have updated dst->dev's mtu */
if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
--
2.20.1
Hello:
This series was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:
On Thu, 21 Apr 2022 15:06:39 -0700 you wrote:
> From: Peilin Ye <[email protected]>
>
> Hi all,
>
> As pointed out [1] by Jakub Kicinski, currently using TUNNEL_SEQ in
> collect_md mode is racy for [IP6]GRE[TAP] devices, since they (typically,
> e.g. if created using "ip") use lockless TX.
>
> [...]
Here is the summary with links:
- [net,1/3] ip_gre: Make o_seqno start from 0 in native mode
https://git.kernel.org/netdev/net/c/ff827beb706e
- [net,2/3] ip6_gre: Make o_seqno start from 0 in native mode
https://git.kernel.org/netdev/net/c/fde98ae91f79
- [net,3/3] ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode
https://git.kernel.org/netdev/net/c/31c417c948d7
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html