From: Peilin Ye <[email protected]>
Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
[ 82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[ 82.607380] Call Trace:
[ 82.609389] <TASK>
[ 82.611136] skb_push.cold.109+0x10/0x10
[ 82.614289] __gre6_xmit+0x41e/0x590
[ 82.617169] ip6gre_tunnel_xmit+0x344/0x3f0
[ 82.620526] dev_hard_start_xmit+0xf1/0x330
[ 82.623882] sch_direct_xmit+0xe4/0x250
[ 82.626961] __dev_queue_xmit+0x720/0xfe0
<...>
[ 82.633431] packet_sendmsg+0x96a/0x1cb0
[ 82.636568] sock_sendmsg+0x30/0x40
<...>
Reproducer:
OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
ip netns add at_ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns at_ns0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip link set dev veth1 up mtu 1500
ip addr add dev veth1 172.16.1.200/24
ip netns exec at_ns0 ip addr add ::11/96 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip addr add dev veth1 ::22/96
ip link set dev veth1 up
ip netns exec at_ns0 \
ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
local ::11 remote ::22
ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
ip netns exec at_ns0 ip link set dev ip6gretap00 up
ip link add dev ip6gretap11 type ip6gretap external
ip addr add dev ip6gretap11 10.1.1.200/24
ip addr add dev ip6gretap11 fc80::200/24
ip link set dev ip6gretap11 up
tc qdisc add dev ip6gretap11 clsact
tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
ping6 -c 3 -w 10 -q ::11
The following sequence of events caused the BUG:
1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
"tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
room, triggering the BUG.
As sugguested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.
Reported-by: Feng Zhou <[email protected]>
Co-developed-by: Cong Wang <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>
---
Hi all,
I couldn't find a proper Fixes: tag for this fix; please comment if you
have any sugguestions. Thanks!
Peilin Ye
net/ipv6/ip6_gre.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index b43a46449130..976236736146 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
else
fl6->daddr = tunnel->parms.raddr;
- if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
- return -ENOMEM;
-
/* Push GRE header. */
protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
@@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
(TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
tun_hlen = gre_calc_hlen(flags);
+ if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen))
+ return -ENOMEM;
+
gre_build_header(skb, tun_hlen,
flags, protocol,
tunnel_id_to_key32(tun_info->key.tun_id),
@@ -773,6 +773,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
if (tunnel->parms.o_flags & TUNNEL_SEQ)
tunnel->o_seqno++;
+ if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
+ return -ENOMEM;
+
gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
protocol, tunnel->parms.o_key,
htonl(tunnel->o_seqno));
--
2.20.1
On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote:
> The following sequence of events caused the BUG:
>
> 1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
> calculated based on old flags (see ip6gre_calc_hlen());
> 2. packet_snd() reserves header room for skb A, assuming
> tunnel->tun_hlen is 4;
> 3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
> skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
> 4. __gre6_xmit() detects the new tunnel key, and recalculates
> "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
> TUNNEL_SEQ);
> 5. gre_build_header() calls skb_push() with insufficient reserved header
> room, triggering the BUG.
>
> As sugguested by Cong, fix it by moving the call to skb_cow_head() after
> the recalculation of tun_hlen.
>
> Reported-by: Feng Zhou <[email protected]>
> Co-developed-by: Cong Wang <[email protected]>
> Signed-off-by: Cong Wang <[email protected]>
> Signed-off-by: Peilin Ye <[email protected]>
> ---
> Hi all,
>
> I couldn't find a proper Fixes: tag for this fix; please comment if you
> have any sugguestions. Thanks!
What's wrong with
Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
?
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index b43a46449130..976236736146 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> else
> fl6->daddr = tunnel->parms.raddr;
>
> - if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
> - return -ENOMEM;
> -
> /* Push GRE header. */
> protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
>
> @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
We should also reject using SEQ with collect_md, but that's a separate
issue.
> tun_hlen = gre_calc_hlen(flags);
>
> + if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen))
> + return -ENOMEM;
> +
> gre_build_header(skb, tun_hlen,
> flags, protocol,
> tunnel_id_to_key32(tun_info->key.tun_id),
On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote:
> > We should also reject using SEQ with collect_md, but that's a separate
> > issue.
>
> Could you explain this a bit more? It seems that commit 77a5196a804e
> ("gre: add sequence number for collect md mode.") added this
> intentionally.
Interesting. Maybe a better way of dealing with the problem would be
rejecting SEQ if it's not set on the device itself.
When the device is set up without the SEQ bit enabled it disables Tx
locking (look for LLTX). This means that multiple CPUs can try to do
the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
Hi Jakub,
On Thu, Apr 14, 2022 at 01:14:24PM +0200, Jakub Kicinski wrote:
> On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote:
> > I couldn't find a proper Fixes: tag for this fix; please comment if you
> > have any sugguestions. Thanks!
>
> What's wrong with
>
> Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
>
> ?
Thanks! I will add this in v2 soon.
> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index b43a46449130..976236736146 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> > else
> > fl6->daddr = tunnel->parms.raddr;
> >
> > - if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
> > - return -ENOMEM;
> > -
> > /* Push GRE header. */
> > protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
> >
> > @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> > (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
>
> We should also reject using SEQ with collect_md, but that's a separate
> issue.
Could you explain this a bit more? It seems that commit 77a5196a804e
("gre: add sequence number for collect md mode.") added this
intentionally.
Thanks,
Peilin Ye
On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote:
> On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> > > Could you explain this a bit more? It seems that commit 77a5196a804e
> > > ("gre: add sequence number for collect md mode.") added this
> > > intentionally.
> >
> > Interesting. Maybe a better way of dealing with the problem would be
> > rejecting SEQ if it's not set on the device itself.
>
> According to ip-link(8), the 'external' option is mutually exclusive
> with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP
> device should always have the TUNNEL_SEQ flag off in its
> 'tunnel->parms.o_flags'.
>
> (However, I just tried:
>
> $ ip link add dev ip6gretap11 type ip6gretap oseq external
> ^^^^ ^^^^^^^^
> ...and my 'ip' executed it with no error. I will take a closer look at
> iproute2 later; maybe it's undefined behavior...)
>
> How about:
>
> 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
> it's okay to set TUNNEL_SEQ in e.g. eBPF";
>
> 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
> TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
> return -EINVAL.
Maybe pr_warn_once(), no need for a stacktrace.
> > When the device is set up without the SEQ bit enabled it disables Tx
> > locking (look for LLTX). This means that multiple CPUs can try to do
> > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
>
> Thanks for the explanation! At first glance, I was wondering why don't
> we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
> lockless xmit"). I quote:
>
> """
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
> """
>
> I don't fully understand this out-of-order yet, but it seems that making
> 'o_seqno' atomic is not an option?
atomic_t would also work (if it has enough bits). Whatever is simplest
TBH. It's just about correctness, I don't think seq is widely used.
Thanks!
On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote:
> > > We should also reject using SEQ with collect_md, but that's a separate
> > > issue.
> >
> > Could you explain this a bit more? It seems that commit 77a5196a804e
> > ("gre: add sequence number for collect md mode.") added this
> > intentionally.
>
> Interesting. Maybe a better way of dealing with the problem would be
> rejecting SEQ if it's not set on the device itself.
According to ip-link(8), the 'external' option is mutually exclusive
with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP
device should always have the TUNNEL_SEQ flag off in its
'tunnel->parms.o_flags'.
(However, I just tried:
$ ip link add dev ip6gretap11 type ip6gretap oseq external
^^^^ ^^^^^^^^
...and my 'ip' executed it with no error. I will take a closer look at
iproute2 later; maybe it's undefined behavior...)
How about:
1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
it's okay to set TUNNEL_SEQ in e.g. eBPF";
2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
return -EINVAL.
?
> When the device is set up without the SEQ bit enabled it disables Tx
> locking (look for LLTX). This means that multiple CPUs can try to do
> the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
Thanks for the explanation! At first glance, I was wondering why don't
we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
lockless xmit"). I quote:
"""
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
"""
I don't fully understand this out-of-order yet, but it seems that making
'o_seqno' atomic is not an option?
Thanks,
Peilin Ye
On Sat, Apr 16, 2022 at 09:33:20AM +0200, Jakub Kicinski wrote:
> On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote:
> > On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> > > > Could you explain this a bit more? It seems that commit 77a5196a804e
> > > > ("gre: add sequence number for collect md mode.") added this
> > > > intentionally.
> > >
> > > Interesting. Maybe a better way of dealing with the problem would be
> > > rejecting SEQ if it's not set on the device itself.
> >
> > According to ip-link(8), the 'external' option is mutually exclusive
> > with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP
> > device should always have the TUNNEL_SEQ flag off in its
> > 'tunnel->parms.o_flags'.
> >
> > (However, I just tried:
> >
> > $ ip link add dev ip6gretap11 type ip6gretap oseq external
> > ^^^^ ^^^^^^^^
> > ...and my 'ip' executed it with no error. I will take a closer look at
> > iproute2 later; maybe it's undefined behavior...)
> >
> > How about:
> >
> > 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
> > it's okay to set TUNNEL_SEQ in e.g. eBPF";
> >
> > 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
> > TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
> > return -EINVAL.
>
> Maybe pr_warn_once(), no need for a stacktrace.
Ah, thanks, coffee needed...
> > > When the device is set up without the SEQ bit enabled it disables Tx
> > > locking (look for LLTX). This means that multiple CPUs can try to do
> > > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
> >
> > Thanks for the explanation! At first glance, I was wondering why don't
> > we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
> > lockless xmit"). I quote:
> >
> > """
> > Even using an atomic_t o_seq, we would increase chance for packets being
> > out of order at receiver.
> > """
> >
> > I don't fully understand this out-of-order yet, but it seems that making
> > 'o_seqno' atomic is not an option?
>
> atomic_t would also work (if it has enough bits). Whatever is simplest
> TBH. It's just about correctness, I don't think seq is widely used.
I see, I will work on this, thanks!
Peilin Ye