On Thu, Apr 21, 2022 at 3:09 PM Peilin Ye <[email protected]> wrote:
>
> From: Peilin Ye <[email protected]>
>
> As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
> collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the
> following sequence of events:
>
> 1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
> add ... external". "ip" ignores "[o]seq" if "external" is specified,
> so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
> it uses lockless TX);
> 2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
> bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
> 3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
>
> gre_build_header(skb, tun_hlen,
> flags, protocol,
> tunnel_id_to_key32(tun_info->key.tun_id),
> (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
> : 0); ^^^^^^^^^^^^^^^^^
>
> Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
> try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by
> making o_seqno atomic_t.
>
> As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
> xmit"), making o_seqno atomic_t increases "chance for packets being out
> of order at receiver" when NETIF_F_LLTX is on.
>
> Maybe a better fix would be:
>
> 1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if
> they want the kernel to allow sequencing of outgoing packets;
> 2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
> with "oseq".
>
> Unfortunately, that would break userspace.
>
> We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
> do it in separate patches to keep this fix minimal.
>
> Suggested-by: Jakub Kicinski <[email protected]>
> Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
> Signed-off-by: Peilin Ye <[email protected]>
> ---
LGTM
Acked-by: William Tu <[email protected]>