Received: by 2002:a05:6a10:7420:0:0:0:0 with SMTP id hk32csp586631pxb; Tue, 15 Feb 2022 22:59:00 -0800 (PST) X-Google-Smtp-Source: ABdhPJz7Zt4lHxeP9z/pPtp8qncVPEqHp/8ez90va+dsDciBKaBEfEUhbuiXaly1/TaFHL7XfDiX X-Received: by 2002:a63:5055:0:b0:372:f0e7:c034 with SMTP id q21-20020a635055000000b00372f0e7c034mr1197451pgl.4.1644994740273; Tue, 15 Feb 2022 22:59:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644994740; cv=none; d=google.com; s=arc-20160816; b=RB4Y9sywb/z3KpBFQYt4WdyYUFe6kD1BmVq6p2VM/a6nqcxnFzQfPS+xky4fbGu8xM /qnE37VoRWODMBhWoc+9ASagwFFeFUwiqhxUfv+49qRvUDongUFwwqdBzCScpzgBCW6j MbIOYMerm9wIfethWwKIhBYyMGIBOs45ZXGBJqfe97Yd8LkRA7qYRlID1trpUoTNdg7G 74hzfi3wbOudEk63Q5iYvfprkmBR80t2IC9SBoRL2iigu/ZM2GyHdV1WL30E0y4f7Sx4 BXOHtvBHi1i9EtLvZ4aUT1CYZ1P92/WXNk909DUv97oKHxhcXveHnYzEXhZkOAD9SWmB F+Tg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=O4eypergZV3m6Zzdl4fNq7xKdAiZHF46BD1z85Nc8jg=; b=zt3x5ZNKlDUkl6h90ihmCXXQfAEvtU7opNmtrAi+ze6oTAbmLPk/qarvMxfP7EKd4v oXsJgXPihBN2N3Of9PHOgPMZzWnmhpIKNNlMqSphVnW/emZmygQsqhJYbvmY3Cczxn8k 7P9KnsWBj72OMewVY8gBn+hz4QcxkSr37Ly18+uaRYQ7BT8m4ai1CgnhD+7Tslr2FZnn toAdLMDDWvqMIkhKZBhiBQt4SAnyV65F132CbGfG8sG9c0P0Sl4nFizLAm7CmDDSngMk v8slZRxnXtFAv9uDvkcZx5Ns2/ab8lBYY2OQcR4FnlgVRhn/zubLGiJGt0qm+ycB9qOK GM5Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QXVXjkwW; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id j10si18387582pjl.107.2022.02.15.22.58.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Feb 2022 22:59:00 -0800 (PST) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QXVXjkwW; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 21DE32409C1; Tue, 15 Feb 2022 22:41:00 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240481AbiBOPml (ORCPT + 99 others); Tue, 15 Feb 2022 10:42:41 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:43492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241533AbiBOPlb (ORCPT ); Tue, 15 Feb 2022 10:41:31 -0500 Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B015DC1146 for ; Tue, 15 Feb 2022 07:35:23 -0800 (PST) Received: by mail-yb1-xb33.google.com with SMTP id v186so57267857ybg.1 for ; Tue, 15 Feb 2022 07:35:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=O4eypergZV3m6Zzdl4fNq7xKdAiZHF46BD1z85Nc8jg=; b=QXVXjkwWhHxjoLXOmL4F9kN9NOD62pL1mWBMaTYzAESbupVXKNh5VqF9KgeTTv80NX 8qM68EjdMMonVbDhSSRJ/xxdMD/ZnUyHwl/MAXaBsdLu5vBPwCP3zq2uW6MXcDeKBWMg LYAVlC8HVQcez0QQi1WnMtf35FtsRNXzyzL1S2Ty5WGzqw7aPbVl9hB3LSoWvi0p+LPb m3hfDp+37V/UnGccbWP2lFtEQk3mAItXOd8GiHBVZWv2LGwEa8oyhACoUx/U+U1wpY8Q s6evXE37CnbGzrC3sZPINlGMGyVjjpql9t1fBcWMI8aqIguTkyXmX3UXiePkMlSN6qMm 6k1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=O4eypergZV3m6Zzdl4fNq7xKdAiZHF46BD1z85Nc8jg=; b=sBkN4KjfLSRppCISU2TUO0zD/44gTTkA7n3ys7Eu18E6CDX2IjWGHVIRVq0Co3dbrY K5R7ErsdP/sXT6pQqiKdyI1O0h+JZlFy6uWiegyYy3bvBsuhXpi0+3F94yU3DqRVch7k XdKOdDRG4Oig9gKx3frG0DOQNWTI4NRmK3Fl69+Tq68hvWI4BGvLrmKW9qeipv3nG9FO yRuZ3gdm6n8nR40AMI/efURr/k3iwxJZqZdGKX6MwPWdxF8ZEutaOA13vHhqIRIdcNSC MHzGhIX5JUaGJuz1xE9alpF2p1j3CCyMNTQgVY2BqDvf2o0A7bZO0LvhKaR1cQDobGbA XZXw== X-Gm-Message-State: AOAM530Tp9YXaTpXH1cW5mimwx1lNZNAdeQumDQbtmGJNITfdcWunaS/ 0IZEaH1EGS6kCDQUhZo8r1gRvU3TAzjznbnk3S438Q== X-Received: by 2002:a25:8885:: with SMTP id d5mr4091214ybl.383.1644939322315; Tue, 15 Feb 2022 07:35:22 -0800 (PST) MIME-Version: 1.0 References: <20220213150234.31602-1-thomas.liu@ucloud.cn> <42554FCB-9180-4B32-B5CF-6D3236237D99@ucloud.cn> In-Reply-To: From: Eric Dumazet Date: Tue, 15 Feb 2022 07:35:10 -0800 Message-ID: Subject: Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover To: Willem de Bruijn Cc: Tao Liu , David Miller , Hideaki YOSHIFUJI , David Ahern , Jakub Kicinski , "Samudrala, Sridhar" , Network Development , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 15, 2022 at 7:01 AM Willem de Bruijn wrote: > > On Mon, Feb 14, 2022 at 8:38 PM Tao Liu wrote: > > > > Sorry to resend it. > > > > 2022=E5=B9=B42=E6=9C=8814=E6=97=A5 12:27=EF=BC=8CWillem de Bruijn =E5=86=99=E9=81=93=EF=BC=9A > > > > On Sun, Feb 13, 2022 at 11:03 PM Tao Liu wrote: > > > > > > Sorry for bothering, just repost it. > > > > 2022=E5=B9=B42=E6=9C=8814=E6=97=A5 09:28=EF=BC=8CWillem de Bruijn =E5=86=99=E9=81=93=EF=BC=9A > > > > On Sun, Feb 13, 2022 at 10:10 AM Tao Liu wrote: > > > > > > We encouter a tcp drop issue in our cloud environment. Packet GROed in = host > > forwards to a VM virtio_net nic with net_failover enabled. VM acts as a > > IPVS LB with ipip encapsulation. The full path like: > > host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat > > -> ipip encap -> net_failover tx -> virtio_net tx > > > > When net_failover transmits a ipip pkt (gso_type =3D 0x0103), there is = no gso > > performed because it supports TSO and GSO_IPXIP4. But network_header ha= s > > been pointing to inner ip header. > > > > > > If the packet is configured correctly, and net_failover advertises > > that it can handle TSO packets with IPIP encap, then still virtio_net > > should not advertise it and software GSO be applied on its > > dev_queue_xmit call. > > > > This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly > > set, but also tunneling fields like skb->encapsulated and > > skb_inner_network_header. > > > > Thanks very much for your comment! > > > > Yes, the packet is correct. Another thing i have not pointed directly i= s > > that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBU= ST > > but virtio_net do. > > > > > > If net_failover does not advertise NETIF_F_GSO_ROBUST, then > > tcp_gso_segment will pass a packet with SKB_GSO_DODGY to the > > software gso stack, not taking the branch > > > > if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { > > > > As i tested, packet with SKB_GSO_DODGY hits this branch. packet's gso_t= ype=3D0x0103, which > > means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4. net_failover mat= ches > > the condition. > > > > Consequently, tcp_gso_segment returns NULL, there is no software gso di= d here. And > > network_header points to inner iph. > > > > Software gso is did by virtio_net which not advertises NETIF_F_GSO_IPXI= P4. It skips the outer > > iph, and keeps it unchanged. > > > > --- > > net/ipv4/af_inet.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > > index 9c465ba..f8b3f8a 100644 > > --- a/net/ipv4/af_inet.c > > +++ b/net/ipv4/af_inet.c > > @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff= *skb, > > static struct sk_buff *ipip_gso_segment(struct sk_buff *skb, > > netdev_features_t features) > > { > > + struct sk_buff *segs; > > + int nhoff; > > + > > if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4)) > > return ERR_PTR(-EINVAL); > > > > - return inet_gso_segment(skb, features); > > + nhoff =3D skb_network_header(skb) - skb_mac_header(skb); > > + segs =3D inet_gso_segment(skb, features); > > + if (!segs) > > + skb->network_header =3D skb_mac_header(skb) + nhoff - s= kb->head; > > + > > + return segs; > > } > > > > > > If this would be needed for IPIP, then the same would be needed for SIT= , etc. > > > > Is the skb_network_header > > > > 1. correctly pointing to the outer header of the TSO packet before the > > call to inet_gso_segment > > 2. incorrectly pointing to the inner header of the (still) TSO packet > > after the call to inet_gso_segment > > > > inet_gso_segment already does the same operation: save nhoff, pull > > network header, call callbacks.gso_segment (which can be > > ipip_gso_segment->inet_gso_segment), then place the network header > > back at nhoff. > > > > values print in skb_mac_gso_segment() before callbacks.gso_segment: > > ipip: vlan_depth=3D0 mac_len=3D0 skb->network_header=3D20= 6 > > net_failover: vlan_depth=3D14 mac_len=3D14 skb->network_header=3D186 > > virtio_net: vlan_depth=3D34 mac_len=3D34 skb->network_header=3D206 > > > > agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as: > > > > > > If IPIP GSO was so broken, I think we would have found it long before. > > > > As said, inet_gso_segment should already do the right thing for ipip: > > it will be called twice. > > > > > > SKB_GSO_DODGY flag and net_failover conduct this issue. local traffic j= ust works fine. > > Got it. That is an uncommon combination. SKB_GSO_DODGY is set from > external virtio_net, which does not support tunnels. But a path with > an added tunnel might cause this combination. > > And inet_gso_segment resets the network header, both times, before > calling callbacks.gso_segment() > > skb_reset_network_header(skb); > nhoff =3D skb_network_header(skb) - skb_mac_header(skb); > > [...] > > if (likely(ops && ops->callbacks.gso_segment)) > segs =3D ops->callbacks.gso_segment(skb, features); > > And resets that after for each skb in segs. > > skb =3D segs; > do { > [...] > skb->network_header =3D (u8 *)iph - skb->head; > > But does not do this if segs =3D=3D NULL. > > The packet has to be restored before it is passed to the device. I > think we have to handle this case correctly in inet_gso_segment, > instead of patching it up in all the various tunnel devices. > > The same holds for ipv6_gso_segment. Back in the days, GRO was modified so that we passed a context (nhoff) in called functions, instead of changing skb offsets. The concept of outer/inner header only works with 1 encap. Perhaps it is time to do the same in GSO, to allow arbitrary levels of encapsulation. Then we no longer mess with these limited 'network_header/inner_network_header' fields in the skb. Stuffing state in the skb has been a mistake I think.