From: Eric Dumazet
Date: Tue, 29 Mar 2022 20:48:02 -0700
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
To: Jaco Kroon
Cc: Neal Cardwell, LKML, Netdev, Yuchung Cheng
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon wrote:
>
> Hi Neal,
>
> > Thanks for the report! I have CC-ed the netdev list, since it is
> > probably a better forum for this discussion.
> Awesome thank you.
> >
> > Can you please attach (or link to) a tcpdump raw .pcap file (produced
> > with the -w flag)? There are a number of tools that will make this
> > easier to visualize and analyze if we can see the raw .pcap file.
> > You may want to anonymize the trace and/or capture just headers, etc
> > (for example, the -s flag can control how much of each packet tcpdump
> > grabs).
>
> Attached.
>
> The traffic itself should be mostly encrypted but stripped with -s100
> anyway. At this point SACK was still on.
>
> I don't know how, or why, but this relates to TFO. After sending report
> on a hunch (based on comparing the exim logs of a successful delivery
> compared to a non-successful) and the only difference was that the
> non-working was stating:
>
> TFO mode sendto, no data: EINPROGRESS
>
> and then specifically:
>
> TCP_FASTOPEN tcpi_unacked 2
>
> The working connections never had the latter line in the output.
>
> The moment I set sysctl -w net.ipv4.tcp_fastopen=0 (default is 1) I've
> managed to flood out about 1200 emails to google in a matter of no more
> than 15 minutes.
>
> In the kernel sources: git log v5.8..v5.17 net/
>
> And searching for TFO only gives so many possible commits that broke
> this, just looking at changelogs I'm not sure if any of them are
> relevant. I'm guessing the issue possibly relates to congestion
> control, as such this is probably the most relevant:
>
> commit be5d1b61a2ad28c7e57fe8bfa277373e8ecffcdc
> Author: Nguyen Dinh Phi
> Date: Tue Jul 6 07:19:12 2021 +0800
>
>     tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized
>
> Just looking at the diff it removes a icsk->icsk_ca_initialized = 0; -
> the only other place this gets set to 0 is in tcp_disconnect() ... and
> to 1 in tcp_init_congestion_control() - so I think we might have an
> uninitialized variable here ... then again tcp_init_socket mentions
> explicitly that sk_alloc set lots of stuff to 0 - still bugs me that the
> original commit (8919a9b31eb4) felt the need to set an explicit 0 in
> tcp_init_transfer().

I do not think this commit is related to the issue you have.

I guess you could try a revert?

Then, if you think old linux versions were ok, start a bisection?

Thank you.

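For anyone following along: the net.ipv4.tcp_fastopen value quoted above is a bitmask, so "default is 1" means only client-side TFO is on. Below is a minimal sketch for decoding it. The bit meanings are as I understand them from the kernel's ip-sysctl documentation, and the names TFO_FLAGS, decode_tcp_fastopen and read_tcp_fastopen are made up for this illustration, not anything from the kernel or exim.

```python
# Bit meanings as I understand them from Documentation/networking/ip-sysctl;
# treat this table as an assumption and check it against your kernel version.
TFO_FLAGS = {
    0x1: "client: send data in the opening SYN",
    0x2: "server: accept Fast Open on listeners",
    0x4: "client: send data in SYN even without a cookie",
    0x200: "server: accept data-in-SYN without a cookie",
    0x400: "server: enable Fast Open on all listeners by default",
}


def decode_tcp_fastopen(value):
    """Return descriptions of the bits set in a tcp_fastopen sysctl value."""
    return [desc for bit, desc in sorted(TFO_FLAGS.items()) if value & bit]


def read_tcp_fastopen(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current sysctl value (Linux only; hypothetical helper)."""
    with open(path) as f:
        return int(f.read().strip())
```

On a Linux box, decode_tcp_fastopen(read_tcp_fastopen()) would list what is currently enabled; a value of 0 yields an empty list, i.e. TFO fully off as in the workaround described above.
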
(I do not see why a successful TFO would lead to a freeze after ~70 KB
of data has been sent)

>
> >
> > Can you please share the exact kernel version of the client machine?
> Our side (client) is 5.17.1 (side that initiates TCP/IP connection), I
> obviously can't comment for the Google side (server).
> > Also, can you please summarize/clarify whether you think the client,
> > server, or both are misbehaving?
>
> client is re-transmitting frames for which it has already received an
> ACK from the server. In pcap from frames 105 onwards one can start
> seeing retransmits, then first "spurious retransmission" as wireshark
> labels it from frames 122 onwards.
>
> Kind Regards,
> Jaco
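
To make the reported symptom concrete (the client resending data the server has already ACKed), here is a rough sketch of how one might flag such segments in a simplified trace. It is not Wireshark's actual heuristic: it assumes relative sequence numbers with no wraparound, only looks at cumulative ACKs (SACK is ignored), and the event tuple format and the function name find_spurious_retransmits are invented for this example.

```python
def find_spurious_retransmits(events):
    """Flag sends whose bytes were already cumulatively ACKed.

    events is a list, in capture order, of either
      ("send", seq, length)  - a data segment sent by the client
      ("ack", ack_no)        - a cumulative ACK received from the server
    Returns the indices of the "send" events that only carry ACKed bytes.
    """
    highest_ack = None  # highest cumulative ACK seen so far
    spurious = []
    for i, ev in enumerate(events):
        if ev[0] == "ack":
            if highest_ack is None or ev[1] > highest_ack:
                highest_ack = ev[1]
        elif ev[0] == "send":
            _, seq, length = ev
            # The whole segment lies at or below the ACK point: the peer
            # has already acknowledged every byte in it.
            if length > 0 and highest_ack is not None and seq + length <= highest_ack:
                spurious.append(i)
    return spurious


trace = [
    ("send", 1, 100),
    ("send", 101, 100),
    ("ack", 201),        # server has everything up to byte 200
    ("send", 101, 100),  # resent although already ACKed
    ("send", 201, 50),   # new data
]
print(find_spurious_retransmits(trace))
```

Only the fourth event (index 3) is flagged here. A retransmission sent before the covering ACK arrives is not flagged, since that is ordinary loss recovery rather than the sender disregarding an ACK.
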