From: Neal Cardwell
Date: Sat, 2 Apr 2022 14:04:08 -0400
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
To: Eric Dumazet
Cc: Jaco Kroon, Florian Westphal, LKML, Netdev, Yuchung Cheng, Wei Wang

On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet wrote:
>
> On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell wrote:
> >
> > FWIW those log entries indicate netfilter on the mail client machine
> > dropping consecutive outbound skbs with 2*MSS of payload.
> > So that explains the large consecutive losses of client data packets
> > to the e-mail server. That seems to confirm my earlier hunch that
> > those drops of consecutive client data packets "do not look like
> > normal congestive packet loss".
>
> This also explains why we have all these tiny 2-MSS packets in the pcap.
>
> Under normal conditions, autocorking should kick in, allowing TCP to
> build bigger TSO packets.

I have not looked at the conntrack code before today, but AFAICT this
is the buggy section of nf_conntrack_proto_tcp.c:

        } else if (((state->state == TCP_CONNTRACK_SYN_SENT
                     && dir == IP_CT_DIR_ORIGINAL)
                   || (state->state == TCP_CONNTRACK_SYN_RECV
                     && dir == IP_CT_DIR_REPLY))
                   && after(end, sender->td_end)) {
                /*
                 * RFC 793: "if a TCP is reinitialized ... then it need
                 * not wait at all; it must only be sure to use sequence
                 * numbers larger than those recently used."
                 */
                sender->td_end =
                sender->td_maxend = end;
                sender->td_maxwin = (win == 0 ? 1 : win);

                tcp_options(skb, dataoff, tcph, sender);

Note that the tcp_options() function implicitly assumes it is being
called on a SYN: it sets state->td_scale to 0 and only sets
state->td_scale to something non-zero if it sees a wscale option. So if
we ever call it on an skb that is not a SYN, we will forget that the
connection is using the wscale option.

But at this point in the code, tcp_options() is called without first
checking that the packet is a SYN.

In a TFO scenario like the one in the trace, where the server sends its
first data packet after the SYNACK packet and before the client's first
ACK, presumably the conntrack state machine is (correctly) in SYN_RECV,
and then (incorrectly) executes this code, including the call to
tcp_options(), on this first data packet, which has no SYN bit and no
wscale option. Thus tcp_options() zeroes out the server's sending state
td_scale and does not set it to a non-zero value. So now conntrack
thinks the server is not using the wscale option.
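
To make the tcp_options() behaviour described above concrete, here is a
minimal user-space sketch. It is not the kernel code: struct peer_state,
parse_options_unguarded() and parse_options_syn_only() are hypothetical
names invented for illustration, and the SYN-only guard is just one way
the missing check could look, not a proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for conntrack's per-direction state. */
struct peer_state {
        uint8_t td_scale;   /* window scale shift learned from a SYN */
        bool    wscale_ok;  /* true if a wscale option was seen */
};

#define OPT_EOL     0
#define OPT_NOP     1
#define OPT_WSCALE  3
#define MAX_WSCALE  14

/*
 * Mimics the behaviour described above: the scale is reset to 0 first
 * and only becomes non-zero again if a wscale option is present.
 */
static void parse_options_unguarded(const uint8_t *opts, size_t len,
                                    struct peer_state *st)
{
        size_t i = 0;

        st->td_scale = 0;
        st->wscale_ok = false;

        while (i < len) {
                uint8_t kind = opts[i];
                uint8_t optlen;

                if (kind == OPT_EOL)
                        break;
                if (kind == OPT_NOP) {
                        i++;
                        continue;
                }
                if (i + 1 >= len)
                        break;
                optlen = opts[i + 1];
                if (optlen < 2 || i + optlen > len)
                        break;
                if (kind == OPT_WSCALE && optlen == 3) {
                        uint8_t shift = opts[i + 2];

                        st->td_scale = shift > MAX_WSCALE ? MAX_WSCALE : shift;
                        st->wscale_ok = true;
                }
                i += optlen;
        }
}

/* Assumed guard: leave the state untouched unless the segment is a SYN. */
static void parse_options_syn_only(bool is_syn, const uint8_t *opts,
                                   size_t len, struct peer_state *st)
{
        if (!is_syn)
                return;
        parse_options_unguarded(opts, len, st);
}
```

Feeding the unguarded parser a SYN with wscale 7 and then a data segment
carrying only a timestamp option leaves td_scale at 0; with the guard,
the data segment leaves td_scale at 7.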
So when conntrack interprets future receive windows from the server, it
does not scale them (with: win <<= sender->td_scale;). In this scenario
the estimated right edge of the server's receive window (td_maxend) is
therefore never advanced past the roughly 64KB value offered in the SYN.
Thus when the client sends data packets beyond 64KBytes, conntrack
declares them invalid and drops them, due to failing the condition Eric
noted above:

        before(seq, sender->td_maxend + 1)

This explains my previous observation that the client's original data
packet transmissions are always dropped after the first 64KBytes.

Does someone more familiar with conntrack have a good idea about how
best to fix this?

neal
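
The window arithmetic above can be sketched in user space as follows.
This is a loose model, not the conntrack implementation: struct
rcv_state, note_ack() and seq_in_window() are hypothetical names, and
sequence numbers are taken relative to an ISN of 0. The raw window of
512 and the scale of 7 are made-up values chosen so that the scaled
window is 64KB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe sequence comparison, same idea as the kernel's before(). */
static bool seq_before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* Hypothetical, simplified view of what the tracker knows about a receiver. */
struct rcv_state {
        uint32_t td_maxend;  /* estimated right edge of the receive window */
        uint8_t  td_scale;   /* window scale shift believed to be in use */
};

/* On an ACK from the receiver, try to advance the right edge. */
static void note_ack(struct rcv_state *st, uint32_t ack, uint16_t raw_win)
{
        uint32_t win = (uint32_t)raw_win << st->td_scale;
        uint32_t edge = ack + win;

        if (seq_before(st->td_maxend, edge))
                st->td_maxend = edge;
}

/* The acceptance test quoted above: before(seq, td_maxend + 1). */
static bool seq_in_window(const struct rcv_state *st, uint32_t seq)
{
        return seq_before(seq, st->td_maxend + 1);
}
```

Starting from td_maxend = 65535 and processing an ACK of 60000 with a
raw window of 512, a tracker that remembers td_scale = 7 moves the edge
to 125536 and accepts a segment at sequence 100000. A tracker that has
forgotten the scale (td_scale = 0) computes an edge of 60512, keeps
td_maxend at 65535, and rejects the same segment.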