Date: Wed, 30 Mar 2022 17:00:47 +0200
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
From: Jaco Kroon
To: Neal Cardwell
Cc: Eric Dumazet , LKML , Netdev , Yuchung Cheng
Organization: Ultimate Linux Solutions (Pty) Ltd
References: <10c1e561-8f01-784f-c4f4-a7c551de0644@uls.co.za>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2022/03/30 15:56, Neal Cardwell wrote:
> On Wed, Mar 30, 2022 at 2:22 AM Jaco Kroon wrote:
>> Hi Eric,
>>
>> On 2022/03/30 05:48, Eric Dumazet wrote:
>>> On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon wrote:
>>>
>>> I do not think this commit is related to the issue you have.
>>>
>>> I guess you could try a revert ?
>>>
>>> Then, if you think old linux versions were ok, start a bisection ?
>> That'll be interesting, will see if I can reproduce on a non-production
>> host.
>>> Thank you.
>>>
>>> (I do not see why a successful TFO would lead to a freeze after ~70 KB
>>> of data has been sent)
>> I do actually agree with this in that it makes no sense, but disabling
>> TFO definitely resolved the issue for us.
>>
>> Kind Regards,
>> Jaco
> Thanks for the pcap trace! That's a pretty strange trace. I agree with
> Eric's theory that this looks like one or more bugs in a firewall,
> middlebox, or netfilter rule. From the trace it looks like the buggy
> component is sometimes dropping packets and sometimes corrupting them
> so that the client's TCP stack ignores them.

The capture was taken on the client. So the only firewall there is
iptables, and I redirected all -j DROP statements to an L_DROP chain
which did a -j LOG prior to -j DROP. It didn't pick up any drops here.
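The log-before-drop setup described above can be sketched as follows. This is only an illustration, not the actual ruleset: the chain name L_DROP comes from the mail, but the log prefix, the helper name `add_drop_logging`, and the sample rules are assumptions made up for the example. The rule strings follow the `iptables-save` style (`-N`, `-A`, `-j LOG --log-prefix`).

```python
import re

def add_drop_logging(rules, chain="L_DROP", prefix="L_DROP: "):
    """Return rule lines in which every '-j DROP' goes through a logging chain."""
    # Define the chain: log the packet first, then drop it.
    out = [
        f"-N {chain}",
        f'-A {chain} -j LOG --log-prefix "{prefix}"',
        f"-A {chain} -j DROP",
    ]
    # Point each existing DROP target at the logging chain instead.
    for rule in rules:
        out.append(re.sub(r"-j DROP\b", f"-j {chain}", rule))
    return out

# Hypothetical rules, for illustration only.
rules = [
    "-A INPUT -p tcp --dport 23 -j DROP",
    "-A INPUT -p tcp --dport 25 -j ACCEPT",
]
new_rules = add_drop_logging(rules)
for line in new_rules:
    print(line)
```

With rules rewritten this way, any packet the local ruleset drops should leave a kernel log entry, so an empty log supports the claim that iptables on the client is not discarding these packets.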
>
> Interestingly, in that trace the client SYN has a TFO option and
> cookie, but no data in the SYN.

So this allows the SMTP server, which speaks first in the conversation
to identify itself, to respond with data in the SYN-ACK (not sure that
was actually happening, but if I recall I did see it send data prior to
receiving the final ACK on the handshake).

>
> The last packet that looks sane/normal is the ACK from the SMTP server
> that looks like:
>
> 00:00:00.000010 IP6 2a00:1450:4013:c16::1a.25 >
> 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . 6260:6260(0) ack 66263 win
> 774
>
> That's the first ACK that crosses past 2^16. Maybe that is a
> coincidence, or maybe not. Perhaps the buggy firewall/middlebox/etc is

I believe it should be because we literally had this on every single
connection going out to Google's SMTP ... probably 1/100 connections
managed to deliver an email over the connection. Then again ... 64KB
isn't that much ...

When you state sane/normal, do you mean there is a fault in the other
frames that could not be explained by packet loss in one or both of the
directions?

> confused by the TFO option, corrupts its state, and thereafter behaves
> incorrectly past the first 64 KBytes of data from the client.

The only firewalls we've got are netfilter-based, and these packets have
all passed through the dedicated firewalls by the time they reach here.
No middleboxes on our end, and if this was on Google's side there would
be a lot of noise about it, not just from me. I think the trigger is
packet loss between us (as indicated, we know they have link congestion
issues in the JHB area; it took us the better part of two weeks to get
the first-line tech on their side to just query the internal teams, and
probably another week to get the response acknowledging this.
mybroadband.co.za has an article about other local ISPs also
complaining).
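On the 2^16 observation quoted above, here is a quick check of the arithmetic. It is a minimal sketch that assumes the ack value in the tcpdump line is relative to the initial sequence number, as the quoted analysis implies; the helper name `relative_ack` is made up for the example.

```python
import re

# The tcpdump line quoted above, rejoined onto one line.
LINE = ("00:00:00.000010 IP6 2a00:1450:4013:c16::1a.25 > "
        "2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . 6260:6260(0) ack 66263 win 774")

def relative_ack(line):
    """Extract the number following 'ack' from a tcpdump summary line."""
    m = re.search(r"\back (\d+)\b", line)
    return int(m.group(1)) if m else None

ack = relative_ack(LINE)
# 2**16 is 65536, so this ACK is 727 bytes past the 64 KiB mark.
print(ack, ack > 2**16, ack - 2**16)  # prints: 66263 True 727
```

Running the same check over every ACK in the capture would show whether this really is the first one above 65536, which is what the coincidence question hinges on.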
>
> In addition to checking for checksum failures, mentioned by Eric, you
> could look for PAWS failures, something like:
>
> nstat -az | egrep -i 'TcpInCsumError|PAWS'

TcpInCsumErrors                 0                  0.0
TcpExtPAWSActive                0                  0.0
TcpExtPAWSEstab                 90092              0.0
TcpExtTCPACKSkippedPAWS         81317              0.0

Not sure what these mean, but I should probably investigate; the latter
two are definitely incrementing.

Appreciate the feedback and you looking at the traces.

Kind Regards,
Jaco
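Since the PAWS counters above are reported as still incrementing, one way to quantify that is to compare two `nstat -az` snapshots. PAWS (Protection Against Wrapped Sequences, RFC 7323) rejects segments whose TCP timestamp is older than expected. The following is a rough sketch: the helper names `parse_nstat` and `delta` are made up for the example, and the parser assumes the `name value rate` column layout shown in the output above.

```python
# The nstat output pasted above, used as the first snapshot.
SAMPLE = """\
TcpInCsumErrors                 0                  0.0
TcpExtPAWSActive                0                  0.0
TcpExtPAWSEstab                 90092              0.0
TcpExtTCPACKSkippedPAWS         81317              0.0
"""

def parse_nstat(text):
    """Map counter name to integer value for each 'name value rate' line."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the '#kernel' header that nstat prints.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        counters[parts[0]] = int(parts[1])
    return counters

def delta(before, after):
    """Return only the counters that changed between two snapshots."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}

counters = parse_nstat(SAMPLE)
nonzero = {k: v for k, v in counters.items() if v}
print(nonzero)
```

Feeding `delta` two snapshots taken a few seconds apart, ideally while a stalled connection is being reproduced, would show whether the PAWS rejections line up with the stalls.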