From: Oleksandr Natalenko
To: Yuchung Cheng
Cc: Roman Gushchin, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev, "linux-kernel@vger.kernel.org"
Subject: Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
Date: Thu, 28 Sep 2017 10:14:10 +0200
Message-ID: <2325466.Xo6SG5M5hd@natalenko.name>
References: <20170921014620.GA20906@castle>

Hi.

I can't say anything about the panic in tcp_sacktag_walk(), since I cannot
trigger it intentionally, but setting net.ipv4.tcp_retrans_collapse to 0
*does not* fix the warning in tcp_fastretrans_alert() for me.

On Wednesday, 27 September 2017 2:18:32 CEST Yuchung Cheng wrote:
> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng wrote:
> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin wrote:
> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin wrote:
> >>> > > Hello.
> >>> > >
> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack
> >>> > > resulting in the warning shown below. Most of the time it is
> >>> > > harmless, but rarely it just causes either freeze or (I believe,
> >>> > > this is related too) panic in tcp_sacktag_walk() (because sk_buff
> >>> > > passed to this function is NULL). Unfortunately, I still do not
> >>> > > have proper stacktrace from panic, but will try to capture it if
> >>> > > possible.
> >>> > >
> >>> > > Also, I have custom settings regarding TCP stack, shown below as
> >>> > > well. ifb is used to shape traffic with tc.
> >>> > >
> >>> > > Please note this regression was already reported as BZ [1] and as a
> >>> > > letter to ML [2], but got neither attention nor resolution. It is
> >>> > > reproducible for (not only) me on my home router since v4.11 till
> >>> > > v4.13.1 incl.
> >>> > >
> >>> > > Please advise on how to deal with it. I'll provide any additional
> >>> > > info if necessary, also ready to test patches if any.
> >>> > >
> >>> > > Thanks.
> >>> > >
> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> >>> > > [2]
> >>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s=-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e=
> >>> > >
> >>> > We're experiencing the same problems on some machines in our fleet.
> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> >>> > sometimes panics in tcp_sacktag_walk().
> >>
> >>> > Here is an example of a backtrace with the panic log:
> >> Hi Yuchung!
> >>
> >>> do you still see the panics if you disable RACK?
> >>> sysctl net.ipv4.tcp_recovery=0?
> >>
> >> No, we haven't seen any crash since that.
> >
> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
> > take an empty skb :-( Do you have stack trace or any hint on which call
> > to tcp-sacktag_walk triggered the panic? internally at Google we never
> > see that.
>
> hmm something just struck me: could you try
> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
> and see if kernel still panics on sack processing?
>
> >>> also have you experience any sack reneg? could you post the output of
> >>> ' nstat |grep -i TCP' thanks
> >>
> >> hostname TcpActiveOpens 2289680 0.0
> >> hostname TcpPassiveOpens 3592758 0.0
> >> hostname TcpAttemptFails 746910 0.0
> >> hostname TcpEstabResets 154988 0.0
> >> hostname TcpInSegs 16258678255 0.0
> >> hostname TcpOutSegs 46967011611 0.0
> >> hostname TcpRetransSegs 13724310 0.0
> >> hostname TcpInErrs 2 0.0
> >> hostname TcpOutRsts 9418798 0.0
> >> hostname TcpExtEmbryonicRsts 2303 0.0
> >> hostname TcpExtPruneCalled 90192 0.0
> >> hostname TcpExtOfoPruned 57274 0.0
> >> hostname TcpExtOutOfWindowIcmps 3 0.0
> >> hostname TcpExtTW 1164705 0.0
> >> hostname TcpExtTWRecycled 2 0.0
> >> hostname TcpExtPAWSEstab 159 0.0
> >> hostname TcpExtDelayedACKs 209207209 0.0
> >> hostname TcpExtDelayedACKLocked 508571 0.0
> >> hostname TcpExtDelayedACKLost 1713248 0.0
> >> hostname TcpExtListenOverflows 625 0.0
> >> hostname TcpExtListenDrops 625 0.0
> >> hostname TcpExtTCPHPHits 9341188489 0.0
> >> hostname TcpExtTCPPureAcks 1434646465 0.0
> >> hostname TcpExtTCPHPAcks 5733614672 0.0
> >> hostname TcpExtTCPSackRecovery 3261698 0.0
> >> hostname TcpExtTCPSACKReneging 12203 0.0
> >> hostname TcpExtTCPSACKReorder 433189 0.0
> >> hostname TcpExtTCPTSReorder 22694 0.0
> >> hostname TcpExtTCPFullUndo 45092 0.0
> >> hostname TcpExtTCPPartialUndo 22016 0.0
> >> hostname TcpExtTCPLossUndo 2150040 0.0
> >> hostname TcpExtTCPLostRetransmit 60119 0.0
> >> hostname TcpExtTCPSackFailures 2626782 0.0
> >> hostname TcpExtTCPLossFailures 182999 0.0
> >> hostname TcpExtTCPFastRetrans 4334275 0.0
> >> hostname TcpExtTCPSlowStartRetrans 3453348 0.0
> >> hostname TcpExtTCPTimeouts 1070997 0.0
> >> hostname TcpExtTCPLossProbes 2633545 0.0
> >> hostname TcpExtTCPLossProbeRecovery 941647 0.0
> >> hostname TcpExtTCPSackRecoveryFail 336302 0.0
> >> hostname TcpExtTCPRcvCollapsed 461354 0.0
> >> hostname TcpExtTCPAbortOnData 349196 0.0
> >> hostname TcpExtTCPAbortOnClose 3395 0.0
> >> hostname TcpExtTCPAbortOnTimeout 51201 0.0
> >> hostname TcpExtTCPMemoryPressures 2 0.0
> >> hostname TcpExtTCPSpuriousRTOs 2120503 0.0
> >> hostname TcpExtTCPSackShifted 2613736 0.0
> >> hostname TcpExtTCPSackMerged 21358743 0.0
> >> hostname TcpExtTCPSackShiftFallback 8769387 0.0
> >> hostname TcpExtTCPBacklogDrop 5 0.0
> >> hostname TcpExtTCPRetransFail 843 0.0
> >> hostname TcpExtTCPRcvCoalesce 949068035 0.0
> >> hostname TcpExtTCPOFOQueue 470118 0.0
> >> hostname TcpExtTCPOFODrop 9915 0.0
> >> hostname TcpExtTCPOFOMerge 9 0.0
> >> hostname TcpExtTCPChallengeACK 90 0.0
> >> hostname TcpExtTCPSYNChallenge 3 0.0
> >> hostname TcpExtTCPFastOpenActive 2089 0.0
> >> hostname TcpExtTCPSpuriousRtxHostQueues 896596 0.0
> >> hostname TcpExtTCPAutoCorking 547386735 0.0
> >> hostname TcpExtTCPFromZeroWindowAdv 28757 0.0
> >> hostname TcpExtTCPToZeroWindowAdv 28761 0.0
> >> hostname TcpExtTCPWantZeroWindowAdv 322431 0.0
> >> hostname TcpExtTCPSynRetrans 3026 0.0
> >> hostname TcpExtTCPOrigDataSent 40976870977 0.0
> >> hostname TcpExtTCPHystartTrainDetect 453920 0.0
> >> hostname TcpExtTCPHystartTrainCwnd 11586273 0.0
> >> hostname TcpExtTCPHystartDelayDetect 10943 0.0
> >> hostname TcpExtTCPHystartDelayCwnd 763554 0.0
> >> hostname TcpExtTCPACKSkippedPAWS 30 0.0
> >> hostname TcpExtTCPACKSkippedSeq 218 0.0
> >> hostname TcpExtTCPWinProbe 2408 0.0
> >> hostname TcpExtTCPKeepAlive 213768 0.0
> >> hostname TcpExtTCPMTUPFail 69 0.0
> >> hostname TcpExtTCPMTUPSuccess 8811 0.0
> >>
> >> Thanks!
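
In case it helps anyone compare counters between affected and unaffected
hosts, below is a minimal Python sketch for pulling the SACK-related numbers
out of pasted `nstat |grep -i TCP` output. It is only an illustration:
parse_nstat() and summarize() are hypothetical helper names, not part of
nstat or any kernel tooling, and the sample lines are copied from the output
quoted in this thread. The parser assumes the layout seen there (optional
quote markers and a host prefix, then counter name, value and rate).

```python
import re

# Counter name, cumulative value and rate at the end of a line. Anything in
# front of the counter name (mail quote markers, a host prefix) is ignored.
LINE_RE = re.compile(r"(?:^|\s)(Tcp\w+)\s+(\d+)\s+([\d.]+)\s*$")


def parse_nstat(text):
    """Return {counter_name: value} for every TCP counter line in text."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def summarize(counters):
    """Pick out a few counters related to loss recovery and SACK reneging."""
    out_segs = counters.get("TcpOutSegs", 0)
    retrans = counters.get("TcpRetransSegs", 0)
    return {
        # Share of sent segments that were retransmissions.
        "retrans_ratio": retrans / out_segs if out_segs else 0.0,
        "sack_reneging": counters.get("TcpExtTCPSACKReneging", 0),
        "sack_recovery": counters.get("TcpExtTCPSackRecovery", 0),
    }


# Sample lines copied from the nstat output quoted in this thread.
SAMPLE = """\
> >> hostname TcpOutSegs 46967011611 0.0
> >> hostname TcpRetransSegs 13724310 0.0
> >> hostname TcpExtTCPSackRecovery 3261698 0.0
> >> hostname TcpExtTCPSACKReneging 12203 0.0
"""

if __name__ == "__main__":
    for key, value in summarize(parse_nstat(SAMPLE)).items():
        print(key, value)
```

A non-zero sack_reneging value only shows that reneging was observed on that
host; on its own it does not say whether it is related to the warning.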