From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Ingemar Johansson, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh, Eric Dumazet, Jakub Kicinski
Subject: [PATCH 4.9 013/175] tcp: fix cwnd-limited bug for TSO deferral where we send nothing
Date: Mon, 28 Dec 2020 13:47:46 +0100
Message-Id: <20201228124853.898801718@linuxfoundation.org>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20201228124853.216621466@linuxfoundation.org>
References: <20201228124853.216621466@linuxfoundation.org>
User-Agent: quilt/0.66
X-Mailing-List: linux-kernel@vger.kernel.org

From: Neal Cardwell

[ Upstream commit 299bcb55ecd1412f6df606e9dc0912d55610029e ]

When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:

(1) ACK for full-sized skb of N*MSS arrives
  -> tcp_write_xmit() transmit full-sized skb with N*MSS
  -> move pacing release time forward
  -> exit tcp_write_xmit() because pacing time is in the future

(2) TSQ callback or TCP internal pacing timer fires
  -> try to transmit next skb, but TSO deferral finds remainder of
     available cwnd is not big enough to trigger an immediate send
     now, so we defer sending until the next ACK.

(3) repeat...

So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:

o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.

o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.
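
The sequence above can be reproduced in a small userspace model. The sketch below is not kernel code: struct toy_sender, toy_write_xmit() and toy_run() are hypothetical names, the window is counted in packets, and the tracked flag is assigned unconditionally whenever validation runs (the real tcp_cwnd_validate() is more selective). It assumes cwnd = 25 packets and a TSO skb of 10 packets, so each round sends one full skb and then defers a 5-packet remainder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; not the kernel's struct tcp_sock. */
struct toy_sender {
	uint32_t cwnd;		/* congestion window, in packets */
	uint32_t in_flight;	/* packets sent but not yet ACKed */
	uint32_t tso_pkts;	/* N: packets per full-sized TSO skb */
	bool is_cwnd_limited;	/* stands in for tp->is_cwnd_limited */
};

/*
 * One toy transmit pass: send one full-sized skb if the remaining
 * window allows it, otherwise defer and note that cwnd was the limit.
 * "fixed" selects the call-site logic after the patch.
 */
static uint32_t toy_write_xmit(struct toy_sender *s, bool fixed)
{
	uint32_t room = s->cwnd - s->in_flight;
	uint32_t sent_pkts = 0;
	bool is_cwnd_limited = false;

	if (room >= s->tso_pkts) {
		s->in_flight += s->tso_pkts;
		sent_pkts = s->tso_pkts;
	} else {
		/* Remainder too small for an immediate send: defer. */
		is_cwnd_limited = true;
	}

	if (fixed) {
		/* After the patch: also validate when nothing was sent. */
		is_cwnd_limited |= (s->in_flight >= s->cwnd);
		if (sent_pkts || is_cwnd_limited)
			s->is_cwnd_limited = is_cwnd_limited;
	} else if (sent_pkts) {
		/* Before the patch: validate only when packets were sent. */
		is_cwnd_limited |= (s->in_flight >= s->cwnd);
		s->is_cwnd_limited = is_cwnd_limited;
	}
	return sent_pkts;
}

/*
 * Run "rounds" iterations of steps (1) and (2) and count how often the
 * sender is marked cwnd-limited at the end of a round.
 */
uint32_t toy_run(bool fixed, uint32_t rounds)
{
	struct toy_sender s = {
		.cwnd = 25, .in_flight = 20, .tso_pkts = 10,
		.is_cwnd_limited = false,
	};
	uint32_t limited_rounds = 0;

	for (uint32_t i = 0; i < rounds; i++) {
		s.in_flight -= s.tso_pkts;	/* (1) ACK for a full skb */
		toy_write_xmit(&s, fixed);	/*     sends one full skb */
		toy_write_xmit(&s, fixed);	/* (2) timer fires, defers */
		if (s.is_cwnd_limited)
			limited_rounds++;
	}
	return limited_rounds;
}
```

Under these assumptions the pre-patch logic never sets the flag, while the patched logic sets it in every round, which matches the behaviour described in cases (1) and (2).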

Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
---
 net/ipv4/tcp_output.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1532,7 +1532,8 @@ static void tcp_cwnd_validate(struct soc
 	 * window, and remember whether we were cwnd-limited then.
 	 */
 	if (!before(tp->snd_una, tp->max_packets_seq) ||
-	    tp->packets_out > tp->max_packets_out) {
+	    tp->packets_out > tp->max_packets_out ||
+	    is_cwnd_limited) {
 		tp->max_packets_out = tp->packets_out;
 		tp->max_packets_seq = tp->snd_nxt;
 		tp->is_cwnd_limited = is_cwnd_limited;
@@ -2259,6 +2260,10 @@ repair:
 			break;
 	}
 
+	is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
+	if (likely(sent_pkts || is_cwnd_limited))
+		tcp_cwnd_validate(sk, is_cwnd_limited);
+
 	if (likely(sent_pkts)) {
 		if (tcp_in_cwnd_reduction(sk))
 			tp->prr_out += sent_pkts;
@@ -2266,8 +2271,6 @@ repair:
 		/* Send one loss probe per tail loss episode. */
 		if (push_one != 2)
 			tcp_schedule_loss_probe(sk);
-		is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
-		tcp_cwnd_validate(sk, is_cwnd_limited);
 		return false;
 	}
 	return !tp->packets_out && tcp_send_head(sk);
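
The first hunk matters independently of the call-site change: even when tcp_cwnd_validate() is reached, the old condition skips the update unless a new window has started or packets_out has grown. The following userspace sketch models only that condition. struct toy_tp, toy_before(), toy_cwnd_validate() and toy_mid_window_result() are hypothetical names, the field values are made up for illustration, and the rest of the kernel function is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the fields used by the first hunk. */
struct toy_tp {
	uint32_t snd_una, snd_nxt;
	uint32_t packets_out, max_packets_out, max_packets_seq;
	bool is_cwnd_limited;
};

/* Wraparound-safe "a is before b" test for 32-bit sequence numbers. */
static bool toy_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Model of the update condition only. With patched == false this is the
 * pre-patch test; with patched == true a cwnd-limited caller always
 * refreshes the tracked state.
 */
static void toy_cwnd_validate(struct toy_tp *tp, bool is_cwnd_limited,
			      bool patched)
{
	if (!toy_before(tp->snd_una, tp->max_packets_seq) ||
	    tp->packets_out > tp->max_packets_out ||
	    (patched && is_cwnd_limited)) {
		tp->max_packets_out = tp->packets_out;
		tp->max_packets_seq = tp->snd_nxt;
		tp->is_cwnd_limited = is_cwnd_limited;
	}
}

/*
 * Mid-window call with no growth in packets_out: snd_una is still
 * before max_packets_seq and packets_out equals max_packets_out.
 * Returns the tracked flag after a cwnd-limited call.
 */
bool toy_mid_window_result(bool patched)
{
	struct toy_tp tp = {
		.snd_una = 1000, .snd_nxt = 3000,
		.packets_out = 20, .max_packets_out = 20,
		.max_packets_seq = 2000, .is_cwnd_limited = false,
	};

	toy_cwnd_validate(&tp, true, patched);
	return tp.is_cwnd_limited;
}
```

In this model the unpatched condition leaves the flag false for a mid-window, cwnd-limited call, and the patched condition sets it to true.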