From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Koen De Schepper, Olivier Tilmans, Bob Briscoe, Lawrence Brakmo, Florian Westphal, Daniel Borkmann, Yuchung Cheng, Neal Cardwell, Eric Dumazet, Andrew Shewmaker, Glenn Judd, Daniel Borkmann, "David S.
Miller", Sasha Levin
Subject: [PATCH 5.0 023/117] tcp: Ensure DCTCP reacts to losses
Date: Mon, 15 Apr 2019 20:59:53 +0200
Message-Id: <20190415183746.076639685@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190415183744.887851196@linuxfoundation.org>
References: <20190415183744.887851196@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]

RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses are
encountered. Optionally, the dctcp_clamp_alpha_on_loss parameter resets
alpha to its maximal value if an RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes an unnecessarily
large cwnd reduction in the future if the loss was isolated, as it
resets the historical term of DCTCP's alpha EWMA to its maximal value
(i.e., denoting total congestion). The second reason has an especially
noticeable effect when using DCTCP in high-BDP environments, where
alpha normally stays at low values.

This patch replaces the clamping of alpha with setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                  |  Link   |   RTT    |    Drop
 TCP CC           |  speed  | base+AQM | probability
==================|=========|==========|============
            CUBIC |  40Mbps |  7+20ms  |    0.21%
             RENO |         |          |    0.19%
DCTCP-CLAMP-ALPHA |         |          |   25.80%
 DCTCP-HALVE-CWND |         |          |    0.22%
------------------|---------|----------|------------
            CUBIC | 100Mbps |  7+20ms  |    0.03%
             RENO |         |          |    0.02%
DCTCP-CLAMP-ALPHA |         |          |   23.30%
 DCTCP-HALVE-CWND |         |          |    0.04%
------------------|---------|----------|------------
            CUBIC | 800Mbps |   1+1ms  |    0.04%
             RENO |         |          |    0.05%
DCTCP-CLAMP-ALPHA |         |          |   18.70%
 DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all sources of loss,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all sources of loss, it can then be
controlled by the AQM using drop levels similar to those of cubic or
reno.

Signed-off-by: Koen De Schepper
Signed-off-by: Olivier Tilmans
Cc: Bob Briscoe
Cc: Lawrence Brakmo
Cc: Florian Westphal
Cc: Daniel Borkmann
Cc: Yuchung Cheng
Cc: Neal Cardwell
Cc: Eric Dumazet
Cc: Andrew Shewmaker
Cc: Glenn Judd
Acked-by: Florian Westphal
Acked-by: Neal Cardwell
Acked-by: Daniel Borkmann
Signed-off-by: David S.
Miller
Signed-off-by: Sasha Levin
---
 net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index cd4814f7e962..359da68d7c06 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -67,11 +67,6 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA;
 module_param(dctcp_alpha_on_init, uint, 0644);
 MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
 
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
-module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
-MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
-		 "parameter for clamping alpha on loss");
-
 static struct tcp_congestion_ops dctcp_reno;
 
 static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
@@ -164,21 +159,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
 	}
 }
 
-static void dctcp_state(struct sock *sk, u8 new_state)
+static void dctcp_react_to_loss(struct sock *sk)
 {
-	if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
-		struct dctcp *ca = inet_csk_ca(sk);
+	struct dctcp *ca = inet_csk_ca(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 
-		/* If this extension is enabled, we clamp dctcp_alpha to
-		 * max on packet loss; the motivation is that dctcp_alpha
-		 * is an indicator to the extend of congestion and packet
-		 * loss is an indicator of extreme congestion; setting
-		 * this in practice turned out to be beneficial, and
-		 * effectively assumes total congestion which reduces the
-		 * window by half.
-		 */
-		ca->dctcp_alpha = DCTCP_MAX_ALPHA;
-	}
+	ca->loss_cwnd = tp->snd_cwnd;
+	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
+}
+
+static void dctcp_state(struct sock *sk, u8 new_state)
+{
+	if (new_state == TCP_CA_Recovery &&
+	    new_state != inet_csk(sk)->icsk_ca_state)
+		dctcp_react_to_loss(sk);
+	/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
+	 * one loss-adjustment per RTT.
+	 */
 }
 
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
@@ -190,6 +187,9 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
 	case CA_EVENT_ECN_NO_CE:
 		dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state);
 		break;
+	case CA_EVENT_LOSS:
+		dctcp_react_to_loss(sk);
+		break;
 	default:
 		/* Don't care for the rest. */
 		break;
-- 
2.19.1
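
For readers who want to check the arithmetic of the new loss reaction outside the kernel, here is a minimal user-space sketch of the logic in the diff above. It is an illustration only, not kernel code: struct flow, the ca_state and ca_event enums, max_u32(), and the function names react_to_loss(), on_state_change() and on_cwnd_event() are hypothetical stand-ins for tcp_sock/dctcp fields, the TCP_CA_* states, CA_EVENT_LOSS, and the dctcp_* functions in the patch. The sketch also updates the state itself, which in the kernel is assumed to be done by the TCP stack rather than by the congestion control module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for kernel state; not kernel API. */
enum ca_state { CA_OPEN, CA_RECOVERY, CA_LOSS };
enum ca_event { EV_OTHER, EV_LOSS };

struct flow {
	uint32_t cwnd;       /* models tp->snd_cwnd */
	uint32_t ssthresh;   /* models tp->snd_ssthresh */
	uint32_t loss_cwnd;  /* models ca->loss_cwnd */
	enum ca_state state; /* models inet_csk(sk)->icsk_ca_state */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Models dctcp_react_to_loss(): remember cwnd, set ssthresh to
 * half of cwnd, with a floor of 2 segments.
 */
static void react_to_loss(struct flow *f)
{
	assert(f != NULL);
	f->loss_cwnd = f->cwnd;
	f->ssthresh = max_u32(f->cwnd >> 1, 2U);
}

/* Models dctcp_state(): react only on a transition into Recovery,
 * so staying in Recovery does not trigger a second reduction.
 */
static void on_state_change(struct flow *f, enum ca_state new_state)
{
	if (new_state == CA_RECOVERY && new_state != f->state)
		react_to_loss(f);
	f->state = new_state; /* sketch-only bookkeeping */
}

/* Models the new CA_EVENT_LOSS case in dctcp_cwnd_event(): the RTO path. */
static void on_cwnd_event(struct flow *f, enum ca_event ev)
{
	if (ev == EV_LOSS)
		react_to_loss(f);
}
```

With cwnd = 100, entering Recovery sets ssthresh to 50 and loss_cwnd to 100; with cwnd = 3, a loss event sets ssthresh to the floor of 2.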