Subject: Re: [PATCH net-next] tcp: Ensure DCTCP reacts to losses
From: Daniel Borkmann
To: "Tilmans, Olivier (Nokia - BE/Antwerp)"
Cc: "De Schepper, Koen (Nokia - BE/Antwerp)", Bob Briscoe, Lawrence Brakmo,
    Florian Westphal, Daniel Borkmann, Yuchung Cheng, Neal Cardwell,
    Eric Dumazet, Andrew Shewmaker, Glenn Judd, "David S. Miller",
    Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Thu, 4 Apr 2019 10:46:59 +0200
In-Reply-To: <20190404082055.8981-1-olivier.tilmans@nokia-bell-labs.com>
References: <20190404082055.8981-1-olivier.tilmans@nokia-bell-labs.com>

On 04/04/2019 10:26 AM, Tilmans, Olivier (Nokia - BE/Antwerp) wrote:
> RFC 8257 §3.5 explicitly states that DCTCP should "react to loss episodes
> in the same way as conventional TCP". This is also the behavior on
> MS Windows.
>
> Currently, Linux DCTCP performs no ssthresh reduction when losses are
> encountered. Optionally, the dctcp_clamp_alpha_on_loss parameter resets
> alpha to its maximal value if an RTO happens. This behavior is sub-optimal
> for at least two reasons: i) it ignores losses triggering fast
> retransmissions; and ii) it causes unnecessarily large cwnd reductions in
> the future if the loss was isolated, as it resets the historical term of
> DCTCP's alpha EWMA to its maximal value (i.e., denoting total congestion).
> The second reason has an especially noticeable effect when using DCTCP in
> high-BDP environments, where alpha normally stays at low values.
>
> This patch replaces the clamping of alpha by setting ssthresh to half of
> cwnd for both fast retransmissions and RTOs, at most once per RTT. To
> reflect the change, the dctcp_clamp_alpha_on_loss option has been renamed
> to dctcp_halve_cwnd_on_loss.
>
> The table below shows experimental results where we measured the drop
> probability of a PIE AQM (not applying ECN marks) at a bottleneck, in the
> presence of a single TCP flow, with either the alpha-clamping option
> enabled or the cwnd halving proposed by this patch. Results using reno or
> cubic are given for comparison.
>
>                    | Link    | RTT      | Drop
>  TCP CC            | speed   | base+AQM | probability
> ===================|=========|==========|============
>  CUBIC             | 40Mbps  | 7+20ms   |  0.21%
>  RENO              |         |          |  0.19%
>  DCTCP-CLAMP-ALPHA |         |          | 25.80%
>  DCTCP-HALVE-CWND  |         |          |  0.22%
> -------------------|---------|----------|------------
>  CUBIC             | 100Mbps | 7+20ms   |  0.03%
>  RENO              |         |          |  0.02%
>  DCTCP-CLAMP-ALPHA |         |          | 23.30%
>  DCTCP-HALVE-CWND  |         |          |  0.04%
> -------------------|---------|----------|------------
>  CUBIC             | 800Mbps | 1+1ms    |  0.04%
>  RENO              |         |          |  0.05%
>  DCTCP-CLAMP-ALPHA |         |          | 18.70%
>  DCTCP-HALVE-CWND  |         |          |  0.06%
>
> We see that, without halving its cwnd for all sources of losses, DCTCP
> drives the AQM to large drop probabilities in order to keep the queue
> length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP
> reacts to all sources of losses, it can then be controlled by the AQM
> using drop levels similar to those of cubic or reno.
>
> Signed-off-by: Koen De Schepper
> Signed-off-by: Olivier Tilmans
> Cc: Bob Briscoe
> Cc: Lawrence Brakmo
> Cc: Florian Westphal
> Cc: Daniel Borkmann
> Cc: Yuchung Cheng
> Cc: Neal Cardwell
> Cc: Eric Dumazet
> Cc: Andrew Shewmaker
> Cc: Glenn Judd
> ---
>  net/ipv4/tcp_dctcp.c | 39 ++++++++++++++++++++++-----------------
>  1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
> index cd4814f7e962..60417243e7d7 100644
> --- a/net/ipv4/tcp_dctcp.c
> +++ b/net/ipv4/tcp_dctcp.c
> @@ -67,10 +67,9 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA;
>  module_param(dctcp_alpha_on_init, uint, 0644);
>  MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
>  
> -static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
> -module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
> -MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
> -		 "parameter for clamping alpha on loss");
> +static unsigned int dctcp_halve_cwnd_on_loss __read_mostly;
> +module_param(dctcp_halve_cwnd_on_loss, uint, 0644);
> +MODULE_PARM_DESC(dctcp_halve_cwnd_on_loss, "halve cwnd in case of losses");

Is there a reason we still need to keep this module parameter around? The
final RFC even says "A DCTCP sender MUST react to loss episodes in the same
way as conventional TCP". So it's a MUST requirement, in which case it should
be enabled by default. The dctcp_clamp_alpha_on_loss parameter was a bit of a
hack from the very early days.

>  static struct tcp_congestion_ops dctcp_reno;
>  
> @@ -164,21 +163,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
>  	}
>  }
>  
> -static void dctcp_state(struct sock *sk, u8 new_state)
> +static void dctcp_react_to_loss(struct sock *sk)
>  {
> -	if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
> -		struct dctcp *ca = inet_csk_ca(sk);
> +	struct dctcp *ca = inet_csk_ca(sk);
> +	struct tcp_sock *tp = tcp_sk(sk);
>  
> -		/* If this extension is enabled, we clamp dctcp_alpha to
> -		 * max on packet loss; the motivation is that dctcp_alpha
> -		 * is an indicator to the extend of congestion and packet
> -		 * loss is an indicator of extreme congestion; setting
> -		 * this in practice turned out to be beneficial, and
> -		 * effectively assumes total congestion which reduces the
> -		 * window by half.
> -		 */
> -		ca->dctcp_alpha = DCTCP_MAX_ALPHA;
> -	}
> +	ca->loss_cwnd = tp->snd_cwnd;
> +	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
> +}
> +
> +static void dctcp_state(struct sock *sk, u8 new_state)
> +{
> +	if (dctcp_halve_cwnd_on_loss && new_state == TCP_CA_Recovery &&
> +	    new_state != inet_csk(sk)->icsk_ca_state)
> +		dctcp_react_to_loss(sk);
> +	/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
> +	 * one loss-adjustment per RTT.
> +	 */
>  }
>  
>  static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
> @@ -190,6 +191,10 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
>  	case CA_EVENT_ECN_NO_CE:
>  		dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state);
>  		break;
> +	case CA_EVENT_LOSS:
> +		if (dctcp_halve_cwnd_on_loss)
> +			dctcp_react_to_loss(sk);
> +		break;
>  	default:
>  		/* Don't care for the rest. */
>  		break;
>
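
As a rough sketch of the suggestion above, i.e. dropping the
dctcp_halve_cwnd_on_loss parameter and reacting to losses unconditionally,
the affected hooks could end up looking roughly like the following. This is
an illustrative sketch only, not the posted patch; it simply reuses the
dctcp_react_to_loss() helper and the two hooks from the diff, with the
module-parameter gate removed.

/* Illustrative sketch: same logic as the posted patch, but with the
 * dctcp_halve_cwnd_on_loss gate removed, per the review comment above.
 */
static void dctcp_react_to_loss(struct sock *sk)
{
	struct dctcp *ca = inet_csk_ca(sk);
	struct tcp_sock *tp = tcp_sk(sk);

	/* Remember cwnd for possible undo and halve ssthresh, as
	 * conventional TCP does on loss.
	 */
	ca->loss_cwnd = tp->snd_cwnd;
	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
}

static void dctcp_state(struct sock *sk, u8 new_state)
{
	/* React once when entering fast recovery; RTOs are handled via
	 * CA_EVENT_LOSS in dctcp_cwnd_event(), so at most one adjustment
	 * per RTT is performed.
	 */
	if (new_state == TCP_CA_Recovery &&
	    new_state != inet_csk(sk)->icsk_ca_state)
		dctcp_react_to_loss(sk);
}

static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
{
	struct dctcp *ca = inet_csk_ca(sk);

	switch (ev) {
	case CA_EVENT_ECN_IS_CE:
	case CA_EVENT_ECN_NO_CE:
		dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state);
		break;
	case CA_EVENT_LOSS:
		/* RTO: react unconditionally, no module parameter gate. */
		dctcp_react_to_loss(sk);
		break;
	default:
		/* Don't care for the rest. */
		break;
	}
}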