Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp5410359pxj; Wed, 26 May 2021 09:51:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyhd5Qbqjqu73zfyPXpv/gY5XPaBr+XIxdge2r7HhqKGL0EpxUljyQ/06A4VCPWyEFEmLh8 X-Received: by 2002:a17:907:770a:: with SMTP id kw10mr34646941ejc.213.1622047890722; Wed, 26 May 2021 09:51:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622047890; cv=none; d=google.com; s=arc-20160816; b=ojGNU6s8pafzUisMxRIGWop1ntB2vo39cOE1k9KeZjJWFXP6L0xDkQiWvQbIUXff/4 LlmMEP8jbnrCFDapeXW32EMKYNUA0pOBdEXAlZYGxRizwaMOTuQ47auRQziSAZ2bsmZ7 3jbrw2INOl6JHu27HjlYRGcSAOpvn8NFxapP5aL5DJGToF3LBzgJM0/UGSHyD6PSvovO 99MM1u+z+hJrlOhrL3iLTEFqV3arJx5pmuFn1cqaVfOKRWgmRp4bbH8yWER0d23nkTQF oowxQ/M6f0nF4jMdWuzdMQ1go1REFoOHvxWuD5bPTohaXU0+38LNG8Waut/Ctpo4D0NX dfnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=wkQYCEaVCPImpDNQXApDXqkhMOXth/QVgA22IjicNCE=; b=ci0YkLitGs7S0O+1jXCvVrJznOsfMLKpUif0I9YCAnRUf6YaJqcwR7AK8cdA8yHIIQ aL4XfgiDeZR3h9J3MJ959r+1j+IaH+dXtOAaOhpLtimedVQVy2rc8rS0RaPMs0U7QKsA I8i4iT5bA93nuKmsO2QOtpr84asmVZHj6DI+qrhFQVsw1dJWtpmRqr3IKtXHEBBWtsNz WV9JXBanZlWmCqvLRnijN0gQsEhifhqKVMUn037mDiTbDfQiW3FMQTe3Gv9+KBbkxmzG HH6KFAuwQNPfjY9HStyXOm7GXlgDIxeMCrC10Hyl9KqjylkRMDWqw3b6aAaEU83/KqDt x85g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=kxrmMXPT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ws15si5752386ejb.198.2021.05.26.09.51.07; Wed, 26 May 2021 09:51:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=kxrmMXPT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234019AbhEZKkT (ORCPT + 99 others); Wed, 26 May 2021 06:40:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233653AbhEZKkR (ORCPT ); Wed, 26 May 2021 06:40:17 -0400 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2ED2C061756; Wed, 26 May 2021 03:38:44 -0700 (PDT) Received: by mail-wr1-x435.google.com with SMTP id r10so547208wrj.11; Wed, 26 May 2021 03:38:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wkQYCEaVCPImpDNQXApDXqkhMOXth/QVgA22IjicNCE=; b=kxrmMXPTgj8Rg96huyyVsmvRDD+vQ50+xp5vx4cOLvOHfDyA0nt1uCgjoahFsQ9udJ em/dRiB29wGUtZhJ0+5ML0f3v0JRSHkPArkpBGMyCcQ6oWEuHvZzEbvBmxj2GOW9x62E zXgkrprVOigTpYaQLboFm48pM0b5QAvWPGUqb/yiK0ADD07q0JkzvtPS/xPLOOegfoHh xbCdv0mckb3fIh3xxgA5hwRobeWXSAECq6gfIlnsKClWalcsdrwYY4W1ka/DpofupBQa 28rShX4GNLZYjozAQDi0scOnJbzvK7qI2/X9fHnGIdkkWx+uYm5q2G1kpiwwcXUvHZQr bg5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wkQYCEaVCPImpDNQXApDXqkhMOXth/QVgA22IjicNCE=; b=tzFWoD3pVfaAzhEkRJ51fP0ysNUfivFBkH/y41KJJokE4O1xUTjO1i0D0roXiIDGjq 53oPZWQb8DyuqAQa1qjzVrYNk2gyRXivY66VRs6qAhtY0wiyy+W8gq+9eY3HVKTMjgzT QhX7T5YsryVZ7LWyMVjrOg/18M7e/DKxks494vUFCn+puoznPavBh6n0HmjElCFhYd+w N2GUNxkOWwxc4MgM8C+NkAoIk74bDoR4ncvrSo5iTkoYywqLRRMUF1NXF2hrlinDu5M3 6anpmkVJFgD7bZcG+LQ5W/ZRvFKDIoSGk0827LlffdmN/BaF/YRuL/OQfyczg3TAKpj4 5qvQ== X-Gm-Message-State: AOAM530w4HraRH1hKpoiT5stFqaJbXmRJO/musdJLWb+gW/Q6Z+3Zfjp Y01B6ojXqBJggsdULlG7JAM= X-Received: by 2002:a5d:4ccc:: with SMTP id c12mr32300476wrt.137.1622025523220; Wed, 26 May 2021 03:38:43 -0700 (PDT) Received: from localhost.localdomain ([2a04:241e:502:1d80:a50a:8c74:4a8f:62b3]) by smtp.gmail.com with ESMTPSA id j101sm15364927wrj.66.2021.05.26.03.38.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 May 2021 03:38:42 -0700 (PDT) From: Leonard Crestez To: Neal Cardwell , Matt Mathis , Eric Dumazet Cc: "David S. Miller" , Willem de Bruijn , Jakub Kicinski , Hideaki YOSHIFUJI , David Ahern , John Heffner , Leonard Crestez , Soheil Hassas Yeganeh , Roopa Prabhu , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFCv2 1/3] tcp: Use smaller mtu probes if RACK is enabled Date: Wed, 26 May 2021 13:38:25 +0300 Message-Id: <750563aba3687119818dac09fc987c27c7152324.1622025457.git.cdleonard@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org RACK allows detecting a loss in rtt + min_rtt / 4 based on just one extra packet. If enabled use this instead of relying of fast retransmit. Suggested-by: Neal Cardwell Signed-off-by: Leonard Crestez --- Documentation/networking/ip-sysctl.rst | 5 +++++ include/net/netns/ipv4.h | 1 + net/ipv4/sysctl_net_ipv4.c | 7 +++++++ net/ipv4/tcp_ipv4.c | 1 + net/ipv4/tcp_output.c | 26 +++++++++++++++++++++++++- 5 files changed, 39 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index a5c250044500..7ab52a105a5d 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -349,10 +349,15 @@ tcp_mtu_probe_floor - INTEGER If MTU probing is enabled this caps the minimum MSS used for search_low for the connection. Default : 48 +tcp_mtu_probe_rack - BOOLEAN + Try to use shorter probes if RACK is also enabled + + Default: 1 + tcp_min_snd_mss - INTEGER TCP SYN and SYNACK messages usually advertise an ADVMSS option, as described in RFC 1122 and RFC 6691. If this ADVMSS option is smaller than tcp_min_snd_mss, diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 746c80cd4257..b4ff12f25a7f 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -112,10 +112,11 @@ struct netns_ipv4 { #ifdef CONFIG_NET_L3_MASTER_DEV u8 sysctl_tcp_l3mdev_accept; #endif u8 sysctl_tcp_mtu_probing; int sysctl_tcp_mtu_probe_floor; + int sysctl_tcp_mtu_probe_rack; int sysctl_tcp_base_mss; int sysctl_tcp_min_snd_mss; int sysctl_tcp_probe_threshold; u32 sysctl_tcp_probe_interval; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 4fa77f182dcb..275c91fb9cf8 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -847,10 +847,17 @@ static struct ctl_table ipv4_net_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &tcp_min_snd_mss_min, .extra2 = &tcp_min_snd_mss_max, }, + { + .procname = "tcp_mtu_probe_rack", + .data = &init_net.ipv4.sysctl_tcp_mtu_probe_rack, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, { .procname = "tcp_probe_threshold", .data = &init_net.ipv4.sysctl_tcp_probe_threshold, .maxlen = sizeof(int), .mode = 0644, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 4f5b68a90be9..ed8af4a7325b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2892,10 +2892,11 @@ static int __net_init tcp_sk_init(struct net *net) net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS; net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS; net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD; net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL; net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS; + net->ipv4.sysctl_tcp_mtu_probe_rack = 1; net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME; net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES; net->ipv4.sysctl_tcp_keepalive_intvl = TCP_KEEPALIVE_INTVL; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index bde781f46b41..9691f435477b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2311,10 +2311,19 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) } return true; } +/* Check if rack is supported for current connection */ +static int tcp_mtu_probe_is_rack(const struct sock *sk) +{ + struct net *net = sock_net(sk); + + return (net->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION && + net->ipv4.sysctl_tcp_mtu_probe_rack); +} + /* Create a new MTU probe if we are ready. * MTU probe is regularly attempting to increase the path MTU by * deliberately sending larger packets. This discovers routing * changes resulting in larger path MTUs. * @@ -2351,11 +2360,26 @@ static int tcp_mtu_probe(struct sock *sk) * smaller than a threshold, backoff from probing. */ mss_now = tcp_current_mss(sk); probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high + icsk->icsk_mtup.search_low) >> 1); - size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache; + /* Probing the MTU requires one packet which is larger that current MSS as well + * as enough following mtu-sized packets to ensure that a probe loss can be + * detected without a full Retransmit Time Out. + */ + if (tcp_mtu_probe_is_rack(sk)) { + /* RACK allows recovering in min_rtt / 4 based on just one extra packet + * Use two to account for unrelated losses + */ + size_needed = probe_size + 2 * tp->mss_cache; + } else { + /* Without RACK send enough extra packets to trigger fast retransmit + * This is dynamic DupThresh + 1 + */ + size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache; + } + interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low; /* When misfortune happens, we are reprobing actively, * and then reprobe timer has expired. We stick with current * probing process by not resetting search range to its orignal. */ -- 2.25.1