Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp85160pxj; Wed, 26 May 2021 16:57:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwB8rA+ar5xk+pciLd1CsMQM+BrH8MKXVtkyWTrHzKG0RPpkpz7CydB4nWVTI+ii1e2Hw04 X-Received: by 2002:a02:7348:: with SMTP id a8mr712159jae.116.1622073430061; Wed, 26 May 2021 16:57:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622073430; cv=none; d=google.com; s=arc-20160816; b=z2TGAyNjsxPiVc165xCxmxYSwkkl3IhucmNQMawYkrUkFb1XJpjBGzzG4xJrx2Pdd2 CtPWsLVF3dMRaJEWnADxQV1E2t0mRvHN3hJTUHeN9f9V4dW5AfA5EbmvW85Crw6wnbch 8orAxrWZ+0Pl7k/E9F3aGVXJPbWP+LBE3Uw0lrcE9S6lEMTpZMqc0xl4EFfa1NpNPlH/ riLsJ6IE7GaLyvHen5lH0SV6cMZuo/NCQnVxIisBWb1ZbBPLlumWFc8G/ZOWoUqGwYOw BfGCG96QGOSGeWV0Vr7KYneC7shtAtI9YgA3DNs0s/IEMiEHlafQXNVXAj4jQoAsUJs2 IXNQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=8Z2K++KuCClDy9KbutpofrcdapQ1Tt8SVGV0bdU7V8A=; b=bzAtdpnlfmJUwYyOY4AIUgQWJ1mrUEFoH2sPvJACS6MU03wE3RFPJ1T9C8x+o45RSd uZfsCJzyySj25W0rNEjOUWj3CQe70InZdm5m1c4uHDaAH5hIB0kOlLoD9GCBDkpFGxdu g0Zw8ZJuPBSZsga2BlZve7Zp9swBH0dun0sW1T8XgXVhkYR96ATn3Ck7Wj6AISJzLyRI 3+05Gdnx5LeHNGSyLCQY3cZoEQFkjR0XGMV0XRpHLfpaA4E9nyvDbCY6OqQ6SA9WkTsU OOIxMmC0Qfx2C4aV9ArIuhh410swK210S6V0IY4oKlmj9cq3bHYExu/sEx9m0yJZByuy u3OA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=eL3+6MfS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t13si611316ilu.151.2021.05.26.16.56.56; Wed, 26 May 2021 16:57:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=eL3+6MfS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235048AbhEZRz1 (ORCPT + 99 others); Wed, 26 May 2021 13:55:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234928AbhEZRzY (ORCPT ); Wed, 26 May 2021 13:55:24 -0400 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 60BD4C061574 for ; Wed, 26 May 2021 10:53:52 -0700 (PDT) Received: by mail-wr1-x42c.google.com with SMTP id z17so2028360wrq.7 for ; Wed, 26 May 2021 10:53:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8Z2K++KuCClDy9KbutpofrcdapQ1Tt8SVGV0bdU7V8A=; b=eL3+6MfSM3PCtoxaRXow6IcRXBwwWESzbCM/8Dc4wDaSiJ2i0MpJP1yxnKDTBwfcHn jdbun3ZwmSGHi6XP+j7JUCzJhjQHWtetTBqxBoKRm40WbHUz/B6oFPA7AyxH9CAt1dGy lruuaxMU63WI13bsqjDz9Aj4gXXwoqJZkEKf3lXx9e97yenNiRE+5g4c/YJRqfpFx/5H xJwf9Bs6UFl7/V5Jluq/xO/69o5LASXmAM3CVWiR0vGVhbw6jvSBbCytzp9eMQmaCmzl kdMCgerwj7yGfVHKtmRnbmzuYUdJn7vzndOqKkQkKhTbgncyEjHddGnWgPpLAMTu985T 0rgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8Z2K++KuCClDy9KbutpofrcdapQ1Tt8SVGV0bdU7V8A=; b=P3J96pbwAewR5UcM923xOavKevmYDZyqEYBfxOyS0LQPFcwKUnwaSWSlOR5gJ8aT8V N3YxkB8UrfFsS3Kxrndy6OpzuNtgzQ2WMSmLaOaDOvLuY/9Z19BZHBte21r2nhYvLf/A WR+rrpU0cgb0L/bE2+AlSY8ypBf7skk/dAmSDM1VACaLIMb0XXCHUrA5UrskidRzBuxW hDWORLWUOA2xuKoDXsuVwO3F5GBiZLZXY7TSqIvgy60edoRPDHTuQwv+mQLzJzhjA0pM 2FlWObkZUYnAj6T1qM7T2KZnJRel3TPDdFvDA6cIONy0Oou2dYAB9ms40fgipiujLO5t 6Dvw== X-Gm-Message-State: AOAM531hWkGL361l79IF7eI+klAARS1U+Hj4RUa4PSTCUrHoJOBkSYRH HSBUWnlpSsWLLTaE5GiqsjfcVXvOGAt/q4F/M0iwxw== X-Received: by 2002:a5d:4c48:: with SMTP id n8mr33349537wrt.422.1622051630749; Wed, 26 May 2021 10:53:50 -0700 (PDT) MIME-Version: 1.0 References: <750563aba3687119818dac09fc987c27c7152324.1622025457.git.cdleonard@gmail.com> In-Reply-To: From: Yuchung Cheng Date: Wed, 26 May 2021 10:53:13 -0700 Message-ID: Subject: Re: [RFCv2 1/3] tcp: Use smaller mtu probes if RACK is enabled To: Neal Cardwell Cc: Leonard Crestez , Matt Mathis , Eric Dumazet , "David S. Miller" , Willem de Bruijn , Jakub Kicinski , Hideaki YOSHIFUJI , David Ahern , John Heffner , Leonard Crestez , Soheil Hassas Yeganeh , Roopa Prabhu , Netdev , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 26, 2021 at 5:11 AM Neal Cardwell wrote: > > On Wed, May 26, 2021 at 6:38 AM Leonard Crestez wrote: > > > > RACK allows detecting a loss in rtt + min_rtt / 4 based on just one > > extra packet. If enabled use this instead of relying of fast retransmit. > > IMHO it would be worth adding some more text to motivate the change, > to justify the added complexity and risk from the change. The > substance of the change seems to be decreasing the requirement for > PMTU probing from needing roughly 5 packets worth of data to needing > roughly 3 packets worth of data. It's not clear to me as a reader of > this patch by itself that there are lots of applications that very > often only have 3-4 packets worth of data to send and yet can benefit > greatly from PMTU discovery. > > > Suggested-by: Neal Cardwell > > Signed-off-by: Leonard Crestez > > --- > > Documentation/networking/ip-sysctl.rst | 5 +++++ > > include/net/netns/ipv4.h | 1 + > > net/ipv4/sysctl_net_ipv4.c | 7 +++++++ > > net/ipv4/tcp_ipv4.c | 1 + > > net/ipv4/tcp_output.c | 26 +++++++++++++++++++++++++- > > 5 files changed, 39 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst > > index a5c250044500..7ab52a105a5d 100644 > > --- a/Documentation/networking/ip-sysctl.rst > > +++ b/Documentation/networking/ip-sysctl.rst > > @@ -349,10 +349,15 @@ tcp_mtu_probe_floor - INTEGER > > If MTU probing is enabled this caps the minimum MSS used for search_low > > for the connection. > > > > Default : 48 > > > > +tcp_mtu_probe_rack - BOOLEAN > > + Try to use shorter probes if RACK is also enabled > > + > > + Default: 1 > > I would vote to not have a sysctl for this. If we think it's a good > idea to allow MTU probing with a smaller amount of data if RACK is > enabled (which seems true to me), then this is a low-risk enough > change that we should just change the behavior. +1 to not have another sysctl > > > tcp_min_snd_mss - INTEGER > > TCP SYN and SYNACK messages usually advertise an ADVMSS option, > > as described in RFC 1122 and RFC 6691. > > > > If this ADVMSS option is smaller than tcp_min_snd_mss, > > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > > index 746c80cd4257..b4ff12f25a7f 100644 > > --- a/include/net/netns/ipv4.h > > +++ b/include/net/netns/ipv4.h > > @@ -112,10 +112,11 @@ struct netns_ipv4 { > > #ifdef CONFIG_NET_L3_MASTER_DEV > > u8 sysctl_tcp_l3mdev_accept; > > #endif > > u8 sysctl_tcp_mtu_probing; > > int sysctl_tcp_mtu_probe_floor; > > + int sysctl_tcp_mtu_probe_rack; > > int sysctl_tcp_base_mss; > > int sysctl_tcp_min_snd_mss; > > int sysctl_tcp_probe_threshold; > > u32 sysctl_tcp_probe_interval; > > > > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > > index 4fa77f182dcb..275c91fb9cf8 100644 > > --- a/net/ipv4/sysctl_net_ipv4.c > > +++ b/net/ipv4/sysctl_net_ipv4.c > > @@ -847,10 +847,17 @@ static struct ctl_table ipv4_net_table[] = { > > .mode = 0644, > > .proc_handler = proc_dointvec_minmax, > > .extra1 = &tcp_min_snd_mss_min, > > .extra2 = &tcp_min_snd_mss_max, > > }, > > + { > > + .procname = "tcp_mtu_probe_rack", > > + .data = &init_net.ipv4.sysctl_tcp_mtu_probe_rack, > > + .maxlen = sizeof(int), > > + .mode = 0644, > > + .proc_handler = proc_dointvec, > > + }, > > { > > .procname = "tcp_probe_threshold", > > .data = &init_net.ipv4.sysctl_tcp_probe_threshold, > > .maxlen = sizeof(int), > > .mode = 0644, > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index 4f5b68a90be9..ed8af4a7325b 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -2892,10 +2892,11 @@ static int __net_init tcp_sk_init(struct net *net) > > net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS; > > net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS; > > net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD; > > net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL; > > net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS; > > + net->ipv4.sysctl_tcp_mtu_probe_rack = 1; > > > > net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME; > > net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES; > > net->ipv4.sysctl_tcp_keepalive_intvl = TCP_KEEPALIVE_INTVL; > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index bde781f46b41..9691f435477b 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -2311,10 +2311,19 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) > > } > > > > return true; > > } > > > > +/* Check if rack is supported for current connection */ > > +static int tcp_mtu_probe_is_rack(const struct sock *sk) > > +{ > > + struct net *net = sock_net(sk); > > + > > + return (net->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION && > > + net->ipv4.sysctl_tcp_mtu_probe_rack); > > +} > > You may want to use the existing helper, tcp_is_rack(), by moving it > to include/net/tcp.h > > thanks, > neal