From: Neal Cardwell
Date: Mon, 26 Apr 2021 11:40:41 -0400
Subject: Re: [RFC] tcp: Consider mtu probing for tcp_xmit_size_goal
To: Leonard Crestez
Cc: Matt Mathis, "David S. Miller", Eric Dumazet, Willem de Bruijn,
	Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern, John Heffner,
	Soheil Hassas Yeganeh, Roopa Prabhu, Leonard Crestez,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Thanks for the patch. Some thoughts, in-line below:

On Sun, Apr 25, 2021 at 10:22 PM Leonard Crestez wrote:
>
> According to RFC4821 Section 7.4 "Protocols MAY delay sending non-probes
> in order to accumulate enough data" but linux almost never does that.
>
> Linux checks for probe_size + (1 + retries) * mss_cache to be available

I think this means to say "reordering" rather than "retries"?

> in the send buffer and if that condition is not met it will send anyway
> using the current MSS. The feature can be made to work by sending very
> large chunks of data from userspace (for example 128k) but for small
> writes on fast links tcp mtu probes almost never happen.
>
> This patch tries to take mtu probe into account in tcp_xmit_size_goal, a
> function which otherwise attempts to accumulate a packet suitable for
> TSO. No delays are introduced beyond existing autocork heuristics.

I would suggest phrasing this paragraph as something like:

  This patch generalizes tcp_xmit_size_goal(), a function which is used
  in autocorking in order to attempt to accumulate suitable transmit
  sizes, so that it takes into account not only TSO and receive window,
  but now also transmit sizes needed for efficient PMTU discovery (when
  a flow is eligible for PMTU discovery).

>
> Suggested-by: Matt Mathis
> Signed-off-by: Leonard Crestez
> ---
...
> +
> +	/* Timer for wait_data */
> +	struct timer_list wait_data_timer;

This timer_list seems left over from a previous patch version, and no
longer used?

> @@ -1375,10 +1376,12 @@ static inline void tcp_slow_start_after_idle_check(struct sock *sk)
>  	s32 delta;
>
>  	if (!sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle || tp->packets_out ||
>  	    ca_ops->cong_control)
>  		return;
> +	if (inet_csk(sk)->icsk_mtup.wait_data)
> +		return;
>  	delta = tcp_jiffies32 - tp->lsndtime;
>  	if (delta > inet_csk(sk)->icsk_rto)
>  		tcp_cwnd_restart(sk, delta);
>  }

I would argue that the MTU probing heuristics should not override
congestion control. If congestion control thinks it's proper to reset
the cwnd to a lower value due to the connection being idle, then this
should happen irrespective of MTU probing activity.
>
>  int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  {
>  	int mss_now;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index bde781f46b41..c15ed548a48a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2311,10 +2311,23 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len)
>  	}
>
>  	return true;
>  }
>
> +int tcp_mtu_probe_size_needed(struct sock *sk)
> +{
> +	struct inet_connection_sock *icsk = inet_csk(sk);
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	int probe_size;
> +	int size_needed;
> +
> +	probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high + icsk->icsk_mtup.search_low) >> 1);
> +	size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;

Adding this helper without refactoring tcp_mtu_probe() to use this
helper would leave some significant unnecessary code duplication. To
avoid duplication, AFAICT your tcp_mtu_probe() should call your new
helper.

You may also want to make this little helper static inline, since
AFAICT it is called in some hot paths.

>  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>  			   int push_one, gfp_t gfp)
>  {
> +	struct inet_connection_sock *icsk = inet_csk(sk);
>  	struct tcp_sock *tp = tcp_sk(sk);
> +	struct net *net = sock_net(sk);
>  	struct sk_buff *skb;
>  	unsigned int tso_segs, sent_pkts;
>  	int cwnd_quota;
>  	int result;
>  	bool is_cwnd_limited = false, is_rwnd_limited = false;
>  	u32 max_segs;
>
>  	sent_pkts = 0;
>
>  	tcp_mstamp_refresh(tp);
> +	/*
> +	 * Waiting for tcp probe data also applies when push_one=1
> +	 * If user does many small writes we hold them until we have have enough
> +	 * for a probe.
> +	 */

This comment seems incomplete; with this patch AFAICT small writes are
not simply always held unconditionally until there is enough data for
a full-sized MTU probe.
Rather, the autocorking mechanism is used, so that if a flow is
eligible for MTU probes, autocorking attempts to wait for a full
amount of data with which to probe (and, as is the case with
autocorking, if this does not become available by the time the local
send queues are empty, then the flow sends out whatever it has).

Also I think it would be better to place this comment where you add
new code to tcp_xmit_size_goal(), so that the comment is located near
the corresponding key code, rather than in the very general
tcp_write_xmit() function.

best,
neal