From: Leonard Crestez
Subject: Re: [RFC 1/3] tcp: Consider mtu probing for tcp_xmit_size_goal
To: Eric Dumazet, Matt Mathis, Neal Cardwell
Cc: "David S. Miller", Willem de Bruijn, Jakub Kicinski, Hideaki YOSHIFUJI,
    David Ahern, John Heffner, Leonard Crestez, Soheil Hassas Yeganeh,
    Roopa Prabhu, netdev, LKML
References: <52e63f5b41c9604b909badb7fbc593fe1fe77413.1620733594.git.cdleonard@gmail.com>
Date: Mon, 17 May 2021 16:42:35 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/11/21 4:04 PM, Eric Dumazet wrote:
> On Tue, May 11, 2021 at 2:04 PM Leonard Crestez wrote:
>>
>> According to RFC4821 Section 7.4 "Protocols MAY delay sending non-probes
>> in order to accumulate enough data" but linux almost never does that.
>>
>> Linux checks for (probe_size + (1 + reorder) * mss_cache) bytes to be
>> available in the send buffer and if that condition is not met it will
>> send anyway using the current MSS. The feature can be made to work by
>> sending very large chunks of data from userspace (for example 128k) but
>> for small writes on fast links tcp mtu probes almost never happen.
>
> Why should they happen ?
>
> I am not sure the kernel should perform extra checks just because
> applications are not properly written.

My tests show that applications writing a few kB at a time almost never
trigger enough MTU probing to reach 9200. I find the reasons for this very
difficult to understand.

It seems that only writing in very large chunks like 160k makes it happen,
which is much more than the size_needed calculated inside tcp_mtu_probe
(which is about 50k). This seems unreasonable. Ideally linux should try to
accumulate enough data for a probe (as the RFC suggests), but at the very
least it should send probes that fit inside a single userspace write.
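
For reference, the data-availability check quoted above can be sketched in
userspace as follows. This is only a minimal model, not the kernel code: the
helper names and the sample numbers are made up for illustration, and only
the formula itself comes from the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes that must be queued before a probe is considered, following the
 * formula quoted above: probe_size + (1 + reorder) * mss_cache.
 * Hypothetical helper; the name is not a kernel symbol.
 */
static uint32_t model_probe_size_needed(uint32_t probe_size,
					uint32_t reordering,
					uint32_t mss_cache)
{
	return probe_size + (1 + reordering) * mss_cache;
}

/*
 * Returns 1 if enough unsent data is queued to build a probe, 0 if the
 * sender would instead send using the current MSS.
 */
static int model_can_probe(uint32_t queued_bytes, uint32_t probe_size,
			   uint32_t reordering, uint32_t mss_cache)
{
	assert(mss_cache > 0);
	return queued_bytes >= model_probe_size_needed(probe_size, reordering,
						       mss_cache);
}
```

With an assumed probe_size of 4000, reordering of 3 and mss_cache of 1448,
the threshold is 4000 + 4 * 1448 = 9792 bytes, so a single 4k write does not
qualify on its own while a 128k write does.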

I dug a little deeper and what seems to happen is this:

 * size_needed is ~60k
 * once the head of the queue reaches size_needed, tcp_push_one is called,
   which sends everything and ignores MTU probing
 * size_needed is reached again and tcp_push_pending_frames is called. At
   this point the cwnd has shrunk below 11 (due to the previous burst), so
   probing is skipped again in favor of just sending in mss-sized chunks.

This happens repeatedly: a sender-limited app performing periodic 128k
writes will see MSS stuck below MTU.

I don't understand the push_one logic and why it completely skips MTU
probing. It seems like an optimization which doesn't take RFC4821 into
account.

>> This patch tries to take mtu probe into account in tcp_xmit_size_goal, a
>> function which otherwise attempts to accumulate a packet suitable for
>> TSO. No delays are introduced beyond existing autocork heuristics.
>
>
> MTU probing should not be attempted for every write().
> This belongs to some kind of slow path, once in a while.

MTU probing is only attempted every 10 minutes, but once a probe is pending
it does have a slight impact on every write. This is already the case:
tcp_write_xmit calls tcp_mtu_probe almost every time.

I had an idea for reducing the overhead in tcp_size_needed but it turns out
I was indeed mistaken about what this function does. I thought it returned
~mss when all GSO is disabled but this is not so.
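
The misunderstanding can be illustrated with a simplified userspace model of
the size-goal rounding. It is based only on the lines visible in the quoted
hunk (the !large_allowed early return, new_size_goal / mss_now capped by the
maximum segment count, and max(size_goal, mss_now)). How the kernel derives
new_size_goal is not shown in the hunk, so here it is an assumed input
called size_budget; the function name is hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: round an assumed byte budget down to a whole number
 * of MSS-sized segments, never returning less than one MSS.
 * size_budget and max_segs are illustrative parameters, not kernel fields.
 */
static uint32_t model_size_goal(uint32_t mss_now, uint32_t size_budget,
				uint32_t max_segs, int large_allowed)
{
	uint32_t segs, goal;

	assert(mss_now > 0);

	if (!large_allowed)
		return mss_now;

	segs = size_budget / mss_now;	/* whole segments that fit */
	if (segs > max_segs)
		segs = max_segs;
	goal = segs * mss_now;

	return goal > mss_now ? goal : mss_now;
}
```

With an assumed budget of 65215 bytes and an MSS of 1448 this yields
45 * 1448 = 65160 bytes, i.e. close to 64KB rather than one MSS. Only the
large_allowed == 0 path returns a single MSS.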

>>  static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>>                                         int large_allowed)
>>  {
>> +       struct inet_connection_sock *icsk = inet_csk(sk);
>>         struct tcp_sock *tp = tcp_sk(sk);
>>         u32 new_size_goal, size_goal;
>>
>>         if (!large_allowed)
>>                 return mss_now;
>> @@ -932,11 +933,19 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>>                 tp->gso_segs = min_t(u16, new_size_goal / mss_now,
>>                                      sk->sk_gso_max_segs);
>>                 size_goal = tp->gso_segs * mss_now;
>>         }
>>
>> -       return max(size_goal, mss_now);
>> +       size_goal = max(size_goal, mss_now);
>> +
>> +       if (unlikely(icsk->icsk_mtup.wait_data)) {
>> +               int mtu_probe_size_needed = tcp_mtu_probe_size_needed(sk, NULL);
>> +               if (mtu_probe_size_needed > 0)
>> +                       size_goal = max(size_goal, (u32)mtu_probe_size_needed);
>> +       }
>
>
> I think you are mistaken.
>
> This function usually returns 64KB depending on MSS.
> Have you really tested this part ?

I assumed that with all GSO features disabled this function returns one MSS,
but this is not true. My patch had a positive effect only because I made
tcp_mtu_probe return "0" instead of "-1" if not enough data is queued.

I don't fully understand the implications of that change though. If
tcp_mtu_probe returns zero, what guarantee is there that data will
eventually be sent even if no further userspace writes happen?

I'd welcome any suggestions.

--
Regards,
Leonard