Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp3053158rwe; Mon, 29 Aug 2022 05:11:50 -0700 (PDT) X-Google-Smtp-Source: AA6agR7R1+4gJ40PdKbgxDuFkoVvNftuKJkKTGMA/3LVWCh9rn+k2lZGXbnRsDPShXYnwFDhbHPX X-Received: by 2002:a05:6a00:1bcb:b0:538:4b52:78dd with SMTP id o11-20020a056a001bcb00b005384b5278ddmr3220807pfw.13.1661775109493; Mon, 29 Aug 2022 05:11:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661775109; cv=none; d=google.com; s=arc-20160816; b=GyMWA6umNci/MOFhw/2YxyVipPDoYjYsEAuU4vYEsKvW/ylpx7D/jqT+DS+7/bzOC+ kdJFtV53sV4f862pvHEH32av9V/PWMsoE5Ci3PJJP8NN3rE/HilP3DQx2naxsRZ12v73 cEgyRKrPZYzWTOBjFKnIL2COVC/z1s0Knck9N2Jd+WsDK+2pUsTBm/+yUfUXZLzOxSZQ awW8wJZok7bq8Mu5XFk1MxKfAClCKqJRDyZuEVCFJUGPRXiRN4B0X9LLHz+w/Ch1bIeP Si8SDNr+8rypLlLlA37Mcui4MMgVrqkOMs/hWS+FUrQ2JFi7ynm7Ef9IX0pleqkn73+t si/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Ro/t3hwgF0jEgqKrJ/CZEEN1/QvK/K/roLLkR6FCRX4=; b=CmQXNouCahzzeXaHOonysvmZTn/dYurNazEJ18BVbyEF61nne8+wadWILfyzYmfecM FtHB1r5edINU3KTv98kkU8C9BeDz0kidWfmMX0aBVtcTkZzVePrjaZwihuIB2+H2Jn9M 0ByJCa1/76qUqkAcPXKjc95fxAxVpEIjBeLREJFxws5zCht/T5tUQzQsRw5V54SFvRJR YTKT/mZzGdyNR89+QJ4pyksCRvtYxshWX7MIGkxp83DHFXzU0pEXkovC/B97r9rpZVIg ou4T/svfNekrfwwi0zDM/cFfSUBEi+hYfwkdxoWkHML+WvMZ8x3Oymoeu7CbghE9QpmS hC6g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=pyJT+D+Z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e125-20020a636983000000b0042c7b672365si58915pgc.416.2022.08.29.05.11.38; Mon, 29 Aug 2022 05:11:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=pyJT+D+Z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229854AbiH2LQM (ORCPT + 99 others); Mon, 29 Aug 2022 07:16:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231517AbiH2LN6 (ORCPT ); Mon, 29 Aug 2022 07:13:58 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 808173341A; Mon, 29 Aug 2022 04:10:08 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 136E7B80F2B; Mon, 29 Aug 2022 11:07:37 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6CEC9C433D6; Mon, 29 Aug 2022 11:07:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1661771255; bh=WBJLOiKgmCJhSoNPFPHvzfExi2tX4P6FT9wYsvgQ9M4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pyJT+D+Zvgln8GbW1772Kg9Oz3YuLmkAx9LFN240+hmE6TGLXUMdAEJnwu6oD7DHx BF8gRxusAJMeEcbDrnq21ck60j/vwCMQiQcYLs4skTHnvN2Gf/xa8Md8sFgCgPkrBr +t5d0zJ9l4yF2Pxl/F6quJW5p0DlKa3sfCIEuyQM= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Eric Dumazet , Soheil Hassas Yeganeh , Neal Cardwell , Yuchung Cheng , "David S. Miller" , Sasha Levin Subject: [PATCH 5.10 44/86] tcp: tweak len/truesize ratio for coalesce candidates Date: Mon, 29 Aug 2022 12:59:10 +0200 Message-Id: <20220829105758.337074804@linuxfoundation.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220829105756.500128871@linuxfoundation.org> References: <20220829105756.500128871@linuxfoundation.org> User-Agent: quilt/0.67 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Eric Dumazet [ Upstream commit 240bfd134c592791fdceba1ce7fc3f973c33df2d ] tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh which has a direct impact on advertized window sizes. We added TCP coalescing in linux-3.4 & linux-3.5: Instead of storing skbs with one or two MSS in receive queue (or OFO queue), we try to append segments together to reduce memory overhead. High performance network drivers tend to cook skb with 3 parts : 1) sk_buff structure (256 bytes) 2) skb->head contains room to copy headers as needed, and skb_shared_info 3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU) Once coalesced into a previous skb, 1) and 2) are freed. We can therefore tweak the way we compute len/truesize ratio knowing that skb->truesize is inflated by 1) and 2) soon to be freed. This is done only for in-order skb, or skb coalesced into OFO queue. The result is that low rate flows no longer pay the memory price of having low GRO aggregation factor. Same result for drivers not using GRO. This is critical to allow a big enough receiver window, typically tcp_rmem[2] / 2. We have been using this at Google for about 5 years, it is due time to make it upstream. Signed-off-by: Eric Dumazet Cc: Soheil Hassas Yeganeh Cc: Neal Cardwell Cc: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/tcp_input.c | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d35e88b5ffcbe..33a3fb04ac4df 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -454,11 +454,12 @@ static void tcp_sndbuf_expand(struct sock *sk) */ /* Slow part of check#2. */ -static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) +static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb, + unsigned int skbtruesize) { struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! */ - int truesize = tcp_win_from_space(sk, skb->truesize) >> 1; + int truesize = tcp_win_from_space(sk, skbtruesize) >> 1; int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1; while (tp->rcv_ssthresh <= window) { @@ -471,7 +472,27 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) return 0; } -static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) +/* Even if skb appears to have a bad len/truesize ratio, TCP coalescing + * can play nice with us, as sk_buff and skb->head might be either + * freed or shared with up to MAX_SKB_FRAGS segments. + * Only give a boost to drivers using page frag(s) to hold the frame(s), + * and if no payload was pulled in skb->head before reaching us. + */ +static u32 truesize_adjust(bool adjust, const struct sk_buff *skb) +{ + u32 truesize = skb->truesize; + + if (adjust && !skb_headlen(skb)) { + truesize -= SKB_TRUESIZE(skb_end_offset(skb)); + /* paranoid check, some drivers might be buggy */ + if (unlikely((int)truesize < (int)skb->len)) + truesize = skb->truesize; + } + return truesize; +} + +static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb, + bool adjust) { struct tcp_sock *tp = tcp_sk(sk); int room; @@ -480,15 +501,16 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) /* Check #1 */ if (room > 0 && !tcp_under_memory_pressure(sk)) { + unsigned int truesize = truesize_adjust(adjust, skb); int incr; /* Check #2. Increase window, if skb with such overhead * will fit to rcvbuf in future. */ - if (tcp_win_from_space(sk, skb->truesize) <= skb->len) + if (tcp_win_from_space(sk, truesize) <= skb->len) incr = 2 * tp->advmss; else - incr = __tcp_grow_window(sk, skb); + incr = __tcp_grow_window(sk, skb, truesize); if (incr) { incr = max_t(int, incr, 2 * skb->len); @@ -782,7 +804,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) tcp_ecn_check_ce(sk, skb); if (skb->len >= 128) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, true); } /* Called to compute a smoothed rtt estimate. The data fed to this @@ -4761,7 +4783,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * and trigger fast retransmit. */ if (tcp_is_sack(tp)) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, true); kfree_skb_partial(skb, fragstolen); skb = NULL; goto add_sack; @@ -4849,7 +4871,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * and trigger fast retransmit. */ if (tcp_is_sack(tp)) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, false); skb_condense(skb); skb_set_owner_r(skb, sk); } -- 2.35.1