From: Kuniyuki Iwashima
Subject: Re: [PATCH net-next,v2] tcp: Set pingpong threshold via sysctl
Date: Tue, 10 Oct 2023 13:11:54 -0700
Message-ID: <20231010201154.31898-1-kuniyu@amazon.com>
In-Reply-To: <1696965810-8315-1-git-send-email-haiyangz@microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Haiyang Zhang
Date: Tue, 10 Oct 2023 12:23:30 -0700
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
>   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
>   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
>
> Signed-off-by: Haiyang Zhang
> ---
> v2: Make it per-namesapce setting, and other updates suggested by Neal Cardwell,
> and Kuniyuki Iwashima.
>
> ---
>  Documentation/networking/ip-sysctl.rst |  8 ++++++++
>  include/net/inet_connection_sock.h     | 16 ++++++++++++----
>  include/net/netns/ipv4.h               |  1 +
>  net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
>  net/ipv4/tcp_ipv4.c                    |  2 ++
>  net/ipv4/tcp_output.c                  |  4 ++--
>  6 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 5bfa1837968c..c0308b65dc2f 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1183,6 +1183,14 @@ tcp_plb_cong_thresh - INTEGER
>
>  	Default: 128
>
> +tcp_pingpong_thresh - INTEGER
> +	TCP pingpong threshold is 1 by default, but some application may need a
> +	higher threshold for optimal performance.
> +
> +	Possible Values: 1 - 255
> +
> +	Default: 1
> +
>  UDP variables
>  =============
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 5d2fcc137b88..0182f27bce40 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -325,11 +325,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
>
>  struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
>
> -#define TCP_PINGPONG_THRESH	1
> -
>  static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
>  {
> -	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
> +	inet_csk(sk)->icsk_ack.pingpong =
> +		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
>  }
>
>  static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
> @@ -339,7 +338,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
>
>  static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
>  {
> -	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
> +	return inet_csk(sk)->icsk_ack.pingpong >=
> +	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
> +}
> +
> +static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
> +{
> +	struct inet_connection_sock *icsk = inet_csk(sk);
> +
> +	if (icsk->icsk_ack.pingpong < U8_MAX)
> +		icsk->icsk_ack.pingpong++;
>  }
>
>  static inline bool inet_csk_has_ulp(const struct sock *sk)
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index d96d05b08819..9f1b3eb9473e 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -191,6 +191,7 @@ struct netns_ipv4 {
>  	u8 sysctl_tcp_plb_rehash_rounds;
>  	u8 sysctl_tcp_plb_suspend_rto_sec;
>  	int sysctl_tcp_plb_cong_thresh;
> +	u8 sysctl_tcp_pingpong_thresh;
>
>  	int sysctl_udp_wmem_min;
>  	int sysctl_udp_rmem_min;

Maybe a hole after sysctl_tcp_backlog_ack_defer is a good place to put
a new TCP knob.

After sysctl_tcp_plb_cong_thresh, we can fill a 1-byte hole, but the
cacheline seems cold for TCP.

$ pahole -C netns_ipv4 vmlinux
struct netns_ipv4 {
...
	u8 sysctl_tcp_backlog_ack_defer; /* 402 1 */

	/* XXX 1 byte hole, try to pack */

	int sysctl_tcp_reordering; /* 404 4 */
...
	int sysctl_tcp_plb_cong_thresh; /* 572 4 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	int sysctl_udp_wmem_min; /* 576 4 */
	int sysctl_udp_rmem_min; /* 580 4 */
	u8 sysctl_fib_notify_on_flag_change; /* 584 1 */
	u8 sysctl_tcp_syn_linear_timeouts; /* 585 1 */
	u8 sysctl_igmp_llm_reports; /* 586 1 */

	/* XXX 1 byte hole, try to pack */
...


> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index e7f024d93572..f63a545a7374 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -1498,6 +1498,14 @@ static struct ctl_table ipv4_net_table[] = {
>  		.extra1 = SYSCTL_ZERO,
>  		.extra2 = SYSCTL_ONE,
>  	},
> +	{
> +		.procname = "tcp_pingpong_thresh",
> +		.data = &init_net.ipv4.sysctl_tcp_pingpong_thresh,
> +		.maxlen = sizeof(u8),
> +		.mode = 0644,
> +		.proc_handler = proc_dou8vec_minmax,
> +		.extra1 = SYSCTL_ONE,
> +	},
>  	{ }
>  };
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a441740616d7..f603ad9307af 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -3288,6 +3288,8 @@ static int __net_init tcp_sk_init(struct net *net)
>  	net->ipv4.sysctl_tcp_syn_linear_timeouts = 4;
>  	net->ipv4.sysctl_tcp_shrink_window = 0;
>
> +	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
> +
>  	return 0;
>  }
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 8885552dff8e..5736a736b59c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
>  	tp->lsndtime = now;
>
>  	/* If it is a reply for ato after last received
> -	 * packet, enter pingpong mode.
> +	 * packet, increase pingpong count.
>  	 */
>  	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
> -		inet_csk_enter_pingpong_mode(sk);
> +		inet_csk_inc_pingpong_cnt(sk);
>  }
>
>  /* Account for an ACK we sent. */
> --
> 2.25.1
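
For illustration only, here is a minimal userspace sketch of the padding
point raised above for struct netns_ipv4. It assumes a typical ABI where
int is 4 bytes and 4-byte aligned. The struct names, the field names
knob_a/knob_b/knob_c, and dump_layouts() are hypothetical stand-ins, not
the real kernel layout. It only shows that a u8 dropped into an existing
1-byte hole leaves the struct size and later offsets unchanged, while a
u8 placed after an int makes the struct grow.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned char u8;

/* Hypothetical stand-ins; these are NOT the real struct netns_ipv4. */

/* Baseline: three u8 fields leave a 1-byte hole before the aligned int. */
struct base_layout {
	u8 knob_a;
	u8 knob_b;
	u8 knob_c;
	/* 1-byte hole here when int is 4-byte aligned (assumption) */
	int reordering;
	int cong_thresh;
};

/* New u8 placed in the hole: later fields keep their offsets. */
struct knob_in_hole {
	u8 knob_a;
	u8 knob_b;
	u8 knob_c;
	u8 pingpong_thresh;
	int reordering;
	int cong_thresh;
};

/* New u8 placed after the last int: the struct needs tail padding. */
struct knob_after_int {
	u8 knob_a;
	u8 knob_b;
	u8 knob_c;
	int reordering;
	int cong_thresh;
	u8 pingpong_thresh;
};

/* Print size and key offsets of each variant, similar in spirit to pahole. */
void dump_layouts(void)
{
	printf("base:      size=%zu reordering@%zu\n",
	       sizeof(struct base_layout),
	       offsetof(struct base_layout, reordering));
	printf("in hole:   size=%zu thresh@%zu reordering@%zu\n",
	       sizeof(struct knob_in_hole),
	       offsetof(struct knob_in_hole, pingpong_thresh),
	       offsetof(struct knob_in_hole, reordering));
	printf("after int: size=%zu thresh@%zu\n",
	       sizeof(struct knob_after_int),
	       offsetof(struct knob_after_int, pingpong_thresh));
}
```

Calling dump_layouts() from a small main() prints the three layouts side
by side. This says nothing about which cacheline is hot or cold; that
still has to be judged from the real pahole output.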