From: Eric Dumazet
Date: Tue, 15 Feb 2022 22:24:57 -0800
Subject: Re: [PATCH v2 net-next] net: introduce SO_RCVBUFAUTO to let the rcv_buf tune automatically
To: Jason Xing
Cc: David Miller, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Paolo Abeni, Wei Wang, Alexander Aring,
    Yangbo Lu, Florian Westphal, Tonghao Zhang, Thomas Gleixner,
    netdev, LKML, bpf, Jason Xing
References: <20220216050320.3222-1-kerneljasonxing@gmail.com>
In-Reply-To: <20220216050320.3222-1-kerneljasonxing@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 15, 2022 at 9:03 PM Jason Xing wrote:
>
> From: Jason Xing
>
> Normally, user doesn't care the logic behind the kernel if they're
> trying to set receive buffer via setsockopt. However, once the new
> value of the receive buffer is set even though it's not smaller than
> the initial value which is sysctl_tcp_rmem[1] implemented in
> tcp_rcv_space_adjust(),, the server's wscale will shrink and then
> lead to the bad bandwidth as intended.

Quite confusing changelog, honestly.

Users of SO_RCVBUF specifically told the kernel: I want to use _this_
buffer size, I do not want the kernel to decide for me.

Also, I think your changelog does not really explain that _if_ you set
SO_RCVBUF to a small value before connect() (or, in general, before the
3WHS), the chosen wscale will be small, and this won't allow a future
10x increase of the effective RWIN.

>
> For now, introducing a new socket option to let the receive buffer
> grow automatically no matter what the new value is can solve
> the bad bandwidth issue meanwhile it's not breaking the application
> with SO_RCVBUF option set.
>
> Here are some numbers:
> $ sysctl -a | grep rmem
> net.core.rmem_default = 212992
> net.core.rmem_max = 40880000
> net.ipv4.tcp_rmem = 4096 425984 40880000
>
> Case 1
> on the server side
> # iperf -s -p 5201
> on the client side
> # iperf -c [client ip] -p 5201
> It turns out that the bandwidth is 9.34 Gbits/sec while the wscale of
> server side is 10. It's good.
>
> Case 2
> on the server side
> #iperf -s -p 5201 -w 425984
> on the client side
> # iperf -c [client ip] -p 5201
> It turns out that the bandwidth is reduced to 2.73 Gbits/sec while the
> wcale is 2, even though the receive buffer is not changed at all at the
> very beginning.
>
> After this patch is applied, the bandwidth of case 2 is recovered to
> 9.34 Gbits/sec as expected at the cost of consuming more memory per
> socket.

How does your patch allow wscale to increase after the flow is
established?

I would remove from the changelog these experimental numbers, which
look quite wrong, maybe copy/pasted from your prior version.

Instead I would describe why an application might want to clear the
'receive buffer size is locked' socket attribute.

>
> Signed-off-by: Jason Xing
> --
> v2: suggested by Eric
> - introduce new socket option instead of breaking the logic in SO_RCVBUF
> - Adjust the title and description of this patch
> link: https://lore.kernel.org/lkml/CANn89iL8vOUOH9bZaiA-cKcms+PotuKCxv7LpVx3RF0dDDSnmg@mail.gmail.com/
> ---
>

I think adding another parallel SO_RCVBUF option is not good.
It adds confusion (and net/core/filter.c has been left unchanged).

Also, we want CRIU to work correctly.

So if you have an SO_XXXX setsockopt() call, you also need to provide
a getsockopt() implementation.

I would suggest an option to clear or set SOCK_RCVBUF_LOCK, and
getsockopt() would return whether the bit is currently set or not.

Something clearly describing the intent, like SO_RCVBUF_LOCK maybe.
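
For readers following the wscale argument in this reply, here is a minimal arithmetic sketch of why a window scale fixed at the 3WHS caps the receive window for the lifetime of the flow. It is an illustration only, not kernel code: pick_wscale() is a hypothetical helper that models "smallest shift that lets the buffer fit in the 16-bit window field", which is a simplification of what the kernel does when it selects the initial window. The only protocol facts assumed are that the TCP window field is 16 bits wide and that the scale factor is negotiated once, in the SYN exchange.

```python
U16_MAX = 0xFFFF        # TCP header window field is 16 bits wide
TCP_MAX_WSCALE = 14     # largest shift allowed by the window scale option


def pick_wscale(space: int) -> int:
    """Hypothetical helper: smallest shift so that `space` fits in 16 bits.

    Simplified model only; the kernel also takes sysctl limits and the
    window clamp into account before deciding.
    """
    wscale = 0
    while space > U16_MAX and wscale < TCP_MAX_WSCALE:
        space >>= 1
        wscale += 1
    return wscale


def max_rwin(wscale: int) -> int:
    """Largest receive window that can ever be advertised with this scale."""
    return U16_MAX << wscale


if __name__ == "__main__":
    for wscale in (2, 10):
        print(f"wscale={wscale:2d} -> max RWIN {max_rwin(wscale)} bytes")
```

With a scale of 2 the advertised window can never exceed 262140 bytes, however much the receive buffer grows later, whereas a scale of 10 allows roughly 64 MB. This is why unlocking the buffer size after the handshake cannot, on its own, recover the headroom that a small SO_RCVBUF gave up at connection setup.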