From: Eric Dumazet
Date: Tue, 15 Feb 2022 07:25:31 -0800
Subject: Re: [PATCH] net: do not set SOCK_RCVBUF_LOCK if sk_rcvbuf isn't reduced
To: Jason Xing
Cc: David Miller, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Paolo Abeni, Wei Wang, Alexander Aring,
    Yangbo Lu, Florian Westphal, Tonghao Zhang, Thomas Gleixner,
    netdev, LKML, bpf, Jason Xing
In-Reply-To: <20220215103639.11739-1-kerneljasonxing@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 15, 2022 at 2:37 AM wrote:
>
> From: Jason Xing
>
> Normally, the user doesn't care about the logic behind the kernel when
> trying to
> set the receive buffer via setsockopt. However, if the new value of
> the receive buffer is not smaller than the initial value, which is
> sysctl_tcp_rmem[1] as implemented in tcp_rcv_space_adjust(), the
> server's wscale will shrink and then lead to bad bandwidth. I think
> that is not appropriate.

Then do not use SO_RCVBUF? It is working as intended, really.

> Here are some numbers:
> $ sysctl -a | grep rmem
> net.core.rmem_default = 212992
> net.core.rmem_max = 40880000
> net.ipv4.tcp_rmem = 4096 425984 40880000
>
> Case 1
> On the server side:
> # iperf -s -p 5201
> On the client side:
> # iperf -c [server ip] -p 5201
> It turns out that the bandwidth is 9.34 Gbits/sec while the wscale of
> the server side is 10. That's good.
>
> Case 2
> On the server side:
> # iperf -s -p 5201 -w 425984
> On the client side:
> # iperf -c [server ip] -p 5201
> It turns out that the bandwidth is reduced to 2.73 Gbits/sec while the
> wscale is 2, even though the receive buffer was not changed at all at
> the very beginning.

Great, you discovered that auto tuning is working as intended.

> Therefore, I added one condition so that the lock is only set when the
> user is trying to set a smaller rx buffer. After this patch is applied,
> the bandwidth of case 2 is recovered to 9.34 Gbits/sec.
>
> Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")

This commit has nothing to do with your patch or feature.
> Signed-off-by: Jason Xing
> ---
>  net/core/filter.c | 7 ++++---
>  net/core/sock.c   | 8 +++++---
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4603b7c..99f5d9c 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4795,9 +4795,10 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
>         case SO_RCVBUF:
>                 val = min_t(u32, val, sysctl_rmem_max);
>                 val = min_t(int, val, INT_MAX / 2);
> -               sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> -               WRITE_ONCE(sk->sk_rcvbuf,
> -                          max_t(int, val * 2, SOCK_MIN_RCVBUF));
> +               val = max_t(int, val * 2, SOCK_MIN_RCVBUF);
> +               if (val < sock_net(sk)->ipv4.sysctl_tcp_rmem[1])
> +                       sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> +               WRITE_ONCE(sk->sk_rcvbuf, val);
>                 break;
>         case SO_SNDBUF:
>                 val = min_t(u32, val, sysctl_wmem_max);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 4ff806d..e5e9cb0 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -923,8 +923,6 @@ static void __sock_set_rcvbuf(struct sock *sk, int val)
>          * as a negative value.
>          */
>         val = min_t(int, val, INT_MAX / 2);
> -       sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> -
>         /* We double it on the way in to account for "struct sk_buff" etc.
>          * overhead. Applications assume that the SO_RCVBUF setting they make
>          * will allow that much actual data to be received on that socket.
> @@ -935,7 +933,11 @@ static void __sock_set_rcvbuf(struct sock *sk, int val)
>          * And after considering the possible alternatives, returning the value
>          * we actually used in getsockopt is the most desirable behavior.
>          */
> -       WRITE_ONCE(sk->sk_rcvbuf, max_t(int, val * 2, SOCK_MIN_RCVBUF));
> +       val = max_t(int, val * 2, SOCK_MIN_RCVBUF);
> +       if (val < sock_net(sk)->ipv4.sysctl_tcp_rmem[1])
> +               sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> +
> +       WRITE_ONCE(sk->sk_rcvbuf, val);
>  }
>
>  void sock_set_rcvbuf(struct sock *sk, int val)

You are breaking applications that want to set sk->sk_rcvbuf to a fixed
value, to control memory usage on millions of active sockets in a host.

I think that you want new functionality, with new SO_ socket options,
targeting the net-next tree (no spurious Fixes: tag).

For instance, letting an application set or unset SOCK_RCVBUF_LOCK would
be more useful, and would not break applications.
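For reference, the guard the patch adds can be modeled as a small pure function. This is only a sketch: the tcp_rmem[1] value below is taken from the reporter's sysctl output, and SOCK_MIN_RCVBUF is approximated, since its real value depends on sizeof(struct sk_buff).

```c
#include <stdbool.h>

/* Demo constants mirroring the reporter's system (assumptions, not
 * values read from a live kernel). */
#define DEMO_SOCK_MIN_RCVBUF  2304    /* approximation */
#define DEMO_TCP_RMEM_DEFAULT 425984  /* tcp_rmem[1] from the report */

/* Models the patched __sock_set_rcvbuf(): the doubled request is
 * compared with tcp_rmem[1], and SOCK_RCVBUF_LOCK is set (autotuning
 * disabled) only when the buffer is actually being shrunk. */
bool patch_would_lock(int requested)
{
    int val = requested * 2;

    if (val < DEMO_SOCK_MIN_RCVBUF)
        val = DEMO_SOCK_MIN_RCVBUF;
    return val < DEMO_TCP_RMEM_DEFAULT;
}
```

Under this model, iperf's `-w 425984` (case 2) would no longer set the lock, since 2 * 425984 >= tcp_rmem[1]. That is exactly the behavior change criticized above: an application asking for a large fixed buffer would silently remain auto-tuned.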