Message-ID: <7551924f-a9b6-4bb8-bfe9-e3efcf0da438@bytedance.com>
Date: Tue, 3 Oct 2023 20:49:08 +0800
Subject: Re: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
To: Shakeel Butt
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn, David Howells,
 Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
 "open list:NETWORKING [GENERAL]", open list
References: <20230920132545.56834-1-wuyun.abel@bytedance.com>
 <20230920132545.56834-2-wuyun.abel@bytedance.com>
 <20230921190156.s4oygohw4hud42tx@google.com>
 <82c0a442-c7d7-d0f1-54de-7a5e7e6a31d5@bytedance.com>
 <71ac08d3-9f36-e0de-870e-3e252abcb66a@bytedance.com>
 <20230924072816.6ywgoe7ab2max672@google.com>
From: Abel Wu
In-Reply-To: <20230924072816.6ywgoe7ab2max672@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/24/23 3:28 PM, Shakeel Butt wrote:
> On Fri, Sep 22, 2023 at 06:10:06PM +0800, Abel Wu wrote:
> [...]
>>
>> After a second thought, it is still vague to me about the position
>> the memcg pressure should be in socket memory allocation. It lacks
>> convincing design. I think the above hunk helps, but not much.
>>
>> I wonder if we should take option (3) first. Thoughts?
>>
>
> Let's take a step further. Let's decouple the memcg accounting and
> global skmem accounting. __sk_mem_raise_allocated is already very hard
> to reason. There are couple of heuristics in it which may or may not
> apply to both accounting infrastructures.
>
> Let's explicitly document what heurisitics allows to forcefully succeed
> the allocations i.e. irrespective of pressure or over limit for both
> accounting infras.
> I think decoupling them would make the flow of the
> code very clear.

I can't agree more.

>
> There are three heuristics:

I found that all of them were first introduced in linux-2.4.0-test7pre1
for TCP only, and then migrated to the socket core in linux-2.6.8-rc1
without functional change.

>
> 1. minimum buffer size even under pressure.

This is required by RFC 7323 (TCP Extensions for High Performance) to
make features like the Window Scale option work as expected, and by the
definition of tcp_{r,w}mem the allocation should succeed under global
pressure. IMHO, for the same reason, it should also succeed under memcg
pressure, or else workloads might suffer a performance drop because the
network becomes the bottleneck.

The allocation must not succeed when either the global or the memcg's
hard limit is exceeded, or else a DoS attack could be mounted by
spawning lots of sockets that each stay under the minimum buffer size.

>
> 2. allow allocation for a socket whose usage is below average of the
> system.

Since 'average' is within the scope of global accounting, this one only
makes sense under global memory pressure. Actually it existed before
cgroups were born, and hence doesn't take memcg into consideration.

OTOH, the intention of throttling under memcg pressure is to relieve
the memcg from heavy reclaim pressure, and this heuristic does not help
with that. There also seems to be no reason to let the allocation
succeed when the global or the memcg's hard limit is exceeded.

>
> 3. socket is over its sndbuf.

TBH I don't get its point..

>
> Let's discuss which heuristic applies to which accounting infra and
> under which state (under pressure or over limit).

I will follow your suggestion and post a patch that explicitly documents
the behaviors once things are clear.

Thanks,
	Abel
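
For illustration, the position argued above for heuristics 1 and 2 can be
sketched in userspace C. This is only a rough model under stated
assumptions, not the kernel's __sk_mem_raise_allocated: struct acct_state,
struct sock_snapshot and may_raise_allocated() are hypothetical names,
the two accounting infras are reduced to a pair of flags each, and
heuristic 3 is left out because its purpose is still unclear. Treating
heuristic 2 as inapplicable whenever the memcg is under pressure is one
reading of the argument, not a settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of one accounting infra (global or memcg). */
struct acct_state {
	bool under_pressure;	/* pressure has been signalled */
	bool over_limit;	/* hard limit is exceeded */
};

/* Hypothetical per-socket numbers, all in the same unit. */
struct sock_snapshot {
	long buf_usage;	/* memory currently charged to the buffer */
	long min_buf;	/* minimum buffer size, cf. tcp_{r,w}mem[0] */
	long avg_usage;	/* system-wide average usage per socket */
};

/* Heuristic 1: the socket is still below its minimum buffer size. */
static bool below_min_buffer(const struct sock_snapshot *sk)
{
	return sk->buf_usage < sk->min_buf;
}

/* Heuristic 2: the socket uses less than the system average. */
static bool below_average(const struct sock_snapshot *sk)
{
	return sk->buf_usage < sk->avg_usage;
}

/* Decide whether an allocation may go through (assumed policy). */
bool may_raise_allocated(const struct acct_state *global,
			 const struct acct_state *memcg,
			 const struct sock_snapshot *sk)
{
	assert(global != NULL && memcg != NULL && sk != NULL);

	/* No heuristic overrides a hard limit, global or memcg. */
	if (global->over_limit || memcg->over_limit)
		return false;

	/* No pressure anywhere: nothing to throttle. */
	if (!global->under_pressure && !memcg->under_pressure)
		return true;

	/* Heuristic 1 applies under global and under memcg pressure. */
	if (below_min_buffer(sk))
		return true;

	/*
	 * Heuristic 2 is a global notion: honour it under global
	 * pressure only, and not while the memcg asks for relief.
	 */
	if (global->under_pressure && !memcg->under_pressure &&
	    below_average(sk))
		return true;

	return false;
}
```

Keeping the hard-limit check first means that neither heuristic can be
used to push usage past a limit, which is the DoS concern raised for
heuristic 1.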