Message-ID: <71ac08d3-9f36-e0de-870e-3e252abcb66a@bytedance.com>
Date: Fri, 22 Sep 2023 18:10:06 +0800
Subject: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
From: Abel Wu
To:
 Shakeel Butt
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn, David Howells,
 Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
 "open list:NETWORKING [GENERAL]", open list
References: <20230920132545.56834-1-wuyun.abel@bytedance.com>
 <20230920132545.56834-2-wuyun.abel@bytedance.com>
 <20230921190156.s4oygohw4hud42tx@google.com>
 <82c0a442-c7d7-d0f1-54de-7a5e7e6a31d5@bytedance.com>
In-Reply-To: <82c0a442-c7d7-d0f1-54de-7a5e7e6a31d5@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/22/23 4:36 PM, Abel Wu wrote:
> On 9/22/23 3:01 AM, Shakeel Butt wrote:
>> On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
>>> Before sockets became aware of net-memcg's memory pressure since
>>> commit e1aab161e013 ("socket: initial cgroup code."), the memory
>>> usage would be granted to raise if below average even when under
>>> protocol's pressure. This provides fairness among the sockets of
>>> same protocol.
>>>
>>> That commit changes this because the heuristic will also be
>>> effective when only memcg is under pressure which makes no sense.
>>> Fix this by skipping this heuristic when under memcg pressure.
>>>
>>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>>> Signed-off-by: Abel Wu
>>> ---
>>>   net/core/sock.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 379eb8b65562..ef5cf6250f17 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk,
>>> int size, int amt, int kind)
>>>       if (sk_has_memory_pressure(sk)) {
>>>           u64 alloc;
>>> -        if (!sk_under_memory_pressure(sk))
>>> +        if (memcg && mem_cgroup_under_socket_pressure(memcg))
>>> +            goto suppress_allocation;
>>> +
>>> +        if (!sk_under_global_memory_pressure(sk))
>>>               return 1;
>>
>> I am onboard with replacing sk_under_memory_pressure() with
>> sk_under_global_memory_pressure(). However suppressing on memcg pressure
>> is a behavior change from status quo and need more thought and testing.
>>
>> I think there are three options for this hunk:
>>
>> 1. proposed patch
>> 2. Consider memcg pressure only for !in_softirq().
>> 3. Don't consider memcg pressure at all.
>>
>> All three options are behavior change from the status quo but with
>> different risk levels. (1) may reintroduce the regression fixed by
>> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
>
> Just for the record, it is same for the current upstream implementation
> if the socket reaches average usage. Taking option 2 will fix this too.
>
>> (2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
>> limits ineffective.
>>
>> IMHO we should go with (2) as there is already a precedence in
>> 720ca52bcef22.
>
> Yes, I agree. Actually applying option(2) would make this patch quite
> similar to the previous version[a], except the below part:
>
>      /* Under limit. */
>      if (allocated <= sk_prot_mem_limits(sk, 0)) {
>          sk_leave_memory_pressure(sk);
> -        return 1;
> +        if (!under_memcg_pressure)
> +            return 1;
>      }

On second thought, it is still unclear to me what role memcg pressure
should play in socket memory allocation. It lacks a convincing design.
I think the above hunk helps, but not much.

I wonder if we should take option (3) first. Thoughts?

Thanks,
    Abel

>
> My original thought is to inherit the behavior of tcpmem pressure.
> There are also 3 levels of memcg pressure named low/medium/critical,
> but considering that the 'low' level is too much conservative for
> socket allocation, I made the following match:
>
>     PROTOCOL    MEMCG        ACTION
>     -----------------------------------------------------
>     low            pressure    medium        be more conservative
>     high        critical    throttle
>
> which also seems align with the design[b] of memcg pressure. Anyway
> I will take option (2) and post v2.
>
> Thanks & Best,
>     Abel
>
> [a]
> https://lore.kernel.org/lkml/20230901062141.51972-4-wuyun.abel@bytedance.com/
> [b]
> https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#memory-pressure
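
To make the difference between the three options concrete, here is a
rough userspace model of the quoted hunk. It is only a sketch under
stated assumptions: struct sock_model, enum memcg_policy and may_raise()
are hypothetical names, not kernel APIs. The hard limit, the minimum
buffer checks and everything after suppress_allocation are left out, so
"suppress" is modeled as a plain refusal. Reading option (2) as "ignore
memcg pressure in softirq context" is my interpretation of the thread.

```c
#include <stdbool.h>

/* Hypothetical model of the three options; not kernel code. */
enum memcg_policy {
    POLICY_SUPPRESS_ON_MEMCG,     /* option 1: the proposed patch */
    POLICY_MEMCG_OUTSIDE_SOFTIRQ, /* option 2: only for !in_softirq() */
    POLICY_IGNORE_MEMCG,          /* option 3: never consider memcg */
};

struct sock_model {
    bool proto_pressure; /* protocol-wide (global) memory pressure */
    bool memcg_pressure; /* the socket's memcg is under pressure */
    bool in_softirq;     /* request comes from softirq context */
    bool below_average;  /* socket uses less than its fair share */
};

/* Returns true if the modeled heuristic lets the allocation be raised. */
bool may_raise(const struct sock_model *s, enum memcg_policy policy)
{
    bool consider_memcg = false;

    switch (policy) {
    case POLICY_SUPPRESS_ON_MEMCG:
        consider_memcg = true;
        break;
    case POLICY_MEMCG_OUTSIDE_SOFTIRQ:
        consider_memcg = !s->in_softirq;
        break;
    case POLICY_IGNORE_MEMCG:
        consider_memcg = false;
        break;
    }

    /* memcg pressure, where considered, suppresses the allocation */
    if (consider_memcg && s->memcg_pressure)
        return false;

    /* no global pressure: grant right away */
    if (!s->proto_pressure)
        return true;

    /* under global pressure: fairness heuristic among sockets */
    return s->below_average;
}
```

With memcg pressure set and a request from softirq context, option (1)
refuses in this model while options (2) and (3) grant. The same request
from process context is refused by (1) and (2) and granted only by (3),
which is where the risk of making memcg limits ineffective comes from.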