Received: by 2002:a05:6358:c692:b0:131:369:b2a3 with SMTP id fe18csp1236928rwb; Fri, 28 Jul 2023 06:47:22 -0700 (PDT) X-Google-Smtp-Source: APBJJlGwuwOenKWREetK6G32XcZy26n8vHJeeiQ7/GKLlYLrU8d53JDce5Cx68btjfILN8jL2vi3 X-Received: by 2002:a17:90a:858d:b0:263:4962:94c6 with SMTP id m13-20020a17090a858d00b00263496294c6mr1740237pjn.20.1690552042399; Fri, 28 Jul 2023 06:47:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690552042; cv=none; d=google.com; s=arc-20160816; b=Xm2UPx6+394p7xA2RNFWrlF3uBTdq2MODLbvZUjDQC+vk36qkNVpMSv9jWJqLiQHSl XpkciCQN1hXBPcykDccs5VqGGs/owq7WqFqEIVQOMt1cftKIVRnEZKjRskOTG8WtQmi+ GfNFn7TfPY1do0BED4j18sLI6/7GoQCzi0kT4ZasYXw+v2FqEZuMRuQafhnsQtkL0G9K +jKfaXZaH2J2Q+DSr2jzSfpwLLA23SWx9+Zsz0w4J3VWTIHfX+amwxEeevtW+0kiO6Oy nzZKgH5qG6ffRmsNNqFKnpOClmiiCQByxozLoXu6C3ibdOuBtshyyn7PbD8VlKenvigQ FKbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :content-language:references:cc:to:subject:from:user-agent :mime-version:date:message-id:dkim-signature; bh=dZhd6Z+1TRytPTwd9c5Pl7JGaMAas8iSnbHYztoIY+I=; fh=3dIbFeWGpZpvv9AQTe4aeh7NuFUY52o52oP5nbFUC+g=; b=iqsM0iqSRPtym0wR0SHLEHiDjdm+EG+1gSlWyq/h87gUdPPVAy40zYy1Nu2Nd9Vn3R 1pY9bPoUuRMqZ6AQqVps+IfEVFVXQg6tbSuiTWFS+82Q6AmjoDXN/bayqZ5P4/5aFtBO dIGtGBmNAxeiam6PnldQHJ4RSIJDYV0T2nMzYHdqL2cC1bYepTd2o55QDw4f7ug+y/DU W/OLEWunuuWxY+H48AGy9dGI2gtdBL/phxuT2JC1Cd70gI2/HSo/DHRnLlXDt8RvDp0Y un5cnE131dNsjMeSZwnEiD2KlXz3/X0NH8zZce7umOywvfcHyy1Fbfkf6gdzujfI+NNl 3+YA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=TgN5QPu9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id me15-20020a17090b17cf00b00267d70fe0f7si3395518pjb.23.2023.07.28.06.47.09; Fri, 28 Jul 2023 06:47:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=TgN5QPu9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234758AbjG1Mpm (ORCPT + 99 others); Fri, 28 Jul 2023 08:45:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234614AbjG1Mpl (ORCPT ); Fri, 28 Jul 2023 08:45:41 -0400 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB918187 for ; Fri, 28 Jul 2023 05:45:13 -0700 (PDT) Received: by mail-oi1-x22b.google.com with SMTP id 5614622812f47-3a426e70575so1383180b6e.0 for ; Fri, 28 Jul 2023 05:45:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690548313; x=1691153113; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id:from:to :cc:subject:date:message-id:reply-to; bh=dZhd6Z+1TRytPTwd9c5Pl7JGaMAas8iSnbHYztoIY+I=; b=TgN5QPu9YYBwRx1weOr/WieWPm5UErXX8Yl2ZnYRnuydyDECUF4gTKm/YstmY5l56B +wnzxryDqAXht0H4tNvFivjOPlHsYMJXJT16Ya4FZKhXjvLU+tu2ANlM2Hnggey8OizN kDKwNfr3/H1DGnnb8CFZ3Da0gxp96slSEQTjWN+YY002DAwTEY+vBkU5qAN2XCWJu/gT esYhuHCDSJ7t6D7godXKexmEfaiKr8p6lUWl2qVGZEu58BKWOAgb1Dxwj7NUCNOh0esu 0woAXuC1jDCs0SqpMA1m8vH5WR568aWfDh6ShMuQiWhHo8Rn/ZYpvSR8Y3tRNJRX9GW7 st9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690548313; x=1691153113; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dZhd6Z+1TRytPTwd9c5Pl7JGaMAas8iSnbHYztoIY+I=; b=DHz379miYt4xsN5Ookm7DMWV6Ki99dKEfRvtG5IttiXQxVXiCi1684FAELvzROUyRs 7D5B8j+HFjN6onEAPofaibMzUV1z2HZ3R1UCYP6lcW2CG/V5EAtBDwFsBZmxGzstTUhZ hjdx3qEG6/Yu9gocqQjl9gxhXM7QkMBdH/gMRkpBLmwNoZT4iJAw8xX66baQLL7gh8A7 vGq0w0n0bpSlTIsJOlYTU7Oz/SDe8+EXmZAORuuk4agtGPqvjEVJ7/PgwRRpJPzTq40t o7tcU+3ssmozladjtvSJVEG30HWBAcNiGKigM4Z4P8t42VdSveRiQqrGdRAZB2i179uI FWwQ== X-Gm-Message-State: ABy/qLY1kX2PJYM2HcjBv0WlQMproEdtPILc17x4TKxI6egkalgx8yhE vml6k/qUxlm5BWbbJhr0HrImBg== X-Received: by 2002:a05:6808:e82:b0:3a4:6b13:b721 with SMTP id k2-20020a0568080e8200b003a46b13b721mr2742390oil.46.1690548313043; Fri, 28 Jul 2023 05:45:13 -0700 (PDT) Received: from [10.254.173.0] ([139.177.225.238]) by smtp.gmail.com with ESMTPSA id ep11-20020a17090ae64b00b00262eccfa29fsm4145574pjb.33.2023.07.28.05.45.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 Jul 2023 05:45:12 -0700 (PDT) Message-ID: Date: Fri, 28 Jul 2023 20:45:00 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.1 From: Abel Wu Subject: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure To: Roman Gushchin Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , Andrew Morton , David Ahern , Yosry Ahmed , "Matthew Wilcox (Oracle)" , Yu Zhao , Kefeng Wang , Yafang Shao , Kuniyuki Iwashima , Martin KaFai Lau , Alexander Mikhalitsyn , Breno Leitao , David Howells , Jason Xing , Xin Long , Michal Hocko , Alexei Starovoitov , open list , "open list:NETWORKING [GENERAL]" , "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" , "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" References: <20230711124157.97169-1-wuyun.abel@bytedance.com> <58e75f44-16e3-a40a-4c8a-0f61bbf393f9@bytedance.com> <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/27/23 8:19 AM, Roman Gushchin wrote: > On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote: >> On 7/26/23 10:56 AM, Roman Gushchin wrote: >>> On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote: >>>> Hi Roman, thanks for taking time to have a look! >>>>> >>>>> Overall I think it's a good idea to clean these things up and thank you >>>>> for working on this. But I wonder if we can make the next step and leave only >>>>> one mechanism for both cgroup v1 and v2 instead of having this weird setup >>>>> where memcg->socket_pressure is set differently from different paths on cgroup >>>>> v1 and v2. >>>> >>>> There is some difficulty in unifying the mechanism for both cgroup >>>> designs. Throttling socket memory allocation when memcg is under >>>> pressure only makes sense when socket memory and other usages are >>>> sharing the same limit, which is not true for cgroupv1. Thoughts? >>> >>> I see... Generally speaking cgroup v1 is considered frozen, so we can leave it >>> as it is, except when it creates an unnecessary complexity in the code. >> >> Are you suggesting that the 2nd patch can be ignored and keep >> ->tcpmem_pressure as it is? Or keep the 2nd patch and add some >> explanation around as you suggested in last reply? > > I suggest to split a code refactoring (which is not expected to bring any > functional changes) and an actual change of the behavior on cgroup v1. > Re the refactoring: I see a lot of value in adding comments and make the > code more readable, I don't see that much value in merging two variables. > But if it comes organically with the code simplification - nice. I see, thanks for the clarification! > >>> I'm curious, was your work driven by some real-world problem or a desire to clean >>> up the code? Both are valid reasons of course. >> >> We (a cloud service provider) are migrating users to cgroupv2, >> but encountered some problems among which the socket memory >> really puts us in a difficult situation. There is no specific >> threshold for socket memory in cgroupv2 and relies largely on >> workloads doing traffic control themselves. >> >> Say one workload behaves fine in cgroupv1 with 10G of ->memory >> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2 >> with 11G of ->memory due to burst memory usage on socket. >> >> It's rational for the workloads to build some traffic control >> to better utilize the resources they bought, but from kernel's >> point of view it's also reasonable to suppress the allocation >> of socket memory once there is a shortage of free memory, given >> that performance degradation is better than failure. > > Yeah, I can see it. But Idk if it's too workload-specific to have > a single-policy-fits-all-cases approach. > E.g. some workloads might prefer to have a portion of pagecache > being reclaimed. > What do you think? Now the memcg is considered to be under pressure if the number of pages reclaimed is much less than desired. I doubt it could be a win in such case to spend more time on reclaiming while letting socket continue to allocate memory (which could make things worse), compared to relieving reclaim pressure and putting time on its real work. Best, Abel