From: Zhaoyang Huang
Date: Mon, 4 Apr 2022 19:23:03 +0800
Subject: Re: [RFC PATCH] cgroup: introduce dynamic protection for memcg
To: Michal Hocko
Cc: Suren Baghdasaryan, "zhaoyang.huang", Andrew Morton, Johannes Weiner, Vladimir Davydov, "open list:MEMORY MANAGEMENT", LKML, cgroups mailinglist, Ke Wang
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 4, 2022 at 5:32 PM Michal Hocko wrote:
>
> On Mon 04-04-22 17:23:43, Zhaoyang Huang wrote:
> > On Mon, Apr 4, 2022 at 5:07 PM
Zhaoyang Huang wrote:
> > >
> > > On Mon, Apr 4, 2022 at 4:51 PM Michal Hocko wrote:
> > > >
> > > > On Mon 04-04-22 10:33:58, Zhaoyang Huang wrote:
> > > > [...]
> > > > > > One thing that I don't understand in this approach is: why memory.low
> > > > > > should depend on the system's memory pressure. It seems you want to
> > > > > > allow a process to allocate more when memory pressure is high. That is
> > > > > > very counter-intuitive to me. Could you please explain the underlying
> > > > > > logic of why this is the right thing to do, without going into
> > > > > > technical details?
> > > > > What I want to achieve is make memory.low be positive correlation with
> > > > > timing and negative to memory pressure, which means the protected
> > > > > memcg should lower its protection(via lower memcg.low) for helping
> > > > > system's memory pressure when it's high.
> > > >
> > > > I have to say this is still very confusing to me. The low limit is a
> > > > protection against external (e.g. global) memory pressure. Decreasing
> > > > the protection based on the external pressure sounds like it goes right
> > > > against the purpose of the knob. I can see reasons to update protection
> > > > based on refaults or other metrics from the userspace but I still do not
> > > > see how this is a good auto-magic tuning done by the kernel.
> > > >
> > > > > The concept behind is memcg's
> > > > > fault back of dropped memory is less important than system's latency
> > > > > on high memory pressure.
> > > >
> > > > Can you give some specific examples?
> > > For both of the above two comments, please refer to the latest test
> > > result in Patchv2 I have sent. I prefer to name my change as focus
> > > transfer under pressure as protected memcg is the focus when system's
> > > memory pressure is low which will reclaim from root, this is not
> > > against current design.
> > > However, when global memory pressure is high,
> > > then the focus has to be changed to the whole system, because it
> > > doesn't make sense to let the protected memcg out of everybody, it
> > > can't do anything when the system is trapped in the kernel with
> > > reclaiming work.
> > Does it make more sense if I describe the change as memcg will be
> > protect long as system pressure is under the threshold(partially
> > coherent with current design) and will sacrifice the memcg if pressure
> > is over the threshold(added change)
>
> No, not really. For one it is still really unclear why there should be any
> difference in the semantic between global and external memory pressure
> in general. The low limit is always a protection from the external
> pressure. And what should be the actual threshold? Amount of the reclaim
> performed, effectivness of the reclaim or what?

Please find the test result below, which shows that the current design
protects the memcg more effectively when system memory pressure is high.
It could be argued that, with the patch, the protected memcg loses its
protection because its usage drops too much. I would say that this is
exactly the goal of the change. Is it reasonable to let the whole system
be trapped in memory pressure while the memcg holds on to the memory?

With regard to the threshold, it is a dynamically decayed watermark value
which represents both the historic usage (the watermark) and the present
usage (it is updated to the new usage if usage expands again). I have
also updated the code to make this opt-in, so it is an optional mode of
the memcg. The patch stays coherent with the original design if the user
wants the default fixed value, and it also provides a new kind of
dynamically protected memcg that needs no external monitoring or
interaction.

We tested the change by comparing it with the current design on a
v5.4-based system with 3GB RAM, using the steps below. The result shows
that a fixed memory.low makes the system experience high memory pressure
because the memcg holds too much memory.

1. Set up the topology separately as in [1].
2. Place a memory-consuming process into B and have it consume 1GB of
   memory from userspace.
3. Generate global memory pressure by mlocking 1GB of memory.
4. Watch B's memory.current and PSI_MEM.
5. Repeat 3 and 4 twice.

[1]. Settings: fixed low=500MB; fixed low=600MB;
     wm_decay_factor=36 (68s to decay by 1/2)

      A(low=500MB)
     /
    B(low=500MB)

What we observed:

             PSI_MEM, usage          PSI_MEM, usage      PSI_MEM, usage
             (Mlock 1GB)             (Mlock 2GB)         (stable)
low=600MB    s=23 f=17 u=720/600MB   s=91 f=48 u=202MB   s=68 f=32 u=106MB
low=500MB    s=22 f=13 u=660/530MB   s=88 f=50 u=156MB   s=30 f=20 u=120MB
patch        s=23 f=12 u=692/470MB   s=40 f=23 u=67MB    s=21 f=18 u=45MB

> --
> Michal Hocko
> SUSE Labs
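
To make the "dynamic decayed watermark" above more concrete, here is a
minimal toy model. It is not the patch code. It only illustrates a value
that halves every fixed interval and is raised back to the current usage
when usage expands again. The class name DecayedWatermark and the
half_life_s parameter are hypothetical; how wm_decay_factor=36 maps to
the 68s half-life is not modelled, so the half-life is passed in
directly, and the dependence on memory pressure is left out.

```python
from dataclasses import dataclass


@dataclass
class DecayedWatermark:
    """Toy model of a decaying usage watermark (illustration only)."""

    half_life_s: float          # assumed: seconds for the watermark to halve
    watermark_mb: float = 0.0   # current watermark in MB
    last_update_s: float = 0.0  # timestamp of the last update

    def update(self, now_s: float, usage_mb: float) -> float:
        """Decay the stored watermark to now_s, then lift it to usage_mb
        if the current usage is higher than the decayed value."""
        elapsed = max(0.0, now_s - self.last_update_s)
        decayed = self.watermark_mb * 0.5 ** (elapsed / self.half_life_s)
        self.watermark_mb = max(decayed, usage_mb)
        self.last_update_s = now_s
        return self.watermark_mb


wm = DecayedWatermark(half_life_s=68.0)
print(wm.update(0.0, 1024.0))    # usage sets the watermark: 1024.0
print(wm.update(68.0, 100.0))    # one half-life later: 512.0
print(wm.update(136.0, 100.0))   # two half-lives later: 256.0
print(wm.update(140.0, 800.0))   # usage expands again: 800.0
```

In this model the watermark never falls below the present usage, which
matches the "update to new usage if it expands again" part of the
description; everything else about the real implementation is an
assumption.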