Message-ID: <41ae31a7-6998-be88-858c-744e31a76b2a@bytedance.com>
Date: Tue, 12 Jul 2022 19:12:18 +0800
Subject: Re: [PATCH v2 0/5] mm, oom: Introduce per numa node oom for CONSTRAINT_{MEMORY_POLICY,CPUSET}
To: Michal Hocko, Gang Li
Cc: akpm@linux-foundation.org, surenb@google.com, hca@linux.ibm.com,
 gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
 svens@linux.ibm.com, viro@zeniv.linux.org.uk, ebiederm@xmission.com,
 keescook@chromium.org, rostedt@goodmis.org, mingo@redhat.com,
 peterz@infradead.org, acme@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org,
 david@redhat.com, imbrenda@linux.ibm.com, adobriyan@gmail.com,
 yang.yang29@zte.com.cn, brauner@kernel.org, stephen.s.brennan@oracle.com,
 zhengqi.arch@bytedance.com, haolee.swjtu@gmail.com, xu.xin16@zte.com.cn,
 Liam.Howlett@oracle.com, ohoono.kwon@samsung.com, peterx@redhat.com,
 arnd@arndb.de, shy828301@gmail.com, alex.sierra@amd.com,
 xianting.tian@linux.alibaba.com, willy@infradead.org, ccross@google.com,
 vbabka@suse.cz, sujiaxun@uniontech.com, sfr@canb.auug.org.au,
 vasily.averin@linux.dev, mgorman@suse.de, vvghjk1234@gmail.com,
 tglx@linutronix.de, luto@kernel.org, bigeasy@linutronix.de,
 fenghua.yu@intel.com, linux-s390@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-perf-users@vger.kernel.org,
 hezhongkun.hzk@bytedance.com
References: <20220708082129.80115-1-ligang.bdlg@bytedance.com>
From: Abel Wu
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Michal,

On 7/8/22 4:54 PM, Michal Hocko wrote:
> On Fri 08-07-22 16:21:24, Gang Li wrote:
>> TLDR
>> ----
>> If a mempolicy or cpuset is in effect, out_of_memory() will select victim
>> on specific node to kill. So that kernel can avoid accidental killing on
>> NUMA system.
>
> We have discussed this in your previous posting and an alternative
> proposal was to use cpusets to partition NUMA aware workloads and
> enhance the oom killer to be cpuset aware instead which should be a much
> easier solution.
>
>> Problem
>> -------
>> Before this patch series, oom will only kill the process with the highest
>> memory usage by selecting process with the highest oom_badness on the
>> entire system.
>>
>> This works fine on UMA system, but may have some accidental killing on NUMA
>> system.
>>
>> As shown below, if process c.out is bound to Node1 and keeps allocating pages
>> from Node1, a.out will be killed first. But killing a.out didn't free any
>> mem on Node1, so c.out will be killed then.
>>
>> A lot of AMD machines have 8 numa nodes. In these systems, there is a
>> greater chance of triggering this problem.
>
> Please be more specific about existing usecases which suffer from the
> current OOM handling limitations.

I was just going through the mailing list and happened to see this. We
have another use case involving per-NUMA-node memory usage.

Say we have several important latency-critical (LC) services, each
sitting on its own NUMA node with no overlap between them. The memory
demand of these LC services varies, so the amount of free memory on each
node differs as well. We then launch several background containers
without cpuset constraints to consume the remaining resources.

The problem is that there doesn't seem to be a proper memory policy for
balancing usage across the nodes, which can leave memory-heavy LC
services under high memory pressure and failing to meet their SLOs.

It would be much appreciated if you could shed some light on this!

Thanks & BR,
Abel