Date: Wed, 30 Nov 2022 15:29:11 -0800
From: Roman Gushchin
To: chengkaitao
Cc: tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org,
    corbet@lwn.net, mhocko@kernel.org, shakeelb@google.com,
    akpm@linux-foundation.org, songmuchun@bytedance.com,
    cgel.zte@gmail.com, ran.xiaokai@zte.com.cn, viro@zeniv.linux.org.uk,
    zhengqi.arch@bytedance.com, ebiederm@xmission.com,
    Liam.Howlett@Oracle.com, chengzhihao1@huawei.com,
    haolee.swjtu@gmail.com, yuzhao@google.com, willy@infradead.org,
    vasily.averin@linux.dev, vbabka@suse.cz, surenb@google.com,
    sfr@canb.auug.org.au, mcgrof@kernel.org, sujiaxun@uniontech.com,
    feng.tang@intel.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from
    being oom killed
In-Reply-To: <20221130070158.44221-1-chengkaitao@didiglobal.com>

On Wed, Nov 30, 2022 at 03:01:58PM +0800, chengkaitao wrote:
> From: chengkaitao
>
> We created a new interface for memory, If there is
> the OOM killer under parent memory cgroup, and the memory usage of a
> child cgroup is within its effective oom.protect boundary, the cgroup's
> tasks won't be OOM killed unless there is no unprotected tasks in other
> children cgroups. It draws on the logic of in the
> inheritance relationship.
>
> It has the following advantages,
> 1. We have the ability to protect more important processes, when there
> is a memcg's OOM killer. The oom.protect only takes effect local memcg,
> and does not affect the OOM killer of the host.
> 2. Historically, we can often use oom_score_adj to control a group of
> processes, It requires that all processes in the cgroup must have a
> common parent processes, we have to set the common parent process's
> oom_score_adj, before it forks all children processes. So that it is
> very difficult to apply it in other situations. Now oom.protect has no
> such restrictions, we can protect a cgroup of processes more easily. The
> cgroup can keep some memory, even if the OOM killer has to be called.

It reminds me of our attempts to provide a more sophisticated
cgroup-aware oom killer.

The problem is that the decision of which process(es) to kill or
preserve is specific to each workload (and can even be time-dependent
for a given workload). So it's really hard to come up with an in-kernel
mechanism which is at the same time flexible enough to work for the
majority of users and reliable enough to serve as the last-resort oom
measure (which is the basic goal of the kernel oom killer).

Previously the consensus was to keep the in-kernel oom killer dumb and
reliable and implement complex policies in userspace (e.g. systemd-oomd
etc).

Is there a reason why such an approach can't work in your case?

Thanks!
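
To make the userspace alternative concrete, here is a minimal sketch of
how the selection rule described in the patch could be expressed as a
userspace policy. It is an illustration under assumptions, not how
systemd-oomd or the proposed patch actually works: the CgroupUsage
type, the pick_victim() helper and the "protect" field are all
hypothetical names, and the usage values are assumed to have been read
already (for example from each cgroup's memory.current file).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CgroupUsage:
    name: str     # hypothetical cgroup name
    usage: int    # bytes in use, e.g. as read from memory.current
    protect: int  # hypothetical protection threshold in bytes


def pick_victim(groups: Iterable[CgroupUsage]) -> Optional[CgroupUsage]:
    """Choose one cgroup to kill, preferring unprotected ones.

    A cgroup counts as protected while its usage is within its
    threshold. Protected cgroups are considered only when no
    unprotected cgroup exists.
    """
    groups = list(groups)
    if not groups:
        return None
    unprotected = [g for g in groups if g.usage > g.protect]
    candidates = unprotected or groups
    # Rank by how far usage exceeds (or approaches) the threshold.
    return max(candidates, key=lambda g: g.usage - g.protect)


if __name__ == "__main__":
    sample = [
        CgroupUsage("db", usage=900, protect=1000),
        CgroupUsage("batch", usage=600, protect=100),
        CgroupUsage("web", usage=400, protect=0),
    ]
    victim = pick_victim(sample)
    print(victim.name if victim else "nothing to kill")
```

Ranking by overage is only one possible choice. A real policy would
likely also weigh memory pressure and workload priority, which is the
kind of per-workload tuning that is easier to change in userspace than
in the kernel.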