Date: Thu, 1 Dec 2022 13:44:57 +0100
From: Michal Hocko
To: 程垲涛 Chengkaitao Cheng
Cc: Tao pilgrim, "tj@kernel.org", "lizefan.x@bytedance.com",
 "hannes@cmpxchg.org", "corbet@lwn.net", "roman.gushchin@linux.dev",
 "shakeelb@google.com", "akpm@linux-foundation.org",
 "songmuchun@bytedance.com", "cgel.zte@gmail.com",
 "ran.xiaokai@zte.com.cn", "viro@zeniv.linux.org.uk",
 "zhengqi.arch@bytedance.com", "ebiederm@xmission.com",
 "Liam.Howlett@oracle.com", "chengzhihao1@huawei.com",
 "haolee.swjtu@gmail.com", "yuzhao@google.com", "willy@infradead.org",
 "vasily.averin@linux.dev", "vbabka@suse.cz", "surenb@google.com",
 "sfr@canb.auug.org.au", "mcgrof@kernel.org", "sujiaxun@uniontech.com",
 "feng.tang@intel.com", "cgroups@vger.kernel.org",
 "linux-doc@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "linux-fsdevel@vger.kernel.org", Bagas Sanjaya, "linux-mm@kvack.org",
 Greg Kroah-Hartman
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-01 16:49:27, "Michal Hocko" wrote:
> >On Thu 01-12-22 04:52:27, 程垲涛 Chengkaitao Cheng wrote:
> >> At 2022-12-01 00:27:54, "Michal Hocko" wrote:
> >> >On Wed 30-11-22 15:46:19, 程垲涛 Chengkaitao Cheng wrote:
> >> >> On 2022-11-30 21:15:06, "Michal Hocko" wrote:
> >> >> > On Wed 30-11-22 15:01:58, chengkaitao wrote:
> >> >> > > From: chengkaitao
> >> >> > >
> >> >> > > We created a new interface for memory, If there is
> >> >> > > the OOM killer under parent memory cgroup, and the memory usage of a
> >> >> > > child cgroup is within its effective oom.protect boundary, the cgroup's
> >> >> > > tasks won't be OOM killed unless there is no unprotected tasks in other
> >> >> > > children cgroups. It draws on the logic of in the
> >> >> > > inheritance relationship.
> >> >> >
> >> >> > Could you be more specific about usecases?
> >> >
> >> >This is a very important question to answer.
> >>
> >> usecases 1: users say that they want to protect an important process
> >> with high memory consumption from being killed by the oom in case
> >> of docker container failure, so as to retain more critical on-site
> >> information or a self recovery mechanism. At this time, they suggest
> >> setting the score_adj of this process to -1000, but I don't agree with
> >> it, because the docker container is not important to other docker
> >> containers of the same physical machine. If score_adj of the process
> >> is set to -1000, the probability of oom in other container processes will
> >> increase.
> >>
> >> usecases 2: There are many business processes and agent processes
> >> mixed together on a physical machine, and they need to be classified
> >> and protected.
> >> However, some agents are the parents of business
> >> processes, and some business processes are the parents of agent
> >> processes, It will be troublesome to set different score_adj for them.
> >> Business processes and agents cannot determine which level their
> >> score_adj should be at, If we create another agent to set all processes's
> >> score_adj, we have to cycle through all the processes on the physical
> >> machine regularly, which looks stupid.
> >
> >I do agree that oom_score_adj is far from ideal tool for these usecases.
> >But I also agree with Roman that these could be addressed by an oom
> >killer implementation in the userspace which can have much better
> >tailored policies. OOM protection limits would require tuning and also
> >regular revisions (e.g. memory consumption by any workload might change
> >with different kernel versions) to provide what you are looking for.
>
> There is a misunderstanding, oom.protect does not replace the user's
> tailed policies, Its purpose is to make it easier and more efficient for
> users to customize policies, or try to avoid users completely abandoning
> the oom score to formulate new policies.

Then you should focus on explaining how this makes those policies
easier and more efficient. I do not see it.

[...]

> >Why cannot you simply discount the protection from all processes
> >equally? I do not follow why the task_usage has to play any role in
> >that.
>
> If all processes are protected equally, the oom protection of cgroup is
> meaningless. For example, if there are more processes in the cgroup,
> the cgroup can protect more mems, it is unfair to cgroups with fewer
> processes. So we need to keep the total amount of memory that all
> processes in the cgroup need to protect consistent with the value of
> eoom.protect.

I am afraid you are mixing two different concepts together. The per-memcg
protection should protect the cgroup (i.e. all processes in that cgroup),
while you also want it to be process aware. This results in very unclear
runtime behavior when a process from a more protected memcg is selected
based on its individual memory usage.
-- 
Michal Hocko
SUSE Labs