Date: Thu, 1 Dec 2022 16:17:49 +0100
From: Michal Hocko
To: 程垲涛 Chengkaitao Cheng
Cc: Tao pilgrim, "tj@kernel.org", "lizefan.x@bytedance.com",
 "hannes@cmpxchg.org", "corbet@lwn.net", "roman.gushchin@linux.dev",
 "shakeelb@google.com", "akpm@linux-foundation.org",
 "songmuchun@bytedance.com", "cgel.zte@gmail.com",
 "ran.xiaokai@zte.com.cn", "viro@zeniv.linux.org.uk",
 "zhengqi.arch@bytedance.com", "ebiederm@xmission.com",
 "Liam.Howlett@oracle.com", "chengzhihao1@huawei.com",
 "haolee.swjtu@gmail.com", "yuzhao@google.com", "willy@infradead.org",
 "vasily.averin@linux.dev", "vbabka@suse.cz", "surenb@google.com",
 "sfr@canb.auug.org.au", "mcgrof@kernel.org", "sujiaxun@uniontech.com",
 "feng.tang@intel.com", "cgroups@vger.kernel.org",
 "linux-doc@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "linux-fsdevel@vger.kernel.org", Bagas Sanjaya, "linux-mm@kvack.org",
 Greg Kroah-Hartman
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from
 being oom killed
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 01-12-22 14:30:11, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-01 21:08:26, "Michal Hocko" wrote:
> >On Thu 01-12-22 13:44:58, Michal Hocko wrote:
> >> On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
> >> > At 2022-12-01 16:49:27, "Michal Hocko" wrote:
> >[...]
> >> There is a misunderstanding, oom.protect does not replace the user's
> >> tailed policies, Its purpose is to make it easier and more efficient for
> >> users to customize policies, or try to avoid users completely abandoning
> >> the oom score to formulate new policies.
> >
> > Then you should focus on explaining how this makes those policies
> > easier and more efficient. I do not see it.
>
> In fact, there are some relevant contents in the previous chat records.
> If oom.protect is applied, it will have the following benefits
> 1. Users only need to focus on the management of the local cgroup, not the
> impact on other users' cgroups.

Protection-based balancing cannot really work in isolation.

> 2. Users and system do not need to spend extra time on complicated and
> repeated scanning and configuration. They just need to configure the
> oom.protect of specific cgroups, which is a one-time task

This will not work, in the same way that memory reclaim protection
cannot work in isolation at the memcg level.

> >> > >Why cannot you simply discount the protection from all processes
> >> > >equally? I do not follow why the task_usage has to play any role in
> >> > >that.
> >> >
> >> > If all processes are protected equally, the oom protection of cgroup is
> >> > meaningless. For example, if there are more processes in the cgroup,
> >> > the cgroup can protect more mems, it is unfair to cgroups with fewer
> >> > processes. So we need to keep the total amount of memory that all
> >> > processes in the cgroup need to protect consistent with the value of
> >> > eoom.protect.
> >>
> >> You are mixing two different concepts together I am afraid. The per
> >> memcg protection should protect the cgroup (i.e. all processes in that
> >> cgroup) while you want it to be also process aware. This results in a
> >> very unclear runtime behavior when a process from a more protected memcg
> >> is selected based on its individual memory usage.
> >
>
> The correct statement here should be that each memcg protection should
> protect the number of mems specified by the oom.protect. For example,
> a cgroup's usage is 6G, and it's oom.protect is 2G, when an oom killer occurs,
> In the worst case, we will only reduce the memory used by this cgroup to 2G
> through the oom killer.

I do not see how that could be guaranteed. Please keep in mind that a
non-trivial amount of memory resources could be completely independent
of any process lifetime (just consider tmpfs as a trivial example).

> >Let me be more specific here. Although it is primarily processes which
> >are the primary source of memcg charges the memory accounted for the oom
> >badness purposes is not really comparable to the overall memcg charged
> >memory. Kernel memory, non-mapped memory all that can generate rather
> >interesting cornercases.
>
> Sorry, I'm thoughtless enough about some special memory statistics. I will fix
> it in the next version

Let me just emphasise that we are talking about a fundamental
disconnect. RSS-based accounting has been used for the OOM killer
selection because the memory gets unmapped and _potentially_ freed when
the process goes away. Memcg charges are bound to the object lifetime
and, as said, in many cases there is no direct relation to any process
lifetime.

Hope that clarifies.
-- 
Michal Hocko
SUSE Labs