Date: Wed, 31 Jan 2018 10:47:17 +0100
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Roman Gushchin, Vladimir Davydov, Johannes Weiner,
    Tetsuo Handa, Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable
Message-ID: <20180131094717.GR21609@dhcp22.suse.cz>
References: <20180126171548.GB16763@dhcp22.suse.cz> <20180130085013.GP21609@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 30-01-18 14:38:40, David Rientjes wrote:
> On Tue, 30 Jan 2018, Michal Hocko wrote:
>
> > > > So what is the actual semantic and scope of this
> > > > policy. Does it apply
> > > > only down the hierarchy. Also how do you compare cgroups with different
> > > > policies? Let's say you have
> > > >           root
> > > >         /  |  \
> > > >        A   B   C
> > > >       / \     / \
> > > >      D   E   F   G
> > > >
> > > > Assume A: cgroup, B: oom_group=1, C: tree, G: oom_group=1
> > > >
> > >
> > > At each level of the hierarchy, memory.oom_policy compares immediate
> > > children, it's the only way that an admin can lock in a specific oom
> > > policy like "tree" and then delegate the subtree to the user. If you've
> > > configured it as above, comparing A and C should be the same based on the
> > > cumulative usage of their child mem cgroups.
> >
> > So cgroup == tree if we are memcg aware OOM killing, right? Why do we
> > need both then? Just to make memcg aware OOM killing possible?
> >
>
> We need "tree" to account the usage of the subtree rather than simply the
> cgroup alone, but "cgroup" and "tree" are accounted with the same units.
> In your example, D and E are treated as individual memory consumers and C
> is treated as the sum of all subtree memory consumers.

It seems I am still not clear with my question. What kind of difference
does policy=cgroup vs. none make on A? Also what kind of difference does
it make when a leaf node has the cgroup policy?

[...]

> > So now you have a killable cgroup selected by process criterion? That
> > just doesn't make any sense. So I guess it would at least require to
> > enforce (cgroup || tree) to allow oom_group.
> >
>
> Hmm, I'm not sure why we would limit memory.oom_group to any policy. Even
> if we are selecting a process, even without selecting cgroups as victims,
> killing a process may still render an entire cgroup useless and it makes
> sense to kill all processes in that cgroup. If an unlucky process is
> selected with today's heuristic of oom_badness() or with a "none" policy
> with my patchset, I don't see why we can't enable the user to kill all
> other processes in the cgroup.
> It may not make sense for some trees, but
> I think it could be useful for others.

My intuition screams here. I will think about this some more but I would
be really curious about any sensible usecase where you want to sacrifice
the whole gang just because one process is too large compared to other
processes or cgroups. Do you see how you are mixing entities here?

> > > Right, a policy of "none" reverts its subtree back to per-process
> > > comparison if you are either not using the cgroup aware oom killer or your
> > > subtree is not using the cgroup aware oom killer.
> >
> > So how are you going to compare none cgroups with those that consider
> > full memcg or hierarchy (cgroup, tree)? Are you going to consider
> > oom_score_adj?
> >
>
> No, I think it would make sense to make the restriction that to set
> "none", the ancestor mem cgroups would also need the same policy,

I do not understand. Get back to our example. Are you saying that G
with none will enforce the none policy on C and root? If yes then this
doesn't make any sense because you are not really able to delegate the
oom policy down the tree at all. It would effectively make the tree
policy pointless.

I am skipping the rest of the following text because it is picking on
details and the whole design is not clear to me. So could you start over
by documenting the semantics and requirements. Ideally by describing:
	- how does the policy on the root of the OOM hierarchy control
	  the selection policy
	- how does the per-memcg policy act during the tree walk - for
	  both intermediate nodes and leaves
	- how does the oom killer act based on the selected memcg
	- how do you compare tasks with memcgs

[...]

-- 
Michal Hocko
SUSE Labs
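
To make the example hierarchy concrete, here is a minimal sketch of one
possible reading of the "cgroup" vs. "tree" accounting described in the
quoted text: a memcg with "tree" is compared as a single consumer with the
cumulative usage of its subtree, while a memcg with "cgroup" has its
children compared individually. This is an illustration only, not the
patchset's code. The Memcg class, the helper names and the usage numbers
are all hypothetical, and "none", oom_group and oom_score_adj are left out
because their semantics are exactly what is unclear above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup (not a kernel structure)."""
    name: str
    usage: int = 0          # made-up local usage, arbitrary units
    policy: str = "cgroup"  # only "cgroup" and "tree" are modelled here
    children: List["Memcg"] = field(default_factory=list)


def subtree_usage(cg: Memcg) -> int:
    """Cumulative usage of a memcg and everything below it."""
    return cg.usage + sum(subtree_usage(child) for child in cg.children)


def consumers(cg: Memcg) -> Iterator[Tuple[str, int]]:
    """Yield the (name, usage) pairs that get compared below cg.

    Assumed reading: a "tree" memcg or a leaf counts as one consumer;
    a "cgroup" memcg with children is descended into instead, and any
    usage charged directly to that inner node is ignored in this sketch.
    """
    for child in cg.children:
        if child.policy == "tree" or not child.children:
            yield child.name, subtree_usage(child)
        else:
            yield from consumers(child)


def select_victim(root: Memcg) -> Tuple[str, int]:
    """Pick the largest consumer under the assumed reading."""
    return max(consumers(root), key=lambda pair: pair[1])


# The hierarchy from the example, with made-up usage numbers.
root = Memcg("root", children=[
    Memcg("A", policy="cgroup", children=[
        Memcg("D", usage=30),
        Memcg("E", usage=20),
    ]),
    Memcg("B", usage=40),
    Memcg("C", policy="tree", children=[
        Memcg("F", usage=25),
        Memcg("G", usage=25),
    ]),
])

print(list(consumers(root)))
print(select_victim(root))
```

With these numbers D, E and B are compared on their own usage while C is
compared on the sum of F and G, so C is selected even though neither F nor
G is larger than D. How a "none" memcg or a plain task would fit into the
same comparison is not modelled here.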