Date: Wed, 13 Sep 2017 13:46:08 -0700 (PDT)
From: David Rientjes
To: Michal Hocko
Cc: Roman Gushchin, linux-mm@kvack.org, Vladimir Davydov, Johannes Weiner,
    Tetsuo Handa, Andrew Morton, Tejun Heo, kernel-team@fb.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
In-Reply-To: <20170913122914.5gdksbmkolum7ita@dhcp22.suse.cz>
References: <20170911131742.16482-1-guro@fb.com>
    <20170913122914.5gdksbmkolum7ita@dhcp22.suse.cz>

On Wed, 13 Sep 2017, Michal Hocko wrote:

> > > This patchset makes the OOM killer cgroup-aware.
> > >
> > > v8:
> > >   - Do not kill tasks with OOM_SCORE_ADJ -1000
> > >   - Make the whole thing opt-in with cgroup mount option control
> > >   - Drop oom_priority for further discussions
> >
> > Nack, we specifically require oom_priority for this to function correctly,
> > otherwise we cannot prefer to kill from low priority leaf memcgs as
> > required.
>
> While I understand that your usecase might require priorities I do not
> think this part missing is a reason to nack the cgroup based selection
> and kill-all parts. This can be done on top.
> The only important part right now is the current selection semantic -
> only leaf memcgs vs. size of the hierarchy. I strongly believe that
> comparing only leaf memcgs is more straightforward and it doesn't lead to
> unexpected results as mentioned before (kill a small memcg which is a
> part of the larger sub-hierarchy).
>

The problem is that we cannot enable the cgroup-aware oom killer and
oom_group behavior because, without oom priorities, we have no ability to
influence the cgroup that it chooses. It is doing two things: providing
more fairness amongst cgroups by selecting based on cumulative usage
rather than on a single large process (good!), and effectively removing
all userspace control of oom selection (bad). We want the former, but it
needs to be coupled with support so that we can protect vital cgroups,
regardless of their usage.

It is certainly possible to add oom priorities on top before it is merged,
but I don't see why it isn't part of the patchset. We need it before it's
merged to avoid users playing with /proc/pid/oom_score_adj to prevent any
killing in the most preferable memcg when they could have simply changed
the oom priority.
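
To make the distinction concrete, here is a toy model of the three
selection policies being discussed. It is only a sketch: it is not kernel
code, the class and function names are made up for illustration, and the
oom_priority field is a hypothetical knob in which a lower value means
"kill from here first", matching the "low priority leaf memcgs" wording
above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    pid: int
    rss_pages: int


@dataclass
class LeafMemcg:
    name: str
    tasks: List[Task]
    oom_priority: int = 0  # hypothetical: lower value is killed first

    def usage(self) -> int:
        # Cumulative usage of every task in this leaf memcg.
        return sum(t.rss_pages for t in self.tasks)


def pick_largest_task(memcgs: List[LeafMemcg]) -> Task:
    # Per-process selection: the single largest task anywhere.
    return max((t for m in memcgs for t in m.tasks),
               key=lambda t: t.rss_pages)


def pick_by_usage(memcgs: List[LeafMemcg]) -> LeafMemcg:
    # Cgroup-aware selection with no priorities: largest leaf memcg.
    return max(memcgs, key=lambda m: m.usage())


def pick_by_priority(memcgs: List[LeafMemcg]) -> LeafMemcg:
    # Priority first; usage only breaks ties among equal priorities.
    return min(memcgs, key=lambda m: (m.oom_priority, -m.usage()))


memcgs = [
    LeafMemcg("batch", [Task(1, 300), Task(2, 300)], oom_priority=0),
    LeafMemcg("web", [Task(3, 400)], oom_priority=5),
    LeafMemcg("database", [Task(4, 350), Task(5, 350), Task(6, 350)],
              oom_priority=10),
]

print(pick_largest_task(memcgs).pid)   # 3
print(pick_by_usage(memcgs).name)      # database
print(pick_by_priority(memcgs).name)   # batch
```

With usage-only selection the vital "database" memcg is chosen simply
because it is the largest, and nothing in this model lets userspace
redirect that choice. With a priority, "batch" is chosen regardless of
its usage.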