Date: Mon, 25 Sep 2017 22:25:21 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Johannes Weiner, Tejun Heo, kernel-team@fb.com, David Rientjes,
    linux-mm@kvack.org, Vladimir Davydov, Tetsuo Handa, Andrew Morton,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20170925202442.lmcmvqwy2jj2tr5h@dhcp22.suse.cz>
In-Reply-To: <20170925181533.GA15918@castle>
References: <20170913215607.GA19259@castle>
 <20170914134014.wqemev2kgychv7m5@dhcp22.suse.cz>
 <20170914160548.GA30441@castle>
 <20170915105826.hq5afcu2ij7hevb4@dhcp22.suse.cz>
 <20170915152301.GA29379@castle>
 <20170918061405.pcrf5vauvul4c2nr@dhcp22.suse.cz>
 <20170920215341.GA5382@castle>
 <20170925122400.4e7jh5zmuzvbggpe@dhcp22.suse.cz>
 <20170925170004.GA22704@cmpxchg.org>
 <20170925181533.GA15918@castle>

On Mon 25-09-17 19:15:33, Roman Gushchin wrote:
[...]
> I'm not against this model, as I've said before. It feels logical,
> and will work fine in most cases.
>
> In this case we can drop any mount/boot options, because it preserves
> the existing behavior in the default configuration. A big advantage.

I am not sure about this. We still need an opt-in, regardless, because
selecting the largest process from the largest memcg != selecting the
largest task overall (just consider a memcg made up of many small
processes whose aggregate footprint exceeds that of the single largest
task on the system; see the sketch below the signature).

> The only thing I'm slightly concerned about is that, due to the way we
> calculate the memory footprint for tasks and memory cgroups, we will
> have a number of weird edge cases. For instance, putting a single
> process into a group_oom memcg will alter its oom_score significantly
> and result in significantly different chances of being killed. An
> obvious example is a task with oom_score_adj set to any non-extreme
> value (other than 0 and -1000), but it can also happen in the case of
> a constrained allocation, for instance.

I am not sure I understand. Are you talking about the root memcg
compared to other memcgs?
-- 
Michal Hocko
SUSE Labs
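
For concreteness, here is a minimal stand-alone sketch (not from the original
thread; the memcg layout and page counts are made-up numbers, and the program
is plain user-space C, not the kernel's selection code) showing how "largest
task inside the largest memcg" can pick a different victim than a plain
per-task scan:

/* Toy illustration with hypothetical numbers: a memcg consisting of many
 * small tasks can have the largest aggregate footprint, so picking the
 * biggest task *inside* it selects a smaller victim than scanning all
 * tasks system-wide would. */
#include <stdio.h>

struct proc { const char *name; unsigned long pages; };

int main(void)
{
	/* memcg A: four small workers, 4 * 300 = 1200 pages total */
	struct proc memcg_a[] = { {"worker1", 300}, {"worker2", 300},
				  {"worker3", 300}, {"worker4", 300} };
	/* memcg B: one big task, 1000 pages total */
	struct proc memcg_b[] = { {"bigtask", 1000} };

	unsigned long a_total = 0, b_total = 0;
	struct proc *victim;
	size_t i;

	for (i = 0; i < sizeof(memcg_a) / sizeof(memcg_a[0]); i++)
		a_total += memcg_a[i].pages;
	for (i = 0; i < sizeof(memcg_b) / sizeof(memcg_b[0]); i++)
		b_total += memcg_b[i].pages;

	/* memcg-aware selection: largest memcg first (A: 1200 > 1000),
	 * then the largest task within it -> one 300-page worker. */
	victim = &memcg_a[0];
	for (i = 1; i < sizeof(memcg_a) / sizeof(memcg_a[0]); i++)
		if (memcg_a[i].pages > victim->pages)
			victim = &memcg_a[i];
	printf("memcg-aware pick: %s (%lu pages) from memcg A (%lu total)\n",
	       victim->name, victim->pages, a_total);

	/* traditional per-task selection: largest task overall -> bigtask. */
	printf("per-task pick:    %s (%lu pages) from memcg B (%lu total)\n",
	       memcg_b[0].name, memcg_b[0].pages, b_total);
	return 0;
}

The two policies disagree on the victim, which is why switching the default
to memcg-aware selection is a visible behavior change and, per the discussion
above, still needs an explicit opt-in.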