Date: Fri, 15 Sep 2017 12:55:55 -0700 (PDT)
From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
cc: Michal Hocko <mhocko@kernel.org>, linux-mm@kvack.org,
        Vladimir Davydov <vdavydov.dev@gmail.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
        Andrew Morton <akpm@linux-foundation.org>, Tejun Heo <tj@kernel.org>,
        kernel-team@fb.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
In-Reply-To: <20170915152301.GA29379@castle>
Message-ID: <alpine.DEB.2.10.1709151249290.76069@chino.kir.corp.google.com>
References: <20170911131742.16482-1-guro@fb.com> <alpine.DEB.2.10.1709111334210.102819@chino.kir.corp.google.com> <20170913122914.5gdksbmkolum7ita@dhcp22.suse.cz> <20170913215607.GA19259@castle> <20170914134014.wqemev2kgychv7m5@dhcp22.suse.cz>
 <20170914160548.GA30441@castle> <20170915105826.hq5afcu2ij7hevb4@dhcp22.suse.cz> <20170915152301.GA29379@castle>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1117
Lines: 31

On Fri, 15 Sep 2017, Roman Gushchin wrote:

> > But then you just enforce a structural restriction on your configuration
> > because
> >         root
> >         /  \
> >        A    D
> >       /\
> >      B  C
> > 
> > is a different thing than
> >         root
> >         / | \
> >        B  C  D
> >
> 
> I actually don't have a strong argument against an approach that selects
> the largest leaf or kill-all-set memcg. I think in practice there will
> not be much difference.
> 
> The only real concern I have is that we would then have to do the same
> with oom_priorities (select the largest priority tree-wide), and this
> would limit the ability of a parent cgroup to enforce the priority.
> 

Yes, oom_priority cannot select the largest priority tree-wide for exactly
that reason.  We need the ability to control, from an ancestor cgroup,
which subtree the kill occurs in.  If multiple jobs are each allocated
their own cgroup and each owns memory.oom_priority for its own
subcontainers, this becomes quite powerful: every job can define its own
oom priorities without affecting the others.  With a tree-wide comparison,
on the other hand, a job could easily override the oom priorities of other
cgroups.
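
To make the difference concrete, here is a minimal Python sketch of the
two selection policies.  It is an illustration only and not the code from
the patch series: the Memcg class, the helper names and the example tree
are hypothetical, and it assumes that the cgroup with the largest
oom_priority is the preferred victim, with ties broken by size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup (not a kernel structure)."""
    name: str
    oom_priority: int = 0  # assumption: larger value = preferred victim
    usage: int = 0         # memory charged directly to this cgroup
    children: List["Memcg"] = field(default_factory=list)

    def size(self) -> int:
        # Total usage of this cgroup and everything below it.
        return self.usage + sum(c.size() for c in self.children)


def leaves(cg: Memcg) -> List[Memcg]:
    """Return all leaf cgroups below (or equal to) cg."""
    if not cg.children:
        return [cg]
    out: List[Memcg] = []
    for child in cg.children:
        out.extend(leaves(child))
    return out


def select_hierarchical(root: Memcg) -> Memcg:
    """Compare siblings only: at each level descend into the child with
    the largest (oom_priority, size), until a leaf is reached."""
    cg = root
    while cg.children:
        cg = max(cg.children, key=lambda c: (c.oom_priority, c.size()))
    return cg


def select_tree_wide(root: Memcg) -> Memcg:
    """Compare every leaf against every other leaf, ignoring ancestors."""
    return max(leaves(root), key=lambda c: (c.oom_priority, c.size()))


def build_example() -> Memcg:
    # The admin owns the priorities of job1 and job2 and wants job2 to be
    # killed first.  Each job owns the priorities of its subcontainers.
    job1 = Memcg("job1", oom_priority=1, children=[
        Memcg("a", oom_priority=3, usage=100),
        Memcg("b", oom_priority=7, usage=50),
    ])
    job2 = Memcg("job2", oom_priority=2, children=[
        Memcg("c", oom_priority=0, usage=400),
        Memcg("d", oom_priority=0, usage=200),
    ])
    return Memcg("root", children=[job1, job2])


if __name__ == "__main__":
    root = build_example()
    print("hierarchical victim:", select_hierarchical(root).name)
    print("tree-wide victim:   ", select_tree_wide(root).name)
```

In this example the admin gives job2 the larger priority at the top
level, so the hierarchical walk ends inside job2 and picks c.  With the
tree-wide comparison, job2 has set its own subcontainers to 0, so the kill
lands on b inside job1 instead, which is the override described above.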