Date: Wed, 24 Jan 2018 09:20:41 +0100
From: Michal Hocko
To: David Rientjes
Cc: Tejun Heo, Andrew Morton, Roman Gushchin, Vladimir Davydov, Johannes Weiner, Tetsuo Handa, kernel-team@fb.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable
Message-ID: <20180124082041.GD1526@dhcp22.suse.cz>

On Tue 23-01-18 14:22:07, David Rientjes wrote:
> On Tue, 23 Jan 2018, Michal Hocko wrote:
>
> > > It can't, because the current patchset locks the system into a single
> > > selection criteria that is unnecessary and the mount option would become a
> > > no-op after the policy per subtree becomes configurable by the user as
> > > part of the hierarchy itself.
> >
> > This is simply not true! OOM victim selection has changed in the
> > past and will be always a subject to changes in future. Current
> > implementation doesn't provide any externally controllable selection
> > policy and therefore the default can be assumed. Whatever that default
> > means now or in future. The only contract added here is the kill full
> > memcg if selected and that can be implemented on _any_ selection policy.
> >
>
> The current implementation of memory.oom_group is based on top of a
> selection implementation that is broken in three ways I have listed for
> months:

This doesn't lead anywhere. You are not presenting any new arguments
and you are ignoring the feedback you have received so far. We have tried
really hard. Considering that different _independent_ people have presented
a more or less consistent view on these points, I think you should
seriously reconsider how you take that feedback.

> - allows users to intentionally/unintentionally evade the oom killer,
>   requires not locking the selection implementation for the entire
>   system, requires subtree control to prevent, makes a mount option
>   obsolete, and breaks existing users who would use the implementation
>   based on 4.16 if this were merged,
>
> - unfairly compares the root mem cgroup vs leaf mem cgroup such that
>   users must structure their hierarchy only for 4.16 in such a way
>   that _all_ processes are under hierarchical control and have no
>   power to create sub cgroups because of the point above and
>   completely breaks any user of oom_score_adj in a completely
>   undocumented and unspecified way, such that fixing that breakage
>   would also break any existing users who would use the implementation
>   based on 4.16 if this were merged, and
>
> - does not allow userspace to protect important cgroups, which can be
>   built on top.

For the last time: all of this can be done on top of the proposed solution
without breaking the proposed user API. I am really _convinced_ that you
underestimate how complex it is to provide a sane selection policy API,
and it will take _months_ to settle on something. The existing OOM APIs are
a sad story and I definitely do not want to repeat the same mistakes from
the past.
-- 
Michal Hocko
SUSE Labs