Date: Thu, 25 Jan 2018 15:53:45 -0800 (PST)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
    Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable

The cgroup aware oom killer is needlessly enabled for the entire system
by a mount option. It's unnecessary to force the whole system into a
single oom policy: either cgroup aware or the traditional process aware.

This patch introduces a memory.oom_policy tunable for all mem cgroups.
It is currently a no-op: it can only be set to "none", which is its
default policy. It will be expanded in the next patch to define cgroup
aware oom killer behavior.
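To make "it can only be set to "none"" concrete from a writer's point of view, here is a minimal userspace model of reading and writing the tunable. It is a sketch, not kernel code: memcg_model, model_strstrip(), model_oom_policy_write() and model_oom_policy_show() are hypothetical names. The exact strcmp() match after trimming is a simplification chosen for this sketch, not a transcription of the patch's memcmp()-based check. Following the usual write-handler convention, a successful write returns the number of bytes consumed and a rejected one returns -EINVAL.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Userspace stand-in for enum memcg_oom_policy: only "none" exists so far. */
enum oom_policy_model {
	OOM_POLICY_MODEL_NONE,	/* value 0, so a zeroed struct defaults to "none" */
};

/* Hypothetical stand-in for the oom_policy field of struct mem_cgroup. */
struct memcg_model {
	enum oom_policy_model oom_policy;
};

/* Trim leading and trailing whitespace in place, similar to strstrip(). */
static char *model_strstrip(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	while (*s && isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Model of a write to memory.oom_policy: returns nbytes if the trimmed
 * input is exactly "none", otherwise -EINVAL (exact match is a
 * simplification of this sketch).
 */
static ssize_t model_oom_policy_write(struct memcg_model *memcg, char *buf,
				      size_t nbytes)
{
	buf = model_strstrip(buf);
	if (strcmp(buf, "none") == 0) {
		memcg->oom_policy = OOM_POLICY_MODEL_NONE;
		return (ssize_t)nbytes;
	}
	return -EINVAL;
}

/* Model of a read: with a single policy defined, every value prints "none". */
static const char *model_oom_policy_show(const struct memcg_model *memcg)
{
	switch (memcg->oom_policy) {
	case OOM_POLICY_MODEL_NONE:
	default:
		return "none";
	}
}
```

In this model, writing "none\n" (5 bytes) returns 5 and the file still reads back as "none", while writing "bogus\n" returns -EINVAL because no other policy exists yet.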
This is an extensible interface that can be used to define cgroup aware
assessment of mem cgroup subtrees or the traditional process aware
assessment.

Another benefit of such an approach is that an admin can lock in a
certain policy for the system or for a mem cgroup subtree and delegate
the policy decision to the user, who can then determine whether the kill
should originate from a subcontainer, treated as an indivisible memory
consumer, or whether selection should be done per process.

Signed-off-by: David Rientjes
---
 Documentation/cgroup-v2.txt |  9 +++++++++
 include/linux/memcontrol.h  | 11 +++++++++++
 mm/memcontrol.c             | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1065,6 +1065,15 @@ PAGE_SIZE multiple when read back.
 	If cgroup-aware OOM killer is not enabled, ENOTSUPP error
 	is returned on attempt to access the file.
 
+  memory.oom_policy
+
+	A read-write single string file which exists on all cgroups. The
+	default value is "none".
+
+	If "none", the OOM killer will use the default policy to choose a
+	victim; that is, it will choose the single process with the largest
+	memory footprint.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined. Unless specified
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -58,6 +58,14 @@ enum memcg_event_item {
 	MEMCG_NR_EVENTS,
 };
 
+enum memcg_oom_policy {
+	/*
+	 * No special oom policy, process selection is determined by
+	 * oom_badness()
+	 */
+	MEMCG_OOM_POLICY_NONE,
+};
+
 struct mem_cgroup_reclaim_cookie {
 	pg_data_t *pgdat;
 	int priority;
@@ -203,6 +211,9 @@ struct mem_cgroup {
 	/* OOM-Killer disable */
 	int		oom_kill_disable;
 
+	/* OOM policy for this subtree */
+	enum memcg_oom_policy oom_policy;
+
 	/*
 	 * Treat the sub-tree as an indivisible memory consumer,
 	 * kill all belonging tasks if the memory cgroup selected
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4417,6 +4417,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (parent) {
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
+		memcg->oom_policy = parent->oom_policy;
 	}
 	if (parent && parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
@@ -5534,6 +5535,34 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int memory_oom_policy_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
+
+	switch (policy) {
+	case MEMCG_OOM_POLICY_NONE:
+	default:
+		seq_puts(m, "none\n");
+	};
+	return 0;
+}
+
+static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = nbytes;
+
+	buf = strstrip(buf);
+	if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5575,6 +5604,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = memory_stat_show,
 	},
+	{
+		.name = "oom_policy",
+		.flags = CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_oom_policy_show,
+		.write = memory_oom_policy_write,
+	},
 	{ }	/* terminate */
 };
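The mem_cgroup_css_alloc() hunk copies the parent's oom_policy into a newly created child, which is how a policy chosen for a subtree reaches cgroups created beneath it. The C sketch below models that single step in userspace under stated assumptions: memcg_model and model_css_alloc() are hypothetical names, and calloc() stands in for a zeroed allocation, assumed here so that the first enumerator ("none") is the default. OOM_POLICY_MODEL_HYPOTHETICAL is an invented second value, used only to make inheritance observable. It is not part of this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins; only the first value corresponds to this patch. */
enum oom_policy_model {
	OOM_POLICY_MODEL_NONE,		/* mirrors MEMCG_OOM_POLICY_NONE (value 0) */
	OOM_POLICY_MODEL_HYPOTHETICAL,	/* hypothetical, NOT part of this patch */
};

/* Hypothetical, heavily reduced stand-in for struct mem_cgroup. */
struct memcg_model {
	struct memcg_model *parent;
	enum oom_policy_model oom_policy;
};

/*
 * Model of the mem_cgroup_css_alloc() hunk: the new cgroup starts zeroed,
 * so its policy is OOM_POLICY_MODEL_NONE, and if it has a parent it takes
 * a one-time copy of the parent's policy. Returns NULL on allocation failure.
 */
static struct memcg_model *model_css_alloc(struct memcg_model *parent)
{
	struct memcg_model *memcg = calloc(1, sizeof(*memcg));

	if (!memcg)
		return NULL;
	memcg->parent = parent;
	if (parent)
		memcg->oom_policy = parent->oom_policy;
	return memcg;
}
```

Because the copy is taken only at allocation time, a later write to the parent's memory.oom_policy leaves existing children unchanged. Only cgroups created afterwards pick up the new value.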