Date: Thu, 22 Mar 2018 14:53:48 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [patch v2 -mm 1/6] mm, memcg: introduce per-memcg oom policy tunable
X-Mailing-List: linux-kernel@vger.kernel.org

The cgroup aware oom killer is needlessly enforced for the entire system
by a mount option. It's unnecessary to force the system into a single
oom policy: either cgroup aware, or the traditional process aware.

This patch introduces a memory.oom_policy tunable for all mem cgroups.
It is currently a no-op: it can only be set to "none", which is its
default policy. It will be expanded in the next patch to define cgroup
aware oom killer behavior for its subtree.

This is an extensible interface that can be used to define cgroup aware
assessment of mem cgroup subtrees or the traditional process aware
assessment.

Reading memory.oom_policy will specify the list of available policies.

Another benefit of such an approach is that an admin can lock in a
certain policy for the system or for a mem cgroup subtree and can
delegate the policy decision to the user to determine if the kill should
originate from a subcontainer, as indivisible memory consumers
themselves, or selection should be done per process.

Signed-off-by: David Rientjes
---
 Documentation/cgroup-v2.txt | 11 +++++++++++
 include/linux/memcontrol.h  | 11 +++++++++++
 mm/memcontrol.c             | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1065,6 +1065,17 @@ PAGE_SIZE multiple when read back.
 	If cgroup-aware OOM killer is not enabled, ENOTSUPP error
 	is returned on attempt to access the file.
 
+  memory.oom_policy
+
+	A read-write single string file which exists on all cgroups.  The
+	default value is "none".
+
+	If "none", the OOM killer will use the default policy to choose a
+	victim; that is, it will choose the single process with the largest
+	memory footprint adjusted by /proc/pid/oom_score_adj (see
+	Documentation/filesystems/proc.txt).  This is the same policy as if
+	memory cgroups were not even mounted.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -58,6 +58,14 @@ enum memcg_event_item {
 	MEMCG_NR_EVENTS,
 };
 
+enum memcg_oom_policy {
+	/*
+	 * No special oom policy, process selection is determined by
+	 * oom_badness()
+	 */
+	MEMCG_OOM_POLICY_NONE,
+};
+
 struct mem_cgroup_reclaim_cookie {
 	pg_data_t *pgdat;
 	int priority;
@@ -203,6 +211,9 @@ struct mem_cgroup {
 	/* OOM-Killer disable */
 	int oom_kill_disable;
 
+	/* OOM policy for this subtree */
+	enum memcg_oom_policy oom_policy;
+
 	/*
 	 * Treat the sub-tree as an indivisible memory consumer,
 	 * kill all belonging tasks if the memory cgroup selected
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4430,6 +4430,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (parent) {
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
+		memcg->oom_policy = parent->oom_policy;
 	}
 	if (parent && parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
@@ -5547,6 +5548,34 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int memory_oom_policy_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
+
+	switch (policy) {
+	case MEMCG_OOM_POLICY_NONE:
+	default:
+		seq_puts(m, "[none]\n");
+	};
+	return 0;
+}
+
+static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = nbytes;
+
+	buf = strstrip(buf);
+	if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5588,6 +5617,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = memory_stat_show,
 	},
+	{
+		.name = "oom_policy",
+		.flags = CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_oom_policy_show,
+		.write = memory_oom_policy_write,
+	},
 	{ }	/* terminate */
 };
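
For readers who want to poke at the semantics without building a kernel,
the three behaviors added above (inheritance from the parent at
allocation, the bracketed "[none]" on read, and -EINVAL for any other
string on write) can be modeled in plain userspace C. This is only a
rough sketch and not kernel code: struct model_memcg and every model_*
function are hypothetical names made up for illustration. It also
assumes that the whole trimmed string must equal "none", which is
stricter than the memcmp() comparison in the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
enum model_oom_policy {
	MODEL_OOM_POLICY_NONE,
};

struct model_memcg {
	enum model_oom_policy oom_policy;
};

/* Mirrors the mem_cgroup_css_alloc() hunk: a child starts with its parent's policy. */
static void model_memcg_init(struct model_memcg *memcg,
			     const struct model_memcg *parent)
{
	memcg->oom_policy = parent ? parent->oom_policy : MODEL_OOM_POLICY_NONE;
}

/* Mirrors memory_oom_policy_show(): the active policy is printed in brackets. */
static int model_oom_policy_show(const struct model_memcg *memcg,
				 char *out, size_t len)
{
	switch (memcg->oom_policy) {
	case MODEL_OOM_POLICY_NONE:
	default:
		return snprintf(out, len, "[none]\n");
	}
}

/*
 * Models memory_oom_policy_write(). Assumption: after trimming whitespace
 * the whole string must equal "none"; the patch itself uses memcmp() on at
 * most sizeof("none")-1 bytes, so this sketch is deliberately stricter.
 * Returns the number of bytes consumed, or -EINVAL for an unknown policy.
 */
static long model_oom_policy_write(struct model_memcg *memcg, const char *buf)
{
	size_t nbytes = strlen(buf);
	const char *start = buf;
	const char *end = buf + nbytes;

	/* Userspace equivalent of strstrip(): drop surrounding whitespace. */
	while (start < end && isspace((unsigned char)*start))
		start++;
	while (end > start && isspace((unsigned char)end[-1]))
		end--;

	if ((size_t)(end - start) == strlen("none") &&
	    !memcmp(start, "none", strlen("none"))) {
		memcg->oom_policy = MODEL_OOM_POLICY_NONE;
		return (long)nbytes;
	}
	return -EINVAL;
}
```

Initializing a child with model_memcg_init(&child, &root) and then
calling model_oom_policy_write(&child, "none\n") corresponds roughly to
creating a cgroup and echoing "none" into its memory.oom_policy file;
any other string is rejected until later patches in the series add more
policies.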