Date: Thu, 25 Jan 2018 15:53:48 -0800 (PST)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
    Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable
X-Mailing-List: linux-kernel@vger.kernel.org

Now that each mem cgroup on the system has a memory.oom_policy tunable to
specify oom kill selection behavior, remove the needless "groupoom" mount
option that requires (1) the entire system to be forced, perhaps
unnecessarily, perhaps unexpectedly, into a single oom policy that
differs from the traditional per process selection, and (2) a remount to
change.

Instead of enabling the cgroup aware oom killer with the "groupoom" mount
option, set the mem cgroup subtree's memory.oom_policy to "cgroup".

Signed-off-by: David Rientjes
---
 Documentation/cgroup-v2.txt | 43 +++++++++++++++++++++----------------------
 include/linux/cgroup-defs.h |  5 -----
 include/linux/memcontrol.h  |  5 +++++
 kernel/cgroup/cgroup.c      | 13 +------------
 mm/memcontrol.c             | 17 ++++++++---------
 5 files changed, 35 insertions(+), 48 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1074,6 +1074,10 @@ PAGE_SIZE multiple when read back.
 	victim; that is, it will choose the single process with the largest
 	memory footprint.
 
+	If "cgroup", the OOM killer will compare mem cgroups as indivisible
+	memory consumers; that is, they will compare mem cgroup usage rather
+	than process memory footprint. See the "OOM Killer" section.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined. Unless specified
@@ -1280,37 +1284,32 @@ belonging to the affected files to ensure correct memory ownership.
 OOM Killer
 ~~~~~~~~~~
 
-Cgroup v2 memory controller implements a cgroup-aware OOM killer.
-It means that it treats cgroups as first class OOM entities.
+Cgroup v2 memory controller implements an optional cgroup-aware out of
+memory killer, which treats cgroups as indivisible OOM entities.
 
-Cgroup-aware OOM logic is turned off by default and requires
-passing the "groupoom" option on mounting cgroupfs. It can also
-by remounting cgroupfs with the following command::
+This policy is controlled by memory.oom_policy. When a memory cgroup is
+out of memory, its memory.oom_policy will dictate how the OOM killer will
+select a process, or cgroup, to kill. Likewise, when the system is OOM,
+the policy is dictated by the root mem cgroup.
 
-	# mount -o remount,groupoom $MOUNT_POINT
+There are currently two available oom policies:
 
-Under OOM conditions the memory controller tries to make the best
-choice of a victim, looking for a memory cgroup with the largest
-memory footprint, considering leaf cgroups and cgroups with the
-memory.oom_group option set, which are considered to be an indivisible
-memory consumers.
+ - "none": default, choose the largest single memory hogging process to
+   oom kill, as traditionally the OOM killer has always done.
 
-By default, OOM killer will kill the biggest task in the selected
-memory cgroup. A user can change this behavior by enabling
-the per-cgroup memory.oom_group option. If set, it causes
-the OOM killer to kill all processes attached to the cgroup,
-except processes with oom_score_adj set to -1000.
+ - "cgroup": choose the cgroup with the largest memory footprint from the
+   subtree as an OOM victim and kill at least one process, depending on
+   memory.oom_group, from it.
 
-This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM
-the memory controller considers only cgroups belonging to the sub-tree
-of the OOM'ing cgroup.
+When selecting a cgroup as a victim, the OOM killer will kill the process
+with the largest memory footprint. A user can control this behavior by
+enabling the per-cgroup memory.oom_group option. If set, it causes the
+OOM killer to kill all processes attached to the cgroup, except processes
+with /proc/pid/oom_score_adj set to -1000 (oom disabled).
 
 The root cgroup is treated as a leaf memory cgroup, so it's compared
 with other leaf memory cgroups and cgroups with oom_group option set.
 
-If there are no cgroups with the enabled memory controller,
-the OOM killer is using the "traditional" process-based approach.
-
 Please, note that memory charges are not migrating if tasks
 are moved between different memory cgroups. Moving tasks with
 significant memory footprint may affect OOM victim selection logic.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -81,11 +81,6 @@ enum {
 	 * Enable cpuset controller in v1 cgroup to use v2 behavior.
 	 */
 	CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
-
-	/*
-	 * Enable cgroup-aware OOM killer.
-	 */
-	CGRP_GROUP_OOM = (1 << 5),
 };
 
 /* cftype->flags */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -64,6 +64,11 @@ enum memcg_oom_policy {
 	 * oom_badness()
 	 */
 	MEMCG_OOM_POLICY_NONE,
+	/*
+	 * Local cgroup usage is used to select a target cgroup, treating each
+	 * mem cgroup as an indivisible consumer
+	 */
+	MEMCG_OOM_POLICY_CGROUP,
 };
 
 struct mem_cgroup_reclaim_cookie {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1732,9 +1732,6 @@ static int parse_cgroup_root_flags(char *data, unsigned int *root_flags)
 		if (!strcmp(token, "nsdelegate")) {
 			*root_flags |= CGRP_ROOT_NS_DELEGATE;
 			continue;
-		} else if (!strcmp(token, "groupoom")) {
-			*root_flags |= CGRP_GROUP_OOM;
-			continue;
 		}
 
 		pr_err("cgroup2: unknown option \"%s\"\n", token);
@@ -1751,11 +1748,6 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
-
-		if (root_flags & CGRP_GROUP_OOM)
-			cgrp_dfl_root.flags |= CGRP_GROUP_OOM;
-		else
-			cgrp_dfl_root.flags &= ~CGRP_GROUP_OOM;
 	}
 }
 
@@ -1763,8 +1755,6 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 {
 	if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
 		seq_puts(seq, ",nsdelegate");
-	if (cgrp_dfl_root.flags & CGRP_GROUP_OOM)
-		seq_puts(seq, ",groupoom");
 	return 0;
 }
 
@@ -5922,8 +5912,7 @@ static struct kobj_attribute cgroup_delegate_attr = __ATTR_RO(delegate);
 static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			     char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "nsdelegate\n"
-			"groupoom\n");
+	return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
 }
 
 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2798,14 +2798,14 @@ bool mem_cgroup_select_oom_victim(struct oom_control *oc)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return false;
 
-	if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
-		return false;
-
 	if (oc->memcg)
 		root = oc->memcg;
 	else
 		root = root_mem_cgroup;
 
+	if (root->oom_policy != MEMCG_OOM_POLICY_CGROUP)
+		return false;
+
 	select_victim_memcg(root, oc);
 
 	return oc->chosen_memcg;
@@ -5412,9 +5412,6 @@ static int memory_oom_group_show(struct seq_file *m, void *v)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
 	bool oom_group = memcg->oom_group;
 
-	if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
-		return -ENOTSUPP;
-
 	seq_printf(m, "%d\n", oom_group);
 
 	return 0;
@@ -5428,9 +5425,6 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 	int oom_group;
 	int err;
 
-	if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
-		return -ENOTSUPP;
-
 	err = kstrtoint(strstrip(buf), 0, &oom_group);
 	if (err)
 		return err;
@@ -5541,6 +5535,9 @@ static int memory_oom_policy_show(struct seq_file *m, void *v)
 	enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
 
 	switch (policy) {
+	case MEMCG_OOM_POLICY_CGROUP:
+		seq_puts(m, "cgroup\n");
+		break;
 	case MEMCG_OOM_POLICY_NONE:
 	default:
 		seq_puts(m, "none\n");
@@ -5557,6 +5554,8 @@ static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
 	buf = strstrip(buf);
 	if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
 		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+	else if (!memcmp("cgroup", buf, min(sizeof("cgroup")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_CGROUP;
 	else
 		ret = -EINVAL;
 
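
The commit message replaces the "groupoom" remount with a write of "cgroup"
to a mem cgroup's memory.oom_policy file. A minimal userspace sketch of
that write follows, assuming a kernel with this series applied. The helper
name set_oom_policy() and any path passed to it are illustrative only and
are not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): write an OOM policy string
 * such as "none" or "cgroup" to the given memory.oom_policy file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_oom_policy(const char *policy_path, const char *policy)
{
	size_t len = strlen(policy);
	ssize_t written;
	int fd;

	fd = open(policy_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, policy, len);
	if (written < 0) {
		int err = errno;	/* save errno before close() can change it */

		close(fd);
		return -err;
	}
	close(fd);

	/* Treat a short write as an I/O error rather than retrying. */
	return (size_t)written == len ? 0 : -EIO;
}
```

For example, set_oom_policy("/sys/fs/cgroup/workload/memory.oom_policy",
"cgroup") would opt a hypothetical "workload" subtree into cgroup-aware
selection, and writing "none" would restore per-process selection. Neither
needs a remount. The mount point and cgroup name here are assumptions.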