Date: Mon, 12 Mar 2018 17:57:53 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch -mm v3 1/3] mm, memcg: introduce per-memcg oom policy tunable

The cgroup aware oom killer is needlessly enforced for the entire system
by a mount option.  It's unnecessary to force the system into a single
oom policy: either cgroup aware, or the traditional process aware.

This patch introduces a memory.oom_policy tunable for all mem cgroups.
It is currently a no-op: it can only be set to "none", which is its
default policy.  It will be expanded in the next patch to define cgroup
aware oom killer behavior for its subtree.

This is an extensible interface that can be used to define cgroup aware
assessment of mem cgroup subtrees or the traditional process aware
assessment.

Another benefit of such an approach is that an admin can lock in a
certain policy for the system or for a mem cgroup subtree and can
delegate the policy decision to the user to determine if the kill should
originate from a subcontainer, as indivisible memory consumers
themselves, or selection should be done per process.

Signed-off-by: David Rientjes
---
 Documentation/cgroup-v2.txt | 11 +++++++++++
 include/linux/memcontrol.h  | 11 +++++++++++
 mm/memcontrol.c             | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1065,6 +1065,17 @@ PAGE_SIZE multiple when read back.
 	If cgroup-aware OOM killer is not enabled, ENOTSUPP error
 	is returned on attempt to access the file.
 
+  memory.oom_policy
+
+	A read-write single string file which exists on all cgroups.  The
+	default value is "none".
+
+	If "none", the OOM killer will use the default policy to choose a
+	victim; that is, it will choose the single process with the largest
+	memory footprint adjusted by /proc/pid/oom_score_adj (see
+	Documentation/filesystems/proc.txt).  This is the same policy as if
+	memory cgroups were not even mounted.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -58,6 +58,14 @@ enum memcg_event_item {
 	MEMCG_NR_EVENTS,
 };
 
+enum memcg_oom_policy {
+	/*
+	 * No special oom policy, process selection is determined by
+	 * oom_badness()
+	 */
+	MEMCG_OOM_POLICY_NONE,
+};
+
 struct mem_cgroup_reclaim_cookie {
 	pg_data_t *pgdat;
 	int priority;
@@ -203,6 +211,9 @@ struct mem_cgroup {
 	/* OOM-Killer disable */
 	int oom_kill_disable;
 
+	/* OOM policy for this subtree */
+	enum memcg_oom_policy oom_policy;
+
 	/*
 	 * Treat the sub-tree as an indivisible memory consumer,
 	 * kill all belonging tasks if the memory cgroup selected
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4415,6 +4415,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (parent) {
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
+		memcg->oom_policy = parent->oom_policy;
 	}
 	if (parent && parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
@@ -5532,6 +5533,34 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int memory_oom_policy_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
+
+	switch (policy) {
+	case MEMCG_OOM_POLICY_NONE:
+	default:
+		seq_puts(m, "none\n");
+	};
+	return 0;
+}
+
+static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = nbytes;
+
+	buf = strstrip(buf);
+	if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5573,6 +5602,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = memory_stat_show,
 	},
+	{
+		.name = "oom_policy",
+		.flags = CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_oom_policy_show,
+		.write = memory_oom_policy_write,
+	},
 	{ }	/* terminate */
 };
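
For readers who want to try the parsing step of memory_oom_policy_write() outside the kernel, the following is a rough userspace sketch of it, not the kernel code itself. The names fake_memcg, strip_ws() and oom_policy_write_sketch() are hypothetical stand-ins introduced here for illustration; strip_ws() only approximates strstrip(), and strncmp() is used in place of memcmp() so that the sketch never reads past the end of a short string (the zero/non-zero result is the same for this comparison).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
enum oom_policy { OOM_POLICY_NONE };
struct fake_memcg { enum oom_policy oom_policy; };

/* Approximation of strstrip(): trims whitespace at both ends in place. */
static char *strip_ws(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Mirrors the comparison in the write handler: at most
 * min(strlen("none"), nbytes) bytes are compared after stripping.
 * Returns nbytes on success and -EINVAL for an unrecognized policy.
 */
static ssize_t oom_policy_write_sketch(struct fake_memcg *memcg,
				       char *buf, size_t nbytes)
{
	size_t cmp_len = sizeof("none") - 1;

	assert(memcg != NULL && buf != NULL);
	if (nbytes < cmp_len)
		cmp_len = nbytes;

	buf = strip_ws(buf);
	if (strncmp("none", buf, cmp_len) != 0)
		return -EINVAL;

	memcg->oom_policy = OOM_POLICY_NONE;
	return (ssize_t)nbytes;
}
```

Under these assumptions, writing "none\n" is accepted and "cgroup\n" is rejected with -EINVAL. Because at most four bytes are compared, a string that merely begins with "none" would also be accepted by this logic.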