Date: Thu, 25 Jan 2018 15:53:50 -0800 (PST)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
    Tejun Heo, kernel-team@fb.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [patch -mm v2 3/3] mm, memcg: add hierarchical usage oom policy
X-Mailing-List: linux-kernel@vger.kernel.org

One of the three significant concerns brought up about the cgroup aware oom
killer is that its decision-making can be completely evaded by creating
subcontainers and attaching processes such that the ancestor's usage does
not exceed another cgroup on the system.  In this regard, users who do not
distribute their processes over a set of subcontainers for mem cgroup
control, statistics, or other controllers are unfairly penalized.

This adds an oom policy, "tree", that accounts for hierarchical usage when
cgroups are compared and the cgroup aware oom killer is enabled by an
ancestor.  This allows administrators, for example, to require users in
their own top-level mem cgroup subtree to be accounted for with
hierarchical usage.  In other words, they can no longer evade the oom
killer by using other controllers or subcontainers.

Signed-off-by: David Rientjes
---
 Documentation/cgroup-v2.txt | 12 ++++++++++--
 include/linux/memcontrol.h  |  5 +++++
 mm/memcontrol.c             | 12 +++++++++---
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1078,6 +1078,11 @@ PAGE_SIZE multiple when read back.
 	memory consumers; that is, they will compare mem cgroup usage rather
 	than process memory footprint.  See the "OOM Killer" section.
 
+	If "tree", the OOM killer will compare mem cgroups and its subtree
+	as indivisible memory consumers when selecting a hierarchy.  This
+	policy cannot be set on the root mem cgroup.  See the "OOM Killer"
+	section.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
@@ -1301,6 +1306,9 @@ There are currently two available oom policies:
    subtree as an OOM victim and kill at least one process, depending on
    memory.oom_group, from it.
 
+ - "tree": choose the cgroup with the largest memory footprint considering
+   itself and its subtree and kill at least one process.
+
 When selecting a cgroup as a victim, the OOM killer will kill the process
 with the largest memory footprint.  A user can control this behavior by
 enabling the per-cgroup memory.oom_group option.  If set, it causes the
@@ -1314,8 +1322,8 @@ Please, note that memory charges are not migrating if tasks
 are moved between different memory cgroups. Moving tasks with
 significant memory footprint may affect OOM victim selection logic.
 If it's a case, please, consider creating a common ancestor for
-the source and destination memory cgroups and enabling oom_group
-on ancestor layer.
+the source and destination memory cgroups and setting a policy of "tree"
+and enabling oom_group on an ancestor layer.
 
 
 IO
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -69,6 +69,11 @@ enum memcg_oom_policy {
 	 * mem cgroup as an indivisible consumer
 	 */
 	MEMCG_OOM_POLICY_CGROUP,
+	/*
+	 * Tree cgroup usage for all descendant memcg groups, treating each mem
+	 * cgroup and its subtree as an indivisible consumer
+	 */
+	MEMCG_OOM_POLICY_TREE,
 };
 
 struct mem_cgroup_reclaim_cookie {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2728,7 +2728,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 	/*
 	 * The oom_score is calculated for leaf memory cgroups (including
 	 * the root memcg).
-	 * Non-leaf oom_group cgroups accumulating score of descendant
+	 * Cgroups with oom policy of "tree" accumulate the score of descendant
 	 * leaf memory cgroups.
 	 */
 	rcu_read_lock();
@@ -2737,10 +2737,11 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 
 		/*
 		 * We don't consider non-leaf non-oom_group memory cgroups
-		 * as OOM victims.
+		 * without the oom policy of "tree" as OOM victims.
 		 */
 		if (memcg_has_children(iter) && iter != root_mem_cgroup &&
-		    !mem_cgroup_oom_group(iter))
+		    !mem_cgroup_oom_group(iter) &&
+		    iter->oom_policy != MEMCG_OOM_POLICY_TREE)
 			continue;
 
 		/*
@@ -5538,6 +5539,9 @@ static int memory_oom_policy_show(struct seq_file *m, void *v)
 	case MEMCG_OOM_POLICY_CGROUP:
 		seq_puts(m, "cgroup\n");
 		break;
+	case MEMCG_OOM_POLICY_TREE:
+		seq_puts(m, "tree\n");
+		break;
 	case MEMCG_OOM_POLICY_NONE:
 	default:
 		seq_puts(m, "none\n");
@@ -5556,6 +5560,8 @@ static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
 		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
 	else if (!memcmp("cgroup", buf, min(sizeof("cgroup")-1, nbytes)))
 		memcg->oom_policy = MEMCG_OOM_POLICY_CGROUP;
+	else if (!memcmp("tree", buf, min(sizeof("tree")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_TREE;
 	else
 		ret = -EINVAL;
 