Date: Thu, 22 Mar 2018 14:53:54 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch v2 -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer

The cgroup-aware oom killer currently considers the set of allowed nodes
for the allocation that triggers the oom killer and discounts usage from
disallowed nodes when comparing cgroups.

If a cgroup has both the cpuset and memory controllers enabled, it may be
possible to restrict allocations to a subset of nodes, for example.  Some
latency sensitive users use cpusets to allocate only local memory, almost
to the point of oom even though there is an abundance of available free
memory on other nodes.

The same is true for processes that mbind(2) their memory to a set of
allowed nodes.

This yields very inconsistent results by considering usage from each mem
cgroup (and perhaps its subtree) for the allocation's set of allowed nodes
for its mempolicy.  Allocating a single page for a vma that is bound with
mbind(2) to a now-oom node can cause a cgroup that is restricted to that
node by its cpuset controller to be oom killed when other cgroups may have
much higher overall usage.

The cgroup-aware oom killer is described as killing the largest memory
consuming cgroup (or subtree) without mentioning the mempolicy of the
allocation.  For now, discount it.  It would be possible to add an
additional oom policy for NUMA awareness if it would be generally useful
later with the extensible interface.

Signed-off-by: David Rientjes
---
 mm/memcontrol.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2608,19 +2608,15 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
 	return ret;
 }
 
-static long memcg_oom_badness(struct mem_cgroup *memcg,
-			      const nodemask_t *nodemask)
+static long memcg_oom_badness(struct mem_cgroup *memcg)
 {
 	const bool is_root_memcg = memcg == root_mem_cgroup;
 	long points = 0;
 	int nid;
-	pg_data_t *pgdat;
 
 	for_each_node_state(nid, N_MEMORY) {
-		if (nodemask && !node_isset(nid, *nodemask))
-			continue;
+		pg_data_t *pgdat = NODE_DATA(nid);
 
-		pgdat = NODE_DATA(nid);
 		if (is_root_memcg) {
 			points += node_page_state(pgdat, NR_ACTIVE_ANON) +
 				node_page_state(pgdat, NR_INACTIVE_ANON);
@@ -2656,8 +2652,7 @@ static long memcg_oom_badness(struct mem_cgroup *memcg,
  *   >0: memcg is eligible, and the returned value is an estimation
  *       of the memory footprint
  */
-static long oom_evaluate_memcg(struct mem_cgroup *memcg,
-			       const nodemask_t *nodemask)
+static long oom_evaluate_memcg(struct mem_cgroup *memcg)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -2691,7 +2686,7 @@ static long oom_evaluate_memcg(struct mem_cgroup *memcg,
 	if (eligible <= 0)
 		return eligible;
 
-	return memcg_oom_badness(memcg, nodemask);
+	return memcg_oom_badness(memcg);
 }
 
 static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
@@ -2751,7 +2746,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 		if (memcg_has_children(iter))
 			continue;
 
-		score = oom_evaluate_memcg(iter, oc->nodemask);
+		score = oom_evaluate_memcg(iter);
 
 		/*
 		 * Ignore empty and non-eligible memory cgroups.
@@ -2780,8 +2775,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 
 	if (oc->chosen_memcg != INFLIGHT_VICTIM) {
 		if (root == root_mem_cgroup) {
-			group_score = oom_evaluate_memcg(root_mem_cgroup,
-							 oc->nodemask);
+			group_score = oom_evaluate_memcg(root_mem_cgroup);
 			if (group_score > leaf_score) {
 				/*
 				 * Discount the sum of all leaf scores to find
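
For illustration only, the inconsistency described in the changelog can be
modeled in userspace.  The sketch below is not kernel code: struct
model_memcg, model_badness(), model_select_victim() and MAX_NODES are
hypothetical names invented for this example, and a plain bitmask stands in
for nodemask_t.  A mask of 0 models the behavior after this patch (usage on
all nodes counts); a non-zero mask models the old behavior, where usage on
disallowed nodes is skipped.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count for this model */

/* Hypothetical stand-in for a mem cgroup: pages charged on each node. */
struct model_memcg {
	const char *name;
	long pages[MAX_NODES];
};

/*
 * Sum usage over the nodes selected by allowed_mask.  A mask of 0 means
 * "no restriction", which models scoring without a nodemask.
 */
static long model_badness(const struct model_memcg *cg,
			  unsigned int allowed_mask)
{
	long points = 0;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (allowed_mask && !(allowed_mask & (1u << nid)))
			continue;
		points += cg->pages[nid];
	}
	return points;
}

/* Return the index of the highest-scoring cgroup, or -1 if n == 0. */
static int model_select_victim(const struct model_memcg *cgs, size_t n,
			       unsigned int allowed_mask)
{
	long best = -1;
	int victim = -1;
	size_t i;

	assert(cgs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		long score = model_badness(&cgs[i], allowed_mask);

		if (score > best) {
			best = score;
			victim = (int)i;
		}
	}
	return victim;
}
```

With one cgroup holding 1000 pages on node 0 only and another holding 100
pages on node 0 plus 5000 on each of nodes 1 and 2, a mask restricted to
node 0 selects the first cgroup, while a mask of 0 selects the second,
which has the much higher overall usage.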