Date: Fri, 16 Mar 2018 14:08:58 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer

The cgroup-aware oom killer currently considers the set of allowed nodes
for the allocation that triggers the oom killer and discounts usage from
disallowed nodes when comparing cgroups.

If a cgroup has both the cpuset and memory controllers enabled, it may be
possible to restrict allocations to a subset of nodes, for example.  Some
latency sensitive users use cpusets to allocate only local memory, almost
to the point of oom even though there is an abundance of available free
memory on other nodes.

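To make the effect of the nodemask concrete, here is a rough userspace
sketch.  It is an illustration only and not kernel code: struct cg_usage,
cg_score(), NODE0_ONLY and all of the page counts are made up, and the
loop only loosely mirrors the per-node check in memcg_oom_badness().

```c
#define NR_NODES 2
#define NODE0_ONLY (1UL << 0)	/* hypothetical mask: allocation allowed on node 0 only */

/* Hypothetical per-node usage, in pages, for a single cgroup. */
struct cg_usage {
	const char *name;
	long pages[NR_NODES];
};

/* Made-up numbers: A is confined to node 0, B mostly uses node 1. */
const struct cg_usage cg_a = { "A", { 1000, 0 } };
const struct cg_usage cg_b = { "B", { 100, 50000 } };

/*
 * Sum a cgroup's usage over nodes.  A mask of 0 means "no restriction";
 * otherwise nodes whose bit is clear are skipped, so usage on disallowed
 * nodes does not count towards the score.
 */
long cg_score(const struct cg_usage *cg, unsigned long nodemask)
{
	long points = 0;
	int nid;

	for (nid = 0; nid < NR_NODES; nid++) {
		if (nodemask && !(nodemask & (1UL << nid)))
			continue;
		points += cg->pages[nid];
	}
	return points;
}
```

With these numbers, cg_score(&cg_a, NODE0_ONLY) is 1000 and
cg_score(&cg_b, NODE0_ONLY) is 100, so A ranks above B for a node 0
allocation.  With no mask, A is still 1000 but B is 50100.
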
The same is true for processes that mbind(2) their memory to a set of
allowed nodes.

This yields very inconsistent results by considering usage from each mem
cgroup (and perhaps its subtree) for the allocation's set of allowed nodes
for its mempolicy.  Allocating a single page for a vma that is bound with
mbind(2) to a now-oom node can cause a cgroup that is restricted to that
node by its cpuset controller to be oom killed when other cgroups may have
much higher overall usage.

The cgroup-aware oom killer is described as killing the largest
memory-consuming cgroup (or subtree) without mentioning the mempolicy of
the allocation.  For now, discount it.  It would be possible to add an
additional oom policy for NUMA awareness if it would be generally useful
later with the extensible interface.

Signed-off-by: David Rientjes
---
 mm/memcontrol.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2608,19 +2608,15 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
 	return ret;
 }
 
-static long memcg_oom_badness(struct mem_cgroup *memcg,
-			      const nodemask_t *nodemask)
+static long memcg_oom_badness(struct mem_cgroup *memcg)
 {
 	const bool is_root_memcg = memcg == root_mem_cgroup;
 	long points = 0;
 	int nid;
-	pg_data_t *pgdat;
 
 	for_each_node_state(nid, N_MEMORY) {
-		if (nodemask && !node_isset(nid, *nodemask))
-			continue;
+		pg_data_t *pgdat = NODE_DATA(nid);
 
-		pgdat = NODE_DATA(nid);
 		if (is_root_memcg) {
 			points += node_page_state(pgdat, NR_ACTIVE_ANON) +
 				node_page_state(pgdat, NR_INACTIVE_ANON);
@@ -2656,8 +2652,7 @@ static long memcg_oom_badness(struct mem_cgroup *memcg,
  *   >0: memcg is eligible, and the returned value is an estimation
  *       of the memory footprint
  */
-static long oom_evaluate_memcg(struct mem_cgroup *memcg,
-			       const nodemask_t *nodemask)
+static long oom_evaluate_memcg(struct mem_cgroup *memcg)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -2691,7 +2686,7 @@ static long oom_evaluate_memcg(struct mem_cgroup *memcg,
 	if (eligible <= 0)
 		return eligible;
 
-	return memcg_oom_badness(memcg, nodemask);
+	return memcg_oom_badness(memcg);
 }
 
 static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
@@ -2751,7 +2746,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 		if (memcg_has_children(iter))
 			continue;
 
-		score = oom_evaluate_memcg(iter, oc->nodemask);
+		score = oom_evaluate_memcg(iter);
 
 		/*
 		 * Ignore empty and non-eligible memory cgroups.
@@ -2780,8 +2775,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 
 	if (oc->chosen_memcg != INFLIGHT_VICTIM) {
 		if (root == root_mem_cgroup) {
-			group_score = oom_evaluate_memcg(root_mem_cgroup,
-							 oc->nodemask);
+			group_score = oom_evaluate_memcg(root_mem_cgroup);
 			if (group_score > leaf_score) {
 				/*
 				 * Discount the sum of all leaf scores to find