Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756416Ab2JKJNj (ORCPT ); Thu, 11 Oct 2012 05:13:39 -0400
Received: from cantor2.suse.de ([195.135.220.15]:47939 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750739Ab2JKJNf (ORCPT ); Thu, 11 Oct 2012 05:13:35 -0400
Date: Thu, 11 Oct 2012 11:13:33 +0200
From: Michal Hocko
To: Andrew Morton
Cc: linux-mm@kvack.org, David Rientjes, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, Johannes Weiner, LKML
Subject: Re: [PATCH] memcg: oom: fix totalpages calculation for memory.swappiness==0
Message-ID: <20121011091332.GA29301@dhcp22.suse.cz>
References: <20121011085038.GA29295@dhcp22.suse.cz>
	<1349945859-1350-1-git-send-email-mhocko@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1349945859-1350-1-git-send-email-mhocko@suse.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3201
Lines: 87

On Thu 11-10-12 10:57:39, Michal Hocko wrote:
> oom_badness takes a totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible compared to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or nonzero oom_score_adj).
> A wrong process might be selected as a result.
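
To make the effect described above concrete, here is a minimal userspace
sketch. It is not the kernel's oom_badness(); struct task_model and
model_badness() are hypothetical names, and the 3% discount for
CAP_SYS_ADMIN tasks as well as the scaling of oom_score_adj by
totalpages / 1000 are assumptions used only to model the description.

```c
/*
 * Hypothetical userspace model, NOT the kernel's oom_badness().
 * Assumption: the score is the task's page usage, adjusted by
 * terms that are proportional to totalpages.
 */
struct task_model {
	long long pages_used;	/* rss + page tables + swap entries, in pages */
	int oom_score_adj;	/* -1000 .. 1000 */
	int has_sys_admin;	/* nonzero for CAP_SYS_ADMIN tasks */
};

long long model_badness(const struct task_model *t, long long totalpages)
{
	long long points = t->pages_used;

	/* Assumed 3% discount for CAP_SYS_ADMIN tasks. */
	if (t->has_sys_admin)
		points -= 30 * totalpages / 1000;

	/* Assumed: oom_score_adj is scaled by totalpages / 1000. */
	points += (long long)t->oom_score_adj * totalpages / 1000;

	/* Floor the result at 1 so the task stays eligible in this model. */
	return points > 0 ? points : 1;
}
```

Assuming 4 KiB pages, a 100 MiB memcg limit is 25600 pages and 8 GiB of
swap is 2097152 pages, so totalpages is 2122752 when swap is counted.
In this model a CAP_SYS_ADMIN task using 24000 pages then gets
24000 - 63682, which is floored to 1, while an unprivileged task using
1000 pages scores 1000 and would be picked first. With totalpages equal
to the limit alone (25600), the same two tasks score 23232 and 1000.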
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
>
> Signed-off-by: Michal Hocko
> Acked-by: David Rientjes
> Cc: stable [3.5+]

I have just realized that fe35004f (introduced in 3.5-rc1) has been
backported to the 3.2 and 3.4 stable kernels, so this should be [3.2+].

> ---
>  mm/memcontrol.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7acf43b..93a7e36 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1452,17 +1452,26 @@ static int mem_cgroup_count_children(struct mem_cgroup *memcg)
>  static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
>  {
>  	u64 limit;
> -	u64 memsw;
>  
>  	limit = res_counter_read_u64(&memcg->res, RES_LIMIT);
> -	limit += total_swap_pages << PAGE_SHIFT;
>  
> -	memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
>  	/*
> -	 * If memsw is finite and limits the amount of swap space available
> -	 * to this memcg, return that limit.
> +	 * Do not consider swap space if we cannot swap due to swappiness
>  	 */
> -	return min(limit, memsw);
> +	if (mem_cgroup_swappiness(memcg)) {
> +		u64 memsw;
> +
> +		limit += total_swap_pages << PAGE_SHIFT;
> +		memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> +
> +		/*
> +		 * If memsw is finite and limits the amount of swap space
> +		 * available to this memcg, return that limit.
> +		 */
> +		limit = min(limit, memsw);
> +	}
> +
> +	return limit;
>  }
>  
>  void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> -- 
> 1.7.10.4

-- 
Michal Hocko
SUSE Labs