Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756172AbcC2NGZ (ORCPT ); Tue, 29 Mar 2016 09:06:25 -0400 Received: from mx2.suse.de ([195.135.220.15]:38642 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753523AbcC2NGW (ORCPT ); Tue, 29 Mar 2016 09:06:22 -0400 Subject: Re: [PATCH] mm: fix invalid node in alloc_migrate_target() To: Andrew Morton , Xishi Qiu References: <56F4E104.9090505@huawei.com> <20160325122237.4ca4e0dbca215ccbf4f49922@linux-foundation.org> Cc: Joonsoo Kim , David Rientjes , Naoya Horiguchi , Laura Abbott , zhuhui@xiaomi.com, wangxq10@lzu.edu.cn, Linux MM , LKML From: Vlastimil Babka Message-ID: <56FA7DC8.4000902@suse.cz> Date: Tue, 29 Mar 2016 15:06:16 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <20160325122237.4ca4e0dbca215ccbf4f49922@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3488 Lines: 108 On 03/25/2016 08:22 PM, Andrew Morton wrote: > On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu wrote: > >> It is incorrect to use next_node to find a target node, it will >> return MAX_NUMNODES or invalid node. This will lead to crash in >> buddy system allocation. >> >> ... >> >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >> * now as a simple work-around, we use the next node for destination. >> */ >> if (PageHuge(page)) { >> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >> - nodemask_t dst; >> - nodes_complement(dst, src); >> + int node = next_online_node(page_to_nid(page)); >> + if (node == MAX_NUMNODES) >> + node = first_online_node; >> return alloc_huge_page_node(page_hstate(compound_head(page)), >> - next_node(page_to_nid(page), dst)); >> + node); >> } >> >> if (PageHighMem(page)) > > Indeed. Can you tell us more about this circumstances under which the > kernel will crash? I need to decide which kernel version(s) need the > patch, but the changelog doesn't contain the info needed to make this > decision (it should). > > > > next_node() isn't a very useful interface, really. Just about every > caller does this: > > > node = next_node(node, XXX); > if (node == MAX_NUMNODES) > node = first_node(XXX); > > so how about we write a function which does that, and stop open-coding > the same thing everywhere? Good idea. > And I think your fix could then use such a function: > > int node = that_new_function(page_to_nid(page), node_online_map); > > > > Also, mm/mempolicy.c:offset_il_node() worries me: > > do { > nid = next_node(nid, pol->v.nodes); > c++; > } while (c <= target); > > Can't `nid' hit MAX_NUMNODES? AFAICS it can. interleave_nid() uses this and the nid is then used e.g. in node_zonelist() where it's used for NODE_DATA(nid). That's quite scary. It also predates git. Why don't we see crashes or KASAN finding this? > > And can someone please explain mem_cgroup_select_victim_node() to me? > How can we hit the "node = numa_node_id()" path? Only if > memcg->scan_nodes is empty? is that even valid? The comment seems to > have not much to do with the code? I understand the comment that it's valid to be empty and the comment lists reasons why that can happen (with somewhat broken language). Note that I didn't verify these reasons: - we call this when hitting memcg limit, not when adding pages to LRU, as adding to LRU means it would contain the given LRU's node - adding to unevictable LRU means it's not added to scan_nodes (probably because scanning unevictable lru would be useless) - for other reasons (which?) it might have pages not on LRU and it's so small there are no other pages that would be on LRU > mpol_rebind_nodemask() is similar. > > > > Something like this? > > > From: Andrew Morton > Subject: include/linux/nodemask.h: create next_node_in() helper > > Lots of code does > > node = next_node(node, XXX); > if (node == MAX_NUMNODES) > node = first_node(XXX); > > so create next_node_in() to do this and use it in various places. > > Cc: Xishi Qiu > Cc: Vlastimil Babka Acked-by: Vlastimil Babka Patch doesn't address offset_il_node() which is good, because if it's indeed buggy, it's serious and needs a non-cleanup patch.