Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753220AbcDAInB (ORCPT ); Fri, 1 Apr 2016 04:43:01 -0400 Received: from mx2.suse.de ([195.135.220.15]:39407 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751432AbcDAIm6 (ORCPT ); Fri, 1 Apr 2016 04:42:58 -0400 Subject: Re: [PATCH] mm: fix invalid node in alloc_migrate_target() To: Andrew Morton References: <56F4E104.9090505@huawei.com> <20160325122237.4ca4e0dbca215ccbf4f49922@linux-foundation.org> <56FA7DC8.4000902@suse.cz> <56FD2285.4080600@suse.cz> <20160331140124.481acb67ef8f7356778cc4a0@linux-foundation.org> Cc: Xishi Qiu , Joonsoo Kim , David Rientjes , Naoya Horiguchi , Laura Abbott , zhuhui@xiaomi.com, wangxq10@lzu.edu.cn, Linux MM , LKML From: Vlastimil Babka Message-ID: <56FE348C.2010404@suse.cz> Date: Fri, 1 Apr 2016 10:42:52 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20160331140124.481acb67ef8f7356778cc4a0@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3465 Lines: 108 On 03/31/2016 11:01 PM, Andrew Morton wrote: > On Thu, 31 Mar 2016 15:13:41 +0200 Vlastimil Babka wrote: > >> On 03/29/2016 03:06 PM, Vlastimil Babka wrote: >> > On 03/25/2016 08:22 PM, Andrew Morton wrote: >> >> Also, mm/mempolicy.c:offset_il_node() worries me: >> >> >> >> do { >> >> nid = next_node(nid, pol->v.nodes); >> >> c++; >> >> } while (c <= target); >> >> >> >> Can't `nid' hit MAX_NUMNODES? >> > >> > AFAICS it can. interleave_nid() uses this and the nid is then used e.g. >> > in node_zonelist() where it's used for NODE_DATA(nid). That's quite >> > scary. It also predates git. Why don't we see crashes or KASAN finding this? >> >> Ah, I see. In offset_il_node(), nid is initialized to -1, and the number >> of do-while iterations calling next_node() is up to the number of bits >> set in the pol->v.nodes bitmap, so it can't reach past the last set bit >> and return MAX_NUMNODES. > > Gack. offset_il_node() should be dragged out, strangled, shot then burnt. Ah, but you went with the much less amusing alternative of just fixing it. > static unsigned offset_il_node(struct mempolicy *pol, > struct vm_area_struct *vma, unsigned long off) > { > unsigned nnodes = nodes_weight(pol->v.nodes); > unsigned target; > int c; > int nid = NUMA_NO_NODE; > > if (!nnodes) > return numa_node_id(); > target = (unsigned int)off % nnodes; > c = 0; > do { > nid = next_node(nid, pol->v.nodes); > c++; > } while (c <= target); > return nid; > } > > For starters it is relying upon next_node(-1, ...) behaving like > first_node(). Fair enough I guess, but that isn't very clear. > > static inline int __next_node(int n, const nodemask_t *srcp) > { > return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); > } > > will start from node 0 when it does the n+1. > > Also it is relying upon NUMA_NO_NODE having a value of -1. That's just > grubby - this code shouldn't "know" that NUMA_NO_NODE==-1. It would have > been better to use plain old "-1" here. Yeah looks like a blind change of all "-1" to "NUMA_NO_NODE" happened at some point. > > Does this look clearer and correct? Definitely. > /* > * Do static interleaving for a VMA with known offset @n. Returns the n'th > * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the > * number of present nodes. > */ > static unsigned offset_il_node(struct mempolicy *pol, > struct vm_area_struct *vma, unsigned long n) > { > unsigned nnodes = nodes_weight(pol->v.nodes); > unsigned target; > int i; > int nid; > > if (!nnodes) > return numa_node_id(); > target = (unsigned int)n % nnodes; > nid = first_node(pol->v.nodes); > for (i = 0; i < target; i++) > nid = next_node(nid, pol->v.nodes); > return nid; > } > > > From: Andrew Morton > Subject: mm/mempolicy.c:offset_il_node() document and clarify > > This code was pretty obscure and was relying upon obscure side-effects of > next_node(-1, ...) and was relying upon NUMA_NO_NODE being equal to -1. > > Clean that all up and document the function's intent. > > Cc: Vlastimil Babka > Cc: Xishi Qiu > Cc: Joonsoo Kim > Cc: David Rientjes > Cc: Naoya Horiguchi > Cc: Laura Abbott > Signed-off-by: Andrew Morton Acked-by: Vlastimil Babka