Date: Thu, 13 Feb 2014 21:47:24 -0800
From: Nishanth Aravamudan
To: David Rientjes
Cc: Raghavendra K T, Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, Linus Torvalds, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no local memory and limit readahead pages
Message-ID: <20140214054724.GA24329@linux.vnet.ibm.com>
References: <52F4B8A4.70405@linux.vnet.ibm.com> <52F88C16.70204@linux.vnet.ibm.com> <52F8C556.6090006@linux.vnet.ibm.com> <52FC6F2A.30905@linux.vnet.ibm.com> <52FC98A6.1000701@linux.vnet.ibm.com>

On 13.02.2014 [14:41:04 -0800], David Rientjes wrote:
> On Thu, 13 Feb 2014, Raghavendra K T wrote:
>
> > Thanks David, unfortunately even after applying that patch, I do not see
> > the improvement.
> >
> > Interestingly numa_mem_id() seem to still return the value of a
> > memoryless node.
> > May be per cpu _numa_mem_ values are not set properly. Need to dig out ....
> >
>
> I believe ppc will be relying on __build_all_zonelists() to set
> numa_mem_id() to be the proper node, and that relies on the ordering of
> the zonelist built for the memoryless node.
> It would be very strange if local_memory_node() is returning a memoryless
> node because it is the first zone for node_zonelist(GFP_KERNEL) (why would
> a memoryless node be on the zonelist at all?).
>
> I think the real problem is that build_all_zonelists() is only called at
> init when the boot cpu is online so it's only setting numa_mem_id()
> properly for the boot cpu.  Does it return a node with memory if you
> toggle /proc/sys/vm/numa_zonelist_order?  Do
>
>	echo node > /proc/sys/vm/numa_zonelist_order
>	echo zone > /proc/sys/vm/numa_zonelist_order
>	echo default > /proc/sys/vm/numa_zonelist_order
>
> and check if it returns the proper value at either point.  This will force
> build_all_zonelists() and numa_mem_id() to point to the proper node since
> all cpus are now online.

Yep, after massaging the code to allow CONFIG_USE_PERCPU_NUMA_NODE_ID,
you're right that the memory node is wrong. The cpu node is right (they
are all on node 0), but that could be lucky. The memory node is right
for the boot cpu. I did notice that some CPUs now think the cpu node is
1, which is wrong.

> So the prerequisite for CONFIG_HAVE_MEMORYLESS_NODES is that there is an
> arch-specific set_numa_mem() that makes this mapping correct like ia64
> does.  If that's the case, then it's (1) completely undocumented and (2)
> Nishanth's patch is incomplete because anything that adds
> CONFIG_HAVE_MEMORYLESS_NODES needs to do the proper set_numa_mem() for it
> to be any different than numa_node_id().

I'll work on getting the set_numa_mem() and set_numa_node() correct for
powerpc.

Thanks,
Nish