Date: Wed, 8 Apr 2015 16:07:40 -0700
From: Nishanth Aravamudan
To: Konstantin Khlebnikov
Cc: Grant Likely, devicetree@vger.kernel.org, Rob Herring,
	linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()
Message-ID: <20150408230740.GB53918@linux.vnet.ibm.com>
References: <20150408165920.25007.6869.stgit@buzz>
	<55255F84.6060608@yandex-team.ru>
In-Reply-To: <55255F84.6060608@yandex-team.ru>

On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote:
> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
> > Node 0 might be offline as well as any other numa node,
> > in this case kernel cannot handle memory allocation and crashes.

Isn't the bug that numa_node_id() returned an offline node? That
shouldn't happen:

	#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
	...
	#ifndef numa_node_id
	/* Returns the number of the current Node. */
	static inline int numa_node_id(void)
	{
		return raw_cpu_read(numa_node);
	}
	#endif
	...
	#else	/* !CONFIG_USE_PERCPU_NUMA_NODE_ID */

	/* Returns the number of the current Node. */
	#ifndef numa_node_id
	static inline int numa_node_id(void)
	{
		return cpu_to_node(raw_smp_processor_id());
	}
	#endif
	...
So that's either the per-cpu numa_node value, or the result of
cpu_to_node() on the current processor.

> Example:
>
> [    0.027133] ------------[ cut here ]------------
> [    0.027938] kernel BUG at include/linux/gfp.h:322!

This is

	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

in alloc_pages_exact_node(). And based on the trace below, that's

	alloc_pages_exact_node <- alloc_slab_page <- allocate_slab <-
	new_slab <- new_slab_objects <- __slab_alloc

which is just passing the node value down, right? Which I think came
from:

	domain = kzalloc_node(sizeof(*domain) +
			      (sizeof(unsigned int) * size),
			      GFP_KERNEL, of_node_to_nid(of_node));

?

What platform is this on -- it looks to be x86? qemu emulation of a
pathological topology? What was the topology?

Note that there is a ton of code that seems to assume node 0 is online.
I started working on removing this assumption myself and it just led
down a rathole (on power, as a result, we always have node 0 online,
even if it is memoryless and cpuless).

I am guessing this is just happening early in boot, before the per-cpu
areas are set up? That's why (I think) x86 has the early_cpu_to_node()
function... Or do you not have CONFIG_OF set?

So isn't the only change necessary the one to the include file, and
shouldn't it just return first_online_node rather than 0?

Ah, and there are more of those node 0 assumptions :)

	#define first_online_node	0
	#define first_memory_node	0

if MAX_NUMNODES == 1...

-Nish