Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763124Ab3ECKup (ORCPT ); Fri, 3 May 2013 06:50:45 -0400
Received: from mail-bk0-f48.google.com ([209.85.214.48]:38987 "EHLO mail-bk0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762184Ab3ECKun (ORCPT ); Fri, 3 May 2013 06:50:43 -0400
Date: Fri, 3 May 2013 12:50:37 +0200
From: Vasilis Liaskovitis
To: Tang Chen
Cc: mingo@redhat.com, hpa@zytor.com, akpm@linux-foundation.org,
	yinghai@kernel.org, jiang.liu@huawei.com, wency@cn.fujitsu.com,
	isimatu.yasuaki@jp.fujitsu.com, tj@kernel.org, laijs@cn.fujitsu.com,
	davem@davemloft.net, mgorman@suse.de, minchan@kernel.org,
	mina86@mina86.com, x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory.
Message-ID: <20130503105037.GA4533@dhcp-192-168-178-175.profitbricks.localdomain>
References: <1367313683-10267-1-git-send-email-tangchen@cn.fujitsu.com> <1367313683-10267-11-git-send-email-tangchen@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1367313683-10267-11-git-send-email-tangchen@cn.fujitsu.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3453
Lines: 91

Hi,

On Tue, Apr 30, 2013 at 05:21:20PM +0800, Tang Chen wrote:
> We mark out movable memory ranges and reserve them with MEMBLK_HOTPLUGGABLE flag in
> memblock.reserved. This should be done after the memory mapping is initialized
> because the kernel now supports allocate pagetable pages on local node, which
> are kernel pages.
>
> The reserved hotpluggable will be freed to buddy when memory initialization
> is done.
>
> This idea is from Wen Congyang and Jiang Liu.
>
> Suggested-by: Jiang Liu
> Suggested-by: Wen Congyang
> Signed-off-by: Tang Chen
> ---
>  arch/x86/mm/numa.c       |   28 ++++++++++++++++++++++++++++
>  include/linux/memblock.h |    3 +++
>  mm/memblock.c            |   19 +++++++++++++++++++
>  3 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 1367fe4..a1f1f90 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -731,6 +731,32 @@ static void __init early_x86_numa_init_mapping(void)
>  }
>  #endif
>
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +static void __init early_mem_hotplug_init()
> +{
> +	int i, nid;
> +	phys_addr_t start, end;
> +
> +	if (!movablecore_enable_srat)
> +		return;
> +
> +	for (i = 0; i < numa_meminfo.nr_blks; i++) {
> +		if (!numa_meminfo.blk[i].hotpluggable)
> +			continue;
> +
> +		nid = numa_meminfo.blk[i].nid;

Should we skip ranges on nodes that the kernel uses? e.g. with

		if (memblock_is_kernel_node(nid))
			continue;

> +		start = numa_meminfo.blk[i].start;
> +		end = numa_meminfo.blk[i].end;
> +
> +		memblock_reserve_hotpluggable(start, end - start, nid);
> +	}
> +}

- I am getting a "PANIC: early exception" when rebooting with movablecore=acpi
after hotplugging memory on node0 or node1 of a 2-node VM. The guest kernel is
based on
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-mm (e9058baf) + these v2 patches.

This happens with or without the above memblock_is_kernel_node(nid) check.
Perhaps I am missing something or I need a newer "ACPI, numa: Parse numa info
early" patch-set?

A general question: Disabling hot-pluggability/zone-movable eligibility for a
whole node sounds a bit inflexible if the machine only has one node to begin
with. Would it be possible to keep movable information per SRAT entry? I.e., if
the BIOS presents multiple SRAT entries for one node/PXM (say node 0), and there
is no memblock/kernel allocation on one of these SRAT entries, could we still
mark this SRAT entry's range as hot-pluggable/movable?
Not sure if many real machine BIOSes would do this, but seabios could. This
implies that SRAT entries are processed for movable-zone eligibility at entry
granularity, before they are merged on a node/PXM basis (I think
numa_cleanup_meminfo currently does this merge).

Of course the kernel should still have enough memory (i.e. non-movable zone) to
boot. Can we ensure that at least a certain amount of memory is non-movable, and
then, given more separate SRAT entries for node0 not used by the kernel, treat
these remaining entries as movable?

thanks,

- Vasilis
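
P.S. To make the per-entry question above more concrete, here is a rough
userspace sketch, not kernel code and not part of the patch. All names in it
(struct srat_entry, mark_movable_entries, min_nonmovable, the kernel_used flag)
are made up for illustration; the real code would work on numa_meminfo and
memblock instead. It assumes we already know, per SRAT entry, whether an early
kernel allocation falls inside the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one SRAT memory affinity entry. */
struct srat_entry {
	uint64_t start;		/* inclusive physical start address */
	uint64_t end;		/* exclusive physical end address */
	int nid;		/* node (PXM) the entry belongs to */
	bool hotpluggable;	/* hotplug bit taken from the SRAT entry */
	bool kernel_used;	/* assumed input: kernel allocation in range */
	bool movable;		/* output: range may become movable */
};

/*
 * Decide movability per entry instead of per node.  Entries that are not
 * hotpluggable, or that already hold kernel allocations, always stay
 * non-movable.  The remaining candidates are kept non-movable, in table
 * order, until at least min_nonmovable bytes are left for the kernel.
 * Returns the number of entries marked movable.
 */
static size_t mark_movable_entries(struct srat_entry *e, size_t n,
				   uint64_t min_nonmovable)
{
	uint64_t nonmovable = 0;
	size_t i, nr_movable = 0;

	/* Pass 1: account for ranges that can never be movable. */
	for (i = 0; i < n; i++) {
		assert(e[i].end >= e[i].start);
		e[i].movable = false;
		if (!e[i].hotpluggable || e[i].kernel_used)
			nonmovable += e[i].end - e[i].start;
	}

	/* Pass 2: walk the remaining candidates in table order. */
	for (i = 0; i < n; i++) {
		if (!e[i].hotpluggable || e[i].kernel_used)
			continue;
		if (nonmovable < min_nonmovable) {
			/* Still short of the boot-time minimum: keep it. */
			nonmovable += e[i].end - e[i].start;
			continue;
		}
		e[i].movable = true;
		nr_movable++;
	}
	return nr_movable;
}
```

With three 1G entries on node 0 (the first one used by the kernel), one 1G
entry on node 1, and a 1.5G minimum, this keeps the first two node 0 entries
non-movable and marks the third node 0 entry and the node 1 entry movable. A
per-node check would instead exclude all of node 0.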