Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753350AbbGAI4Q (ORCPT ); Wed, 1 Jul 2015 04:56:16 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:15267 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751925AbbGAI4F (ORCPT ); Wed, 1 Jul 2015 04:56:05 -0400 Message-ID: <5593AAF4.2000405@huawei.com> Date: Wed, 1 Jul 2015 16:55:16 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Tang Chen CC: , , , , , , , , , , , , , , , , , Subject: Re: [PATCH 1/1] mem-hotplug: Handle node hole when initializing numa_meminfo. References: <1435720614-16480-1-git-send-email-tangchen@cn.fujitsu.com> <559387EF.5050701@huawei.com> <55939CF2.6080108@cn.fujitsu.com> In-Reply-To: <55939CF2.6080108@cn.fujitsu.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3172 Lines: 85 On 2015/7/1 15:55, Tang Chen wrote: > > On 07/01/2015 02:25 PM, Xishi Qiu wrote: >> On 2015/7/1 11:16, Tang Chen wrote: >> >>> When parsing SRAT, all memory ranges are added into numa_meminfo. >>> In numa_init(), before entering numa_cleanup_meminfo(), all possible >>> memory ranges are in numa_meminfo. And numa_cleanup_meminfo() removes >>> all ranges over max_pfn or empty. >>> >>> But, this only works if the nodes are continuous. Let's have a look >>> at the following example: >>> >>> We have an SRAT like this: >>> SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff] >>> SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff] >>> SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff] >>> SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug >>> SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug >>> SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug >>> SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug >>> SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug >>> SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug >>> >>> On boot, only node 0,1,2,3 exist. >>> >>> And the numa_meminfo will look like this: >>> numa_meminfo.nr_blks = 9 >>> 1. on node 0: [0, 60000000] >>> 2. on node 0: [100000000, 20000000000] >>> 3. on node 1: [20000000000, 40000000000] >>> 4. on node 4: [40000000000, 60000000000] >>> 5. on node 5: [60000000000, 80000000000] >>> 6. on node 2: [80000000000, a0000000000] >>> 7. on node 3: [a0000000000, a0800000000] >>> 8. on node 6: [c0000000000, a0800000000] >>> 9. on node 7: [e0000000000, a0800000000] >>> >>> And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because >>> the end address is over max_pfn, which is a0800000000. But 4 and 5 >>> are not removed because their end addresses are less then max_pfn. >>> But in fact, node 4 and 5 don't exist. >>> >>> In a word, numa_cleanup_meminfo() is not able to handle holes between nodes. >>> >>> Since memory ranges in node 4 and 5 are in numa_meminfo, in numa_register_memblks(), >>> node 4 and 5 will be mistakenly set to online. >>> >>> In this patch, we use memblock_overlaps_region() to check if ranges in >>> numa_meminfo overlap with ranges in memory_block. Since memory_block contains >>> all available memory at boot time, if they overlap, it means the ranges >>> exist. If not, then remove them from numa_meminfo. >>> >> Hi Tang Chen, >> >> What's the impact of this problem? >> >> Command "numactl --hard" will show an empty node(no cpu and no memory, >> but pgdat is created), right? > > On my box, if I run lscpu, the output looks like this: > > NUMA node0 CPU(s): 0-14,128-142 > NUMA node1 CPU(s): 15-29,143-157 > NUMA node2 CPU(s): > NUMA node3 CPU(s): > NUMA node4 CPU(s): 62-76,190-204 > NUMA node5 CPU(s): 78-92,206-220 > > Node 2 and 3 are not exist, but they are online. > Yes, because srat->numa_meminfo->alloc pgdat. Thanks, Xishi Qiu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/