Message-ID: <55C6EFFF.5070605@cn.fujitsu.com>
Date: Sun, 9 Aug 2015 14:15:27 +0800
From: Tang Chen
To: Jiang Liu, Tejun Heo
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
References: <1436261425-29881-1-git-send-email-tangchen@cn.fujitsu.com>
 <1436261425-29881-2-git-send-email-tangchen@cn.fujitsu.com>
 <20150715214802.GL15934@mtj.duckdns.org>
 <55C03332.2030808@cn.fujitsu.com>
 <55C0725B.80201@linux.intel.com>
In-Reply-To: <55C0725B.80201@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Liu,

Have you posted your new patches? (I mean the memory-less node support
patches.) If you are going to post them, please cc me.

And BTW, how did you reproduce the memory-less node problem? Do you have
a real memory-less node on your machine?

Thanks. :)

On 08/04/2015 04:05 PM, Jiang Liu wrote:
> On 2015/8/4 11:36, Tang Chen wrote:
>> Hi TJ,
>>
>> Sorry for the late reply.
>>
>> On 07/16/2015 05:48 AM, Tejun Heo wrote:
>>>> ......
>>>> so in initialization phase makes no sense any more. The best near
>>>> online node for each cpu should be cached somewhere.
>>> I'm not really following. Is this because the now offline node can
>>> later come online and we'd have to break the constant mapping
>>> invariant if we update the mapping later? If so, it'd be nice to
>>> spell that out.
>> Yes. Will document this in the next version.
>>
>>>> ......
>>>> +int get_near_online_node(int node)
>>>> +{
>>>> +	return per_cpu(x86_cpu_to_near_online_node,
>>>> +		       cpumask_first(&node_to_cpuid_mask_map[node]));
>>>> +}
>>>> +EXPORT_SYMBOL(get_near_online_node);
>>> Umm... this function is sitting on a fairly hot path and scanning a
>>> cpumask each time. Why not just build a numa node -> numa node array?
>> Indeed. Will avoid scanning a cpumask.
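
Just to show what I have in mind for the next version, below is a rough,
untested sketch of such a numa node -> numa node table. The names
(near_online_node_map, build_near_node_map) are made up for illustration
and are not from the posted patch:

#include <linux/kernel.h>	/* INT_MAX */
#include <linux/nodemask.h>	/* node_online(), for_each_online_node() */
#include <linux/numa.h>		/* NUMA_NO_NODE, MAX_NUMNODES */
#include <linux/topology.h>	/* node_distance() */

static int near_online_node_map[MAX_NUMNODES];

static void build_near_node_map(void)
{
	int nid, n, best, best_dist;

	for (nid = 0; nid < MAX_NUMNODES; nid++) {
		if (node_online(nid)) {
			/* an online node is its own nearest node */
			near_online_node_map[nid] = nid;
			continue;
		}

		best = NUMA_NO_NODE;
		best_dist = INT_MAX;
		/* pick the online node with the smallest SLIT distance */
		for_each_online_node(n) {
			if (node_distance(nid, n) < best_dist) {
				best_dist = node_distance(nid, n);
				best = n;
			}
		}
		near_online_node_map[nid] = best;
	}
}

int get_near_online_node(int node)
{
	/* O(1) lookup on the hot path, no cpumask scan */
	return near_online_node_map[node];
}

This way the hot-path lookup is a single array read, and the scan only
happens when the table is (re)built.
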
>>> ......
>>>
>>>>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>>>  					unsigned int order)
>>>>  {
>>>> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>>> +
>>>> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
>>>> +	if (!node_online(nid))
>>>> +		nid = get_near_online_node(nid);
>>>> +#endif
>>>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>>>  }
>>> Ditto. Also, what are the synchronization rules for NUMA node
>>> on/offlining? If you end up updating the mapping later, how would
>>> that be synchronized against the above usages?
>> I think the near online node map should be updated when node
>> online/offline happens. But about this, I think the current numa code
>> has a little problem.
>>
>> As you know, firmware info binds a set of CPUs and memory to a node.
>> But at boot time, if the node has no memory (a memory-less node), it
>> won't be online. But the CPUs on that node are available, and bound to
>> the nearest online node. (Here, I mean numa_set_node(cpu, node).)
>>
>> Why does the kernel do this? I think it is to ensure that we can
>> allocate memory successfully by calling functions like
>> alloc_pages_node() and alloc_pages_exact_node(). Through these two
>> functions, any CPU should be bound to a node that has memory, so that
>> memory allocation can succeed.
>>
>> That means, for a memory-less node at boot time, the CPUs on the node
>> are online, but the node is not online.
>>
>> That also means "the node is online" is equivalent to "the node has
>> memory". Actually, a lot of code in the kernel relies on this rule.
>>
>> But,
>> 1) cpu_up() will try to online a node, and it doesn't check whether
>>    the node has memory.
>> 2) try_offline_node() offlines the CPUs first, and then the memory.
>>
>> This behavior looks a little weird, or let's say it is ambiguous. It
>> seems that a NUMA node consists of CPUs and memory. So if the CPUs are
>> online, the node should be online.
> Hi Chen,
> 	I have posted a patch set to enable memoryless nodes on x86,
> will repost it for review. :) Hope it helps to solve this issue.
> Thanks!
> Gerry
>
>> And also,
>> the main purpose of this patch-set is to make the cpuid <-> nodeid
>> mapping persistent. After this patch-set, alloc_pages_node() and
>> alloc_pages_exact_node() won't depend on the cpuid <-> nodeid mapping
>> any more. So the node should be online if the CPUs on it are online.
>> Otherwise, we cannot set up the CPU interfaces under /sys.
>>
>> Unfortunately, since I don't have a machine with a memory-less node, I
>> cannot reproduce the problem right now.
>>
>> How do you think the node online behavior should be changed?
>>
>> Thanks.
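
P.S. About TJ's question on keeping the cached mapping in sync with node
online/offline, this is roughly the direction I am thinking of. It is
only an untested sketch with made-up names, reusing the existing memory
hotplug notifier (struct memory_notify's status_change_nid tells us when
a node gains or loses its last memory); the real patch may hook node
online/offline differently, and the locking still needs to be sorted out:

#include <linux/init.h>
#include <linux/memory.h>	/* register_memory_notifier(), MEM_ONLINE/MEM_OFFLINE */
#include <linux/notifier.h>

static int near_node_memory_callback(struct notifier_block *nb,
				     unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;

	/* status_change_nid >= 0 means a node just came online/offline */
	if (mn->status_change_nid < 0)
		return NOTIFY_OK;

	switch (action) {
	case MEM_ONLINE:
	case MEM_OFFLINE:
		/* rebuild the whole table from the sketch above */
		build_near_node_map();
		break;
	default:
		break;
	}

	return NOTIFY_OK;
}

static struct notifier_block near_node_memory_nb = {
	.notifier_call = near_node_memory_callback,
};

static int __init near_node_map_init(void)
{
	build_near_node_map();
	register_memory_notifier(&near_node_memory_nb);
	return 0;
}
device_initcall(near_node_map_init);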