Subject: Re: [PATCH] x86/boot/KASLR: Extend movable_node option for KASLR
From: Dou Liyang
To: YASUAKI ISHIMATSU, Baoquan He
CC: Chao Fan
Date: Thu, 10 Aug 2017 09:54:25 +0800
In-Reply-To: <1c9e22e1-b639-6449-b589-7b0033315d80@gmail.com>
References: <1501762641-15634-1-git-send-email-douly.fnst@cn.fujitsu.com>
 <20170803122458.GA5913@localhost.localdomain>
 <20170803234901.GE1874@x1>
 <96b436e3-6d48-6a02-5cd4-f23c3a8de240@cn.fujitsu.com>
 <20170804020022.GF1874@x1>
 <00b0236b-01f5-5f4b-93bb-a5e510b2b4f3@cn.fujitsu.com>
 <20170804025540.GG1874@x1>
 <94f54d02-0512-fb01-b9ca-ed63e1f80bc7@cn.fujitsu.com>
 <0bd0153e-225d-a0a3-2b9c-f85082bb9477@gmail.com>
 <1c9e22e1-b639-6449-b589-7b0033315d80@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, YASUAKI

At 08/10/2017 12:55 AM, YASUAKI ISHIMATSU wrote:
>
>
> On 08/09/2017 10:44 AM, Dou Liyang wrote:
>>
>> Hi YASUAKI,
>>
>> [...]
>>>>>>>>
>>>>>>>> we boot up kernel with 4 nodes:
>>>>>>>>
>>>>>>>> node 0 size: 1024 MB immovable
>>>>>>>> node 1 size: 1024 MB movable
>>>>>>>> node 2 size: 1024 MB movable
>>>>>>>> node 3 size: 1024 MB movable
>>>>>>>>
>>>>>>>> If we use "mem=1024M" in the command line, we can use just 1G of memory.
>>>>>>>> But actually, we should have 4G normally.
>>>>>>>
>>>>>>> So do you have an assumption on the order of immovable nodes and movable
>>>>>>> nodes? E.g. in your example of nodes above, immovable nodes have to be at
>>>>>>> the lowest address. Is this required by the current hot-plug memory code?
>>>>>>>
>>>>>>
>>>>>> Wow! So great. It seems this is required by the hot-plug memory code.
>>>>>>
>>>>>> Yesterday, I tested the patch in Qemu with 4 nodes and each time I
>>>>>> used a different node as the immovable node. But no matter what node I
>>>>>> used, the immovable nodes always had the lowest address.
>>>>>>
>>>>>> I am not familiar with memory, I am investigating this and I am going
>>>>>> to apply for a physical machine with movable nodes to check. :)
>>>>>
>>>>
>>>> Cc YASUAKI ISHIMATSU
>>>>
>>>> could you give us some help!
>>>>
>>>>> Great, thanks for your effort. I asked because this question confuses me
>>>>> and I know FJ has focused on the memory hot-plug implementation and
>>>>> continues working on that, so it must be easier for you to consult your
>>>>> co-workers who have worked on this. For a normal kernel, it seems the
>>>>> normal zone has to be on an immovable node, namely node0. But what if
>>>>> people modified the bootloader to locate the kernel onto the last node
>>>>> and configured the EFI firmware to make the last node un-hot-pluggable?
>>>>> I believe both of these can be done. Is this allowed? Does memory
>>>>> hot-plug have a requirement about the order of immovable nodes? And how
>>>>> many immovable nodes can we have? I have slides FJ published, but didn't
>>>>> find info about these.
>>>
>>> I read your patch. And I think what Baoquan wrote is right.
>>> The patch takes care of only your server. As he wrote, if a server wants to
>>> build an immovable node onto the last node, the patch cannot handle such a
>>> configuration.
>>>
>>
>> Thanks for your review. It is reasonable. I will keep it in mind.
>>
>> But, I am not sure, when we boot up a system with the following 4
>> nodes, whether the BIOS (ACPI firmware) maps the immovable node RAM from
>> the lowest address first?
>>
>> node 0 size: 1024 MB immovable
>> node 1 size: 1024 MB movable
>> node 2 size: 1024 MB movable
>> node 3 size: 1024 MB immovable
>>
>> the order of the physical RAM maps may be node 0, 3, 1, 2.
>
>
> It depends on the SRAT table. If the system boots up with movable_node, the
> kernel checks the hot pluggable bit of each memory affinity structure in the
> SRAT table. If the hot pluggable bit is set, the memory will be movable. If
> not set, the memory will be immovable.
>
> If the memory affinity structures in the SRAT table are defined as follows,
> the system sets up the configuration you mentioned.
>
> PXM: start        : end          : hot pluggable bit
>   0: 0x00000000000: 0x0ffffffffff: disable
>   1: 0x10000000000: 0x1ffffffffff: enable
>   2: 0x20000000000: 0x2ffffffffff: enable
>   3: 0x30000000000: 0x3ffffffffff: disable
>
> We are not sure there is such a server. But there is no specification that
> the immovable node has to be set from the lowest address. So the kernel
> should take care of such an SRAT table.
>

Yes, this patch didn't consider this situation. It's related to the ACPI
table.

As far as I know, when the ACPI firmware generates the local APIC entries in
the MADT, it generates the enabled CPUs first and then the disabled ones
(which will be hot-plugged). I don't know whether this strategy is also used
in the SRAT or not.

I will check the generation order of the memory affinity structures in the
ACPI SRAT, then modify this patch.

Thanks,
	dou.

> Thanks,
> Yasuaki Ishimatsu
>
>>
>>
>> Thanks,
>>
>> dou,
>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>>>>
>>>>
>>>> Thanks,
>>>> dou.
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Above is also one reason for not using 'mem=' directly. The other
>>>>>>>> reasons are:
>>>>>>>>
>>>>>>>> 1). each kernel option has its own role, we'd better not misuse them.
>>>>>>>> 2). movable_node is used as a boot-time switch to make nodes movable
>>>>>>>> or not, it should consider any situation, such as KASLR.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> dou.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 03, 2017 at 08:17:21PM +0800, Dou Liyang wrote:
>>>>>>>>>>> movable_node is a boot-time switch to make hot-pluggable memory
>>>>>>>>>>> NUMA nodes movable. This option is based on an assumption
>>>>>>>>>>> that any node which the kernel resides in is defined as
>>>>>>>>>>> un-hotpluggable. Linux can allocate memory near the kernel image
>>>>>>>>>>> to try its best to keep the kernel away from hotpluggable memory
>>>>>>>>>>> in the same NUMA node. So other nodes can be movable.
>>>>>>>>>>>
>>>>>>>>>>> But KASLR doesn't know which node is un-hotpluggable: all
>>>>>>>>>>> hotpluggable memory ranges are recorded in the ACPI SRAT table, and
>>>>>>>>>>> SRAT is not parsed. So, KASLR may randomize the kernel into a
>>>>>>>>>>> movable node, which then becomes immovable.
>>>>>>>>>>>
>>>>>>>>>>> Extend movable_node option to restrict kernel to be randomized in
>>>>>>>>>>> immovable nodes by adding a parameter. This parameter sets up
>>>>>>>>>>> the boundaries between the movable nodes and immovable nodes.
>>>>>>>
>>>>>>> And here you mentioned boundaries, which means more than one boundary,
>>>>>>> so how do you handle the case where movable nodes and immovable nodes
>>>>>>> are placed alternately?
>>>>>>>
>>>>>>> I mean, are you sure the current hot-plug memory code requires that the
>>>>>>> immovable node is the first node and there's only one immovable node,
>>>>>>> or that there are several immovable nodes but they are the first few
>>>>>>> nodes?
>>>>>>>
>>>>>>> If yes, then this patch looks good to me, I would like to ack it.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Baoquan
>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Reported-by: Chao Fan
>>>>>>>>>>> Signed-off-by: Dou Liyang
>>>>>>>>>>> ---
>>>>>>>>>>>  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
>>>>>>>>>>>  arch/x86/boot/compressed/kaslr.c                | 19 ++++++++++++++++---
>>>>>>>>>>>  2 files changed, 25 insertions(+), 5 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>>>>>>>>> index d9c171c..44c7e33 100644
>>>>>>>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>>>>>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>>>>>>>>> @@ -2305,7 +2305,8 @@
>>>>>>>>>>>  	mousedev.yres=	[MOUSE] Vertical screen resolution, used for devices
>>>>>>>>>>>  			reporting absolute coordinates, such as tablets
>>>>>>>>>>>
>>>>>>>>>>> -	movablecore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
>>>>>>>>>>> +	movablecore=nn[KMG]
>>>>>>>>>>> +			[KNL,X86,IA-64,PPC] This parameter
>>>>>>>>>>>  			is similar to kernelcore except it specifies the
>>>>>>>>>>>  			amount of memory used for migratable allocations.
>>>>>>>>>>>  			If both kernelcore and movablecore is specified,
>>>>>>>>>>> @@ -2315,12 +2316,18 @@
>>>>>>>>>>>  			that the amount of memory usable for all allocations
>>>>>>>>>>>  			is not too small.
>>>>>>>>>>>
>>>>>>>>>>> -	movable_node	[KNL] Boot-time switch to make hotplugable memory
>>>>>>>>>>> +	movable_node	[KNL] Boot-time switch to make hot-pluggable memory
>>>>>>>>>>>  			NUMA nodes to be movable. This means that the memory
>>>>>>>>>>>  			of such nodes will be usable only for movable
>>>>>>>>>>>  			allocations which rules out almost all kernel
>>>>>>>>>>>  			allocations. Use with caution!
>>>>>>>>>>>
>>>>>>>>>>> +	movable_node=nn[KMG]
>>>>>>>>>>> +			[KNL] Extend movable_node to work well with KASLR.
>>>>>>>>>>> +			This parameter is the boundaries between the movable
>>>>>>>>>>> +			nodes and immovable nodes, the memory which exceeds it
>>>>>>>>>>> +			will be regarded as hot-pluggable.
>>>>>>>>>>> +
>>>>>>>>>>>  	MTD_Partition=	[MTD]
>>>>>>>>>>>  			Format: ,,,
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
>>>>>>>>>>> index 91f27ab..7e2351b 100644
>>>>>>>>>>> --- a/arch/x86/boot/compressed/kaslr.c
>>>>>>>>>>> +++ b/arch/x86/boot/compressed/kaslr.c
>>>>>>>>>>> @@ -89,7 +89,10 @@ struct mem_vector {
>>>>>>>>>>>  static bool memmap_too_large;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
>>>>>>>>>>> +/*
>>>>>>>>>>> + * Store memory limit specified by the following situations:
>>>>>>>>>>> + * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
>>>>>>>>>>> + */
>>>>>>>>>>>  unsigned long long mem_limit = ULLONG_MAX;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> @@ -212,7 +215,8 @@ static int handle_mem_memmap(void)
>>>>>>>>>>>  	char *param, *val;
>>>>>>>>>>>  	u64 mem_size;
>>>>>>>>>>>
>>>>>>>>>>> -	if (!strstr(args, "memmap=") && !strstr(args, "mem="))
>>>>>>>>>>> +	if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
>>>>>>>>>>> +		!strstr(args, "movable_node="))
>>>>>>>>>>>  		return 0;
>>>>>>>>>>>
>>>>>>>>>>>  	tmp_cmdline = malloc(len + 1);
>>>>>>>>>>> @@ -247,7 +251,16 @@ static int handle_mem_memmap(void)
>>>>>>>>>>>  				free(tmp_cmdline);
>>>>>>>>>>>  				return -EINVAL;
>>>>>>>>>>>  			}
>>>>>>>>>>> -			mem_limit = mem_size;
>>>>>>>>>>> +			mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
>>>>>>>>>>> +		} else if (!strcmp(param, "movable_node")) {
>>>>>>>>>>> +			char *p = val;
>>>>>>>>>>> +
>>>>>>>>>>> +			mem_size = memparse(p, &p);
>>>>>>>>>>> +			if (mem_size == 0) {
>>>>>>>>>>> +				free(tmp_cmdline);
>>>>>>>>>>> +				return -EINVAL;
>>>>>>>>>>> +			}
>>>>>>>>>>> +			mem_limit = mem_limit > mem_size ?
>>>>>>>>>>> +				mem_size : mem_limit;
>>>>>>>>>>>  		}
>>>>>>>>>>>  	}
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 2.5.5
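
The concern about SRAT layouts raised in this thread can be made concrete
with a small, self-contained C sketch. It is not kernel code and not part of
the patch: struct mem_range, example_srat and the three helper functions are
hypothetical names invented for illustration, and the table assumes four
contiguous 1 TB ranges in which PXM 0 and PXM 3 are immovable. The sketch
compares a single boundary, which is the model behind movable_node=nn[KMG],
with a walk over every range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one SRAT memory affinity entry. */
struct mem_range {
	unsigned int pxm;	/* proximity domain (NUMA node) */
	uint64_t start;		/* first address of the range */
	uint64_t end;		/* last address of the range, inclusive */
	bool hotplug;		/* hot-pluggable bit: set means movable */
};

/* Assumed layout: immovable nodes at both the lowest and highest addresses. */
const struct mem_range example_srat[] = {
	{ 0, 0x00000000000ULL, 0x0ffffffffffULL, false },
	{ 1, 0x10000000000ULL, 0x1ffffffffffULL, true  },
	{ 2, 0x20000000000ULL, 0x2ffffffffffULL, true  },
	{ 3, 0x30000000000ULL, 0x3ffffffffffULL, false },
};
const size_t example_srat_len = sizeof(example_srat) / sizeof(example_srat[0]);

/*
 * Single-boundary model: return the lowest start address of any
 * hot-pluggable range, i.e. the value one would pass as movable_node=nn.
 * Returns UINT64_MAX when no range is hot-pluggable.
 */
uint64_t lowest_movable_start(const struct mem_range *t, size_t n)
{
	uint64_t limit = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		assert(t[i].start <= t[i].end);
		if (t[i].hotplug && t[i].start < limit)
			limit = t[i].start;
	}
	return limit;
}

/* Count immovable ranges that lie entirely below the single boundary. */
size_t immovable_below(const struct mem_range *t, size_t n, uint64_t limit)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!t[i].hotplug && t[i].end < limit)
			count++;
	return count;
}

/* Range-list model: count every immovable range, wherever it sits. */
size_t immovable_total(const struct mem_range *t, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!t[i].hotplug)
			count++;
	return count;
}
```

With this table the single boundary is 0x10000000000, which keeps PXM 0 but
hides the immovable PXM 3, while walking the ranges finds both. That is the
gap a list of immovable ranges, rather than one limit, would close.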