Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751427AbeAEC6z (ORCPT + 1 other); Thu, 4 Jan 2018 21:58:55 -0500 Received: from mail.cn.fujitsu.com ([183.91.158.132]:29045 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751208AbeAEC6y (ORCPT ); Thu, 4 Jan 2018 21:58:54 -0500 X-IronPort-AV: E=Sophos;i="5.43,368,1503331200"; d="scan'208";a="34652635" Date: Fri, 5 Jan 2018 10:58:11 +0800 From: Chao Fan To: Baoquan He CC: , , , , , , , , , , Subject: Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory Message-ID: <20180105025810.GA7714@localhost.localdomain> References: <20180104080219.23893-1-fanc.fnst@cn.fujitsu.com> <20180104080219.23893-2-fanc.fnst@cn.fujitsu.com> <20180104103057.GC7235@x1> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20180104103057.GC7235@x1> User-Agent: Mutt/1.9.1 (2017-09-22) X-Originating-IP: [10.167.225.56] X-yoursite-MailScanner-ID: 112F8486A77F.AB00E X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: fanc.fnst@cn.fujitsu.com Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Thu, Jan 04, 2018 at 06:30:57PM +0800, Baoquan He wrote: >On 01/04/18 at 04:02pm, Chao Fan wrote: >> In current code, kaslr may choose the memory region in movable >> nodes to extract kernel, which will make the nodes can't be hot-removed. >> To solve it, we can specify the memory region in immovable node. >> Create immovable_mem to store the regions in immovable_mem, where should >> be chosen by kaslr. >> >> Also change the "handle_mem_memmap" to "handle_mem_filter", since >> it will not only handle memmap parameter now. >> Since "immovable_mem=" only works with "movable_node", so "immovable_mem=" >> doesn't work alone. If specify "movable_node" without "immovable_mem=", >> disable KASLR. >> >> Multiple regions can be specified, comma delimited. >> Considering the usage of memory, only support for 4 regions. >> 4 regions contains 2 nodes at least, enough for kernel to extract. >> >> Signed-off-by: Chao Fan > >Hi Chao, > >Thanks for your effort on this issue. > >Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest. >Please check the below bug information. There's only one available >position which hugepage can use to allocate. In this case, if we have a >generic parameter to tell kernel where we can randomize into, this >hugepage issue can be solved. We can restrict kernel to randomize beyond >[0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG] >can be adjusted to do this. I am hesitating on whether we should change >this or not. > Hi Baoquan, Luiz, In my personal understanding, there is only one region, [0x40000000, 0x7fffffff] suitable for the 1G page, so we should avoid kaslr to choose this region, right? If my understanding is right, I think it's more similar with mem_avoid. Because we specify where KASLR *choose* in "immovable_mem=", we specify where KASLR *avoid* in "mem_avoid". So I wonder if it's OK to expand mem_avoid, and add a member like MEM_AVOID_HUGEPAGE in "enum mem_avoid_index". But there is a disadvantage, we can only specify the limited regions. Luiz, I am not familiar with HUGE PAGE, I wonder how many 1G HUGE pages does system need in general? We may need to limit it in 2, or 4. Thanks, Chao Fan >Hi maintainers, Kees, > >1) Let's keep Chao's current code, just ask Luiz to use nokaslr to work >around the hugepage allocation failure; >2) Change immovable_mem=nn[KMG]@ss[KMG] to be a generic parameter, >people can use it to restrict kernel to places they want. > >Which one is better? Or any other idea or suggestion? > >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >Luiz said on kvm guest, they add the following to the kernel command-line: >default_hugepagesz=1G hugepagesz=1G hugepages=1 > >Boot the guest and check number of 1GB pages reserved: > >grep HugePages_Total /proc/meminfo > >When booting with "nokaslr" HugePages_Total is always 1. When booting >without "nokaslr" sometimes HugePages_Total is zero (that is, reserving >the 1GB page fails). > >And 20 reboots, 6 failures to mount single 1G hugepage. > >I reproduced on kvm guest with Luiz's help, and found it's because there's >only one available position for 1G hugepage allocation, [0x40000000, 0x7fffffff]. >That's why they saw 1/4 possibility of failure. If kernel randomized to >[0x0, 0x3fffffff], [0x80000000, 0xbffdffff], or [0x100000000, >0x13fffffff], hugepage can always succeed to allocate. > >dmesg output snippet of kvm guest: >[ +0.000000] e820: BIOS-provided physical RAM map: >[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable >[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved >[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved >[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable >[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved >[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved >[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved >[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable > >Thanks >Baoquan > >> --- >> arch/x86/boot/compressed/kaslr.c | 112 +++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 109 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c >> index 8199a6187251..60e5aa28b510 100644 >> --- a/arch/x86/boot/compressed/kaslr.c >> +++ b/arch/x86/boot/compressed/kaslr.c >> @@ -108,6 +108,19 @@ enum mem_avoid_index { >> >> static struct mem_vector mem_avoid[MEM_AVOID_MAX]; >> >> +#ifdef CONFIG_MEMORY_HOTPLUG >> +/* Only supporting at most 4 immovable memory regions with kaslr */ >> +#define MAX_IMMOVABLE_MEM 4 >> + >> +static bool lack_immovable_mem; >> + >> +/* Store the memory regions in immovable node */ >> +static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM]; >> + >> +/* The immovable regions user specify, not more than 4 */ >> +static int num_immovable_region; >> +#endif >> + >> static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two) >> { >> /* Item one is entirely before item two. */ >> @@ -206,15 +219,97 @@ static void mem_avoid_memmap(char *str) >> memmap_too_large = true; >> } >> >> -static int handle_mem_memmap(void) >> +#ifdef CONFIG_MEMORY_HOTPLUG >> +static int parse_immovable_mem(char *p, >> + unsigned long long *start, >> + unsigned long long *size) >> +{ >> + char *oldp; >> + >> + if (!p) >> + return -EINVAL; >> + >> + oldp = p; >> + *size = memparse(p, &p); >> + if (p == oldp) >> + return -EINVAL; >> + >> + switch (*p) { >> + case '@': >> + *start = memparse(p + 1, &p); >> + return 0; >> + default: >> + /* >> + * If w/o offset, only size specified, immovable_mem=nn[KMG] >> + * has the same behaviour as immovable_mem=nn[KMG]@0. It means >> + * the region starts from 0. >> + */ >> + *start = 0; >> + return 0; >> + } >> + >> + return -EINVAL; >> +} >> + >> +static void parse_immovable_mem_regions(char *str) >> +{ >> + static int i; >> + >> + while (str && (i < MAX_IMMOVABLE_MEM)) { >> + int rc; >> + unsigned long long start, size; >> + char *k = strchr(str, ','); >> + >> + if (k) >> + *k++ = 0; >> + >> + rc = parse_immovable_mem(str, &start, &size); >> + if (rc < 0) >> + break; >> + str = k; >> + >> + immovable_mem[i].start = start; >> + immovable_mem[i].size = size; >> + i++; >> + } >> + num_immovable_region = i; >> +} >> +#else >> +static inline void parse_immovable_mem_regions(char *str) >> +{ >> +} >> +#endif >> + >> +static int handle_mem_filter(void) >> { >> char *args = (char *)get_cmd_line_ptr(); >> size_t len = strlen((char *)args); >> + bool enable_movable_node = false; >> char *tmp_cmdline; >> char *param, *val; >> u64 mem_size; >> >> - if (!strstr(args, "memmap=") && !strstr(args, "mem=")) >> +#ifdef CONFIG_MEMORY_HOTPLUG >> + if (strstr(args, "movable_node")) { >> + /* >> + * Confirm "movable_node" specified, otherwise >> + * "immovable_mem=" doesn't work. >> + */ >> + enable_movable_node = true; >> + >> + /* >> + * If only specify "movable_node" without "immovable_mem=", >> + * disable KASLR. >> + */ >> + if (!strstr(args, "immovable_mem=")) { >> + lack_immovable_mem = true; >> + return 0; >> + } >> + } >> +#endif >> + >> + if (!strstr(args, "memmap=") && !strstr(args, "mem=") && >> + !enable_movable_node) >> return 0; >> >> tmp_cmdline = malloc(len + 1); >> @@ -239,6 +334,9 @@ static int handle_mem_memmap(void) >> >> if (!strcmp(param, "memmap")) { >> mem_avoid_memmap(val); >> + } else if (!strcmp(param, "immovable_mem=") && >> + enable_movable_node) { >> + parse_immovable_mem_regions(val); >> } else if (!strcmp(param, "mem")) { >> char *p = val; >> >> @@ -378,7 +476,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, >> /* We don't need to set a mapping for setup_data. */ >> >> /* Mark the memmap regions we need to avoid */ >> - handle_mem_memmap(); >> + handle_mem_filter(); >> >> #ifdef CONFIG_X86_VERBOSE_BOOTUP >> /* Make sure video RAM can be used. */ >> @@ -673,6 +771,14 @@ static unsigned long find_random_phys_addr(unsigned long minimum, >> return 0; >> } >> >> +#ifdef CONFIG_MEMORY_HOTPLUG >> + /* Check if specify "movable_node" without "immovable_mem=". */ >> + if (lack_immovable_mem) { >> + debug_putstr("Fail KASLR when movable_node specified without immovable_mem=.\n"); >> + return 0; >> + } >> +#endif >> + >> /* Make sure minimum is aligned. */ >> minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN); >> >> -- >> 2.14.3 >> >> >> > >