Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2811458imm; Wed, 16 May 2018 21:04:40 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqRlpfln8ipfANFfwIpgnBWuAiP/4coP7yW6VhT3Ru+bEs2dx43LMd2f8oHVC2GIvAMGvyv X-Received: by 2002:a62:9c93:: with SMTP id u19-v6mr3717219pfk.74.1526529880761; Wed, 16 May 2018 21:04:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526529880; cv=none; d=google.com; s=arc-20160816; b=MCA4/dVhRw6COuMa5fU0Q1YhOgCWljGck87FwfQCwVxbDl0tPk9cfyCsYBkw0TI3y/ TEBQN/n8smmqcE0u1WSFsZ7wOIeNTEpvkKvPNJwJhnHLnlrH3zi99Giz2sr/ntJOg8oj SE7LxHlSYtPSZdqy9rQqCBa9Rai1TEEaaWk4S+swVW5sOF1KGgy5h47Expi15jzdi2yZ mU/n+ed+WUOxfVuqps8E3Z0NmFLXHPUl0dES+Cibf1c8wmTW15sUFVW09QwZ0AvITzAO o4bN1VF/DEeU/CR/bG85nvSqSDbHQOxmkdTKMf8c7cK2UcUuID7G1ezAGzUGNeGTpjXu IT6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=ufMgFQMu/bnivZAZOH3b8O19A1TI64yEIzrPnIc4+x0=; b=aNDl5+5wYTgQ4dSt098pNOLSR0D2hYYA8H4G7LK7MQCh4r1txEnJ3Td8znDz5/tVTM rzOMKbRYWbhImr4h/+uCVBR29G8gSuRoRBSG2tjbkNPfWJOTfIQIIJnJJjCZU4ShZLFJ qZfr4QGFqBLeEcNsiiMj5xBxW40OUwFA79Y9ybJiKHFnWQKNr91KxxOYgl3R6S1pKfJN bKvLN3htIDLmacVoRIr8zkDdGkPLEL4jetezCQE+q28M2HoS2Lc7ZcLQeQ1Pu3SjoRW5 9rHa6Tz5ZIgzvBOIutZvXphTYbo+gUkk0Lzqkd6qN95Ee7W5KNhWeGciXeSfjjlH563r 5WZw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y12-v6si4039427pfl.283.2018.05.16.21.04.26; Wed, 16 May 2018 21:04:40 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751850AbeEQEDv (ORCPT + 99 others); Thu, 17 May 2018 00:03:51 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:49426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750723AbeEQEDu (ORCPT ); Thu, 17 May 2018 00:03:50 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 447C0407049F; Thu, 17 May 2018 04:03:50 +0000 (UTC) Received: from localhost (ovpn-8-16.pek2.redhat.com [10.72.8.16]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 407A610B2B45; Thu, 17 May 2018 04:03:46 +0000 (UTC) Date: Thu, 17 May 2018 12:03:43 +0800 From: Baoquan He To: Chao Fan Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, lcapitulino@redhat.com, keescook@chromium.org, tglx@linutronix.de, x86@kernel.org, hpa@zytor.com, yasu.isimatu@gmail.com, indou.takao@jp.fujitsu.com, douly.fnst@cn.fujitsu.com Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling Message-ID: <20180517040343.GL24627@MiWiFi-R3L-srv> References: <20180516100532.14083-1-bhe@redhat.com> <20180516100532.14083-2-bhe@redhat.com> <20180517032702.GA6521@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180517032702.GA6521@localhost.localdomain> User-Agent: Mutt/1.9.1 (2017-09-22) X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 17 May 2018 04:03:50 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 17 May 2018 04:03:50 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'bhe@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Chao, On 05/17/18 at 11:27am, Chao Fan wrote: > >+/* Store the number of 1GB huge pages which user specified.*/ > >+static unsigned long max_gb_huge_pages; > >+ > >+static int parse_gb_huge_pages(char *param, char* val) > >+{ > >+ char *p; > >+ u64 mem_size; > >+ static bool gbpage_sz = false; > >+ > >+ if (!strcmp(param, "hugepagesz")) { > >+ p = val; > >+ mem_size = memparse(p, &p); > >+ if (mem_size == PUD_SIZE) { > >+ if (gbpage_sz) > >+ warn("Repeadly set hugeTLB page size of 1G!\n"); > >+ gbpage_sz = true; > >+ } else > >+ gbpage_sz = false; > >+ } else if (!strcmp(param, "hugepages") && gbpage_sz) { > >+ p = val; > >+ max_gb_huge_pages = simple_strtoull(p, &p, 0); > >+ debug_putaddr(max_gb_huge_pages); > >+ } > >+} > >+ > >+ > > static int handle_mem_memmap(void) > > { > > char *args = (char *)get_cmd_line_ptr(); > >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size) > > } > > } > > > >+/* Skip as many 1GB huge pages as possible in the passed region. */ > >+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size) > >+{ > >+ int i = 0; > >+ unsigned long addr, size; > >+ struct mem_vector tmp; > >+ > >+ if (!max_gb_huge_pages) { > >+ store_slot_info(region, image_size); > >+ return; > >+ } > >+ > >+ addr = ALIGN(region->start, PUD_SIZE); > >+ /* If Did we raise the address above the passed in memory entry? */ > >+ if (addr < region->start + region->size) > >+ size = region->size - (addr - region->start); > >+ > >+ /* Check how many 1GB huge pages can be filtered out*/ > >+ while (size > PUD_SIZE && max_gb_huge_pages) { > >+ size -= PUD_SIZE; > >+ max_gb_huge_pages--; > > The global variable 'max_gb_huge_pages' means how many huge pages > user specified when you get it from command line. > But here, everytime we find a position which is good for huge page > allocation, the 'max_gdb_huge_page' decreased. So in my understanding, > it is used to store how many huge pages that we still need to search memory > for good slots to filter out, right? > If it's right, maybe the name 'max_gb_huge_pages' is not very suitable. > If my understanding is wrong, please tell me. No, you have understood it very right. I finished the draft patch last week, but changed this variable name and the function names several time, still I feel they are not good. However I can't get a better name. Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected from kernel command-line. And in this function it will be decreased. But we can't define another global variable only for decreasing in this place. And you can see that in this patchset I only take cares of 1GB huge pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why 1GB only? Because 2MB is not impacted by KASLR, please check the code in hugetlb_nrpages_setup() of mm/hugetlb.c . Only 1GB huge pages need be pre-allocated in hugetlb_nrpages_setup(), and if you look into hugetlb_nrpages_setup(), you will find that it will call alloc_bootmem_huge_page() to allocate huge pages one by one, but not at one time. That is why I always add 'gb' in the middle of the global variable and the newly added functions. And it will answer your below questions. When walk over all memory regions, 'max_gb_huge_pages' is still not 0, what should we do? It's normal and done as expected. Here hugetlb only try its best to allocate as many as possible according to 'max_gb_huge_pages'. If can't fully satisfied, it's fine. E.g on bare-metal machine with 16GB RAM, you add below to command-line: default_hugepagesz=1G hugepagesz=1G hugepages=20 Then it will get 14 good 1GB huge pages with kaslr disabled since [0,1G) and [3G,4G) are touched by bios reservation and pci/firmware reservation. Then this 14 huge pages are maximal value which is expected. It's not a bug in huge page. But with kaslr enabled, it sometime only get 13 1GB huge pages because KASLR put kernel into one of those good 1GB huge pages. This is a bug. I am not very familiar with huge page handling, just read code recently because of this kaslr bug. Hope Luiz and people from his team can help correct and clarify if anything is not right. Especially the function names, I feel it's not good, if anyone have a better idea, I will really appreciate that. > > >+ i++; > >+ } > >+ > >+ if (!i) { > >+ store_slot_info(region, image_size); > >+ return; > >+ } > >+ > >+ /* Process the remaining regions after filtering out. */ > >+ > This line may be unusable. Hmm, I made it on purpose. Because 1GB huge pages may be digged out from the middle, then the remaing head and tail regions still need be handled. I put it here to mean that it covers below two code blocks. I can remove it if people think it's not appropriate. > >+ if (addr >= region->start + image_size) { > >+ tmp.start = region->start; > >+ tmp.size = addr - region->start; > >+ store_slot_info(&tmp, image_size); > >+ } > >+ > >+ size = region->size - (addr - region->start) - i * PUD_SIZE; > >+ if (size >= image_size) { > >+ tmp.start = addr + i*PUD_SIZE; > >+ tmp.size = size; > >+ store_slot_info(&tmp, image_size); > >+ } > > I have another question not related to kaslr. > Here you try to avoid the memory from addr to (addr + i * PUD_SIZE), > but I wonder if after walking all memory regions, 'max_gb_huge_pages' > is still more than 0, which means there isn't enough memory slots for > huge page, what will happen? Please check the response at the beginning of response. Thanks Baoquan > > > >+} > >+ > > static unsigned long slots_fetch_random(void) > > { > > unsigned long slot; > >-- > >2.13.6 > > > > > > > >