Date: Tue, 3 Jul 2018 08:58:36 -0700
From: tip-bot for Baoquan He
Cc: mingo@kernel.org, hpa@zytor.com, bhe@redhat.com, tglx@linutronix.de, peterz@infradead.org, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org
Reply-To: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, bhe@redhat.com, hpa@zytor.com, mingo@kernel.org
In-Reply-To: <20180625031656.12443-3-bhe@redhat.com>
References: <20180625031656.12443-3-bhe@redhat.com>
To: linux-tip-commits@vger.kernel.org
Subject: [tip:x86/boot] x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)
Git-Commit-ID:
 747ff6265db4c2b77e8c7384f8054916a0c1eb39
X-Mailer: tip-git-log-daemon
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Commit-ID:  747ff6265db4c2b77e8c7384f8054916a0c1eb39
Gitweb:     https://git.kernel.org/tip/747ff6265db4c2b77e8c7384f8054916a0c1eb39
Author:     Baoquan He
AuthorDate: Mon, 25 Jun 2018 11:16:56 +0800
Committer:  Ingo Molnar
CommitDate: Tue, 3 Jul 2018 10:50:13 +0200

x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)

When KASLR is enabled, 1GB huge page allocations might fail sporadically.

To reproduce on a KVM guest with 4GB RAM:

- add the following options to the kernel command-line:

   'default_hugepagesz=1G hugepagesz=1G hugepages=1'

- boot the guest and check the number of 1GB pages reserved:

   # grep HugePages_Total /proc/meminfo

- compare the output across boots: when booting with "nokaslr",
  HugePages_Total is always 1, while when booting without "nokaslr",
  HugePages_Total is sometimes 0 (that is, reserving the 1GB page failed).

Note that you may need to boot a few times to trigger the issue, because
it's somewhat non-deterministic.

The root cause is that the kernel may be randomly placed into the only
good 1GB huge page, in the [0x40000000, 0x7fffffff] physical range.

Below is the dmesg output snippet from the KVM guest. We can see that only
the [0x40000000, 0x7fffffff] region is a good 1GB huge page;
[0x100000000, 0x13fffffff] will be touched by the memblock top-down
allocation:

  [...] e820: BIOS-provided physical RAM map:
  [...] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
  [...] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
  [...] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
  [...] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
  [...] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
  [...] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
  [...] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
  [...] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Besides, on bare-metal machines with larger memory, one less 1GB huge page
might be available with KASLR enabled. That too is because the kernel
image might be randomized into those "good" 1GB huge pages.

To fix this, first parse the kernel command-line to get how many 1GB huge
pages are specified. Then try to skip the specified number of 1GB huge
pages when deciding which memory region the kernel can be randomized into.

Also rename handle_mem_memmap() to handle_mem_options(), since it now
handles not only 'mem=' and 'memmap=', but also 'hugepagesxxx'.

Signed-off-by: Baoquan He
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: douly.fnst@cn.fujitsu.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: indou.takao@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: lcapitulino@redhat.com
Cc: yasu.isimatu@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-3-bhe@redhat.com
[ Rewrote the changelog, fixed style problems in the code. ]
Signed-off-by: Ingo Molnar
---
 arch/x86/boot/compressed/kaslr.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d97647b5ffb7..531c9876f573 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -244,7 +244,7 @@ static void parse_gb_huge_pages(char *param, char *val)
 }
 
 
-static int handle_mem_memmap(void)
+static int handle_mem_options(void)
 {
 	char *args = (char *)get_cmd_line_ptr();
 	size_t len = strlen((char *)args);
@@ -252,7 +252,8 @@ static int handle_mem_memmap(void)
 	char *param, *val;
 	u64 mem_size;
 
-	if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+	if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+		!strstr(args, "hugepages"))
 		return 0;
 
 	tmp_cmdline = malloc(len + 1);
@@ -277,6 +278,8 @@ static int handle_mem_memmap(void)
 
 		if (!strcmp(param, "memmap")) {
 			mem_avoid_memmap(val);
+		} else if (strstr(param, "hugepages")) {
+			parse_gb_huge_pages(param, val);
 		} else if (!strcmp(param, "mem")) {
 			char *p = val;
 
@@ -416,7 +419,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	/* We don't need to set a mapping for setup_data. */
 
 	/* Mark the memmap regions we need to avoid */
-	handle_mem_memmap();
+	handle_mem_options();
 
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 	/* Make sure video RAM can be used. */
@@ -629,7 +632,7 @@ static void process_mem_region(struct mem_vector *entry,
 
 		/* If nothing overlaps, store the region and return. */
 		if (!mem_avoid_overlap(&region, &overlap)) {
-			store_slot_info(&region, image_size);
+			process_gb_huge_pages(&region, image_size);
 			return;
 		}
 
@@ -639,7 +642,7 @@ static void process_mem_region(struct mem_vector *entry,
 
 			beginning.start = region.start;
 			beginning.size = overlap.start - region.start;
-			store_slot_info(&beginning, image_size);
+			process_gb_huge_pages(&beginning, image_size);
 		}
 
 		/* Return if overlap extends to or past end of region. */
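
Illustration (not part of the patch): the skipping step described in the
changelog can be sketched as a small stand-alone C fragment. This is not the
kernel's process_gb_huge_pages(), whose body is not included in this diff. The
names GB_PAGE_SIZE, struct range, gb_align_up(), count_gb_pages() and
skip_gb_pages() are hypothetical and chosen only for this example. The sketch
assumes a region is a half-open physical range and that a "good" 1GB huge page
is a 1GB-aligned, 1GB-sized span lying entirely inside the region.

```c
#include <stdint.h>

/* 1GB page size; hypothetical name used only in this sketch. */
#define GB_PAGE_SIZE (1ULL << 30)

/* Hypothetical helper type: the physical range [start, start + size). */
struct range {
	uint64_t start;
	uint64_t size;
};

/* Round addr up to the next 1GB boundary. */
static uint64_t gb_align_up(uint64_t addr)
{
	return (addr + GB_PAGE_SIZE - 1) & ~(GB_PAGE_SIZE - 1);
}

/* Count the 1GB-aligned pages that fit entirely inside r. */
static unsigned int count_gb_pages(struct range r)
{
	uint64_t end = r.start + r.size;
	uint64_t first = gb_align_up(r.start);

	if (first >= end)
		return 0;
	return (unsigned int)((end - first) / GB_PAGE_SIZE);
}

/*
 * Set aside up to *want aligned 1GB pages from r and decrement *want by
 * the number actually set aside. What is left of r is returned as a head
 * (below the first aligned page) and a tail (above the last page set
 * aside); a size of 0 means that piece is empty.
 */
static void skip_gb_pages(struct range r, unsigned int *want,
			  struct range *head, struct range *tail)
{
	uint64_t end = r.start + r.size;
	uint64_t first = gb_align_up(r.start);
	unsigned int avail = count_gb_pages(r);
	unsigned int take = avail < *want ? avail : *want;

	if (take == 0) {
		/* Nothing to skip: the whole region stays usable. */
		*head = r;
		tail->start = end;
		tail->size = 0;
		return;
	}

	*want -= take;
	head->start = r.start;
	head->size = first - r.start;
	tail->start = first + (uint64_t)take * GB_PAGE_SIZE;
	tail->size = end - tail->start;
}
```

With the two large usable e820 ranges from the dmesg snippet in the changelog,
count_gb_pages() gives 1 for [0x100000, 0xbffdffff] (the page at 0x40000000)
and 1 for [0x100000000, 0x13fffffff]. Calling skip_gb_pages() on the first
range with *want set to 1 leaves a head of [0x100000, 0x3fffffff] and a tail
of [0x80000000, 0xbffdffff] as the remaining candidate ranges for the kernel
image.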