From: Baoquan He
To: linux-kernel@vger.kernel.org, mingo@kernel.org, lcapitulino@redhat.com, keescook@chromium.org, tglx@linutronix.de
Cc: x86@kernel.org, hpa@zytor.com, fanc.fnst@cn.fujitsu.com, yasu.isimatu@gmail.com, indou.takao@jp.fujitsu.com, douly.fnst@cn.fujitsu.com, Baoquan He
Subject: [PATCH 2/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization
Date: Wed, 16 May 2018 18:05:32 +0800
Message-Id: <20180516100532.14083-3-bhe@redhat.com>
In-Reply-To: <20180516100532.14083-1-bhe@redhat.com>
References: <20180516100532.14083-1-bhe@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A regression in 1GB huge page allocation has been reported when KASLR
is enabled. To reproduce it, take a KVM guest with 4GB of RAM and add
the following to the kernel command line:

  'default_hugepagesz=1G hugepagesz=1G hugepages=1'

Then boot the guest and check the number of 1GB pages reserved:

  grep HugePages_Total /proc/meminfo

When booting with "nokaslr", HugePages_Total is always 1. When booting
without "nokaslr", HugePages_Total is sometimes zero (that is, reserving
the 1GB page fails). It may take a few boots to trigger the issue.

After investigation, the root cause is that the kernel may be randomly
placed in the only good 1GB huge page, [0x40000000, 0x7fffffff]. Below
is a dmesg snippet from the KVM guest. It shows that only the
[0x40000000, 0x7fffffff] region is a good 1GB huge page;
[0x100000000, 0x13fffffff] will be touched by memblock top-down
allocation.

[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

On bare-metal machines with more memory, booting with KASLR enabled may
likewise yield one fewer 1GB huge page than booting with 'nokaslr',
again because the kernel may be randomized into one of the good 1GB
huge pages.

To fix this, first parse the kernel command line to find out how many
1GB huge pages are specified. Then try to skip that number of 1GB huge
pages when deciding which memory regions the kernel can be randomized
into.

Also rename handle_mem_memmap() to handle_mem_options(), since it now
handles 'hugepagesxxx' in addition to 'mem=' and 'memmap='.
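
As a rough illustration of the arithmetic behind the dmesg reading above, the sketch below counts how many 1GB-aligned, fully covered 1GB pages fit in a physical range. count_gb_pages() and the GB_* macros are hypothetical names used only for this illustration; they are not the helpers added by this series, and the real code may compute this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch only. */
#define GB_SHIFT 30
#define GB_SIZE  (1ULL << GB_SHIFT)

/*
 * Hypothetical helper: return the number of 1GB-aligned 1GB pages that
 * lie entirely inside the physical range [start, start + size).
 * Assumes start + size does not wrap around and that start is far
 * enough below the top of the 64-bit space for the round-up to be safe.
 */
uint64_t count_gb_pages(uint64_t start, uint64_t size)
{
	uint64_t end = start + size;
	/* First 1GB boundary at or above start (round up). */
	uint64_t first = (start + GB_SIZE - 1) & ~(GB_SIZE - 1);
	/* Last 1GB boundary at or below end (round down). */
	uint64_t last = end & ~(GB_SIZE - 1);

	assert(end >= start);

	if (last <= first)
		return 0;

	return (last - first) >> GB_SHIFT;
}
```

With the e820 map above, the usable range 0x100000-0xbffdffff (size 0xbfee0000) contains exactly one such page, [0x40000000, 0x7fffffff], because the next boundary at 0x80000000 is followed by less than 1GB of usable memory. The range 0x100000000-0x13fffffff also contains one aligned page, but that is the one the commit message says memblock top-down allocation will touch.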

Signed-off-by: Baoquan He
---
 arch/x86/boot/compressed/kaslr.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 13bd879cdc5d..b4819faab602 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -241,7 +241,7 @@ static int parse_gb_huge_pages(char *param, char* val)
 }
 
 
-static int handle_mem_memmap(void)
+static int handle_mem_options(void)
 {
 	char *args = (char *)get_cmd_line_ptr();
 	size_t len = strlen((char *)args);
@@ -249,7 +249,8 @@ static int handle_mem_memmap(void)
 	char *param, *val;
 	u64 mem_size;
 
-	if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+	if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+		!strstr(args,"hugepages"))
 		return 0;
 
 	tmp_cmdline = malloc(len + 1);
@@ -274,6 +275,8 @@ static int handle_mem_memmap(void)
 
 		if (!strcmp(param, "memmap")) {
 			mem_avoid_memmap(val);
+		} else if (strstr(param, "hugepages")) {
+			parse_gb_huge_pages(param, val);
 		} else if (!strcmp(param, "mem")) {
 			char *p = val;
 
@@ -413,7 +416,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	/* We don't need to set a mapping for setup_data. */
 
 	/* Mark the memmap regions we need to avoid */
-	handle_mem_memmap();
+	handle_mem_options();
 
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 	/* Make sure video RAM can be used. */
@@ -617,7 +620,7 @@ static void process_mem_region(struct mem_vector *entry,
 
 		/* If nothing overlaps, store the region and return. */
 		if (!mem_avoid_overlap(&region, &overlap)) {
-			store_slot_info(&region, image_size);
+			process_gb_huge_page(&region, image_size);
 			return;
 		}
 
@@ -627,7 +630,7 @@ static void process_mem_region(struct mem_vector *entry,
 
 			beginning.start = region.start;
 			beginning.size = overlap.start - region.start;
-			store_slot_info(&beginning, image_size);
+			process_gb_huge_page(&beginning, image_size);
 		}
 
 		/* Return if overlap extends to or past end of region. */
-- 
2.13.6