Date: Fri, 18 May 2018 15:43:59 +0800
From: Baoquan He
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, lcapitulino@redhat.com, keescook@chromium.org, tglx@linutronix.de, x86@kernel.org, hpa@zytor.com, fanc.fnst@cn.fujitsu.com, yasu.isimatu@gmail.com, indou.takao@jp.fujitsu.com, douly.fnst@cn.fujitsu.com
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization
Message-ID: <20180518074359.GR24627@MiWiFi-R3L-srv>
References: <20180516100532.14083-1-bhe@redhat.com> <20180518070046.GA18660@gmail.com>
In-Reply-To: <20180518070046.GA18660@gmail.com>

On 05/18/18 at 09:00am, Ingo Molnar wrote:
>
> * Baoquan He wrote:
>
> > This is a regression bug fix. Luiz's team reported that 1GB huge page
> > allocation will get one less 1GB page randomly when KASLR is enabled. On
> > their KVM guest with 4GB RAM, which only has one good 1GB huge page,
> > they found the 1GB huge page allocation sometime failed with below
> > kernel option adding.
> >
> > default_hugepagesz=1G hugepagesz=1G hugepages=1
> >
> > This is because kernel may be randomized into those good 1GB huge pages.
> >
> > I ever thought to solve this by specifying available memory regions
> > which kernel KASLR can be randomized into to avoid those good 1GB huge
> > pages. Chao's patches can be used to fix it:
> > https://lkml.org/lkml/2018/2/28/217
> >
> > Later, Ingo suggested avoiding them in boot KASLR code.
> > https://lkml.org/lkml/2018/3/12/312
>
> Yes, but these patches don't appear to implement what I suggested:
>
> > So there's apparently a mis-design here:
> >
> > - KASLR needs to be done very early on during bootup: - it's not realistic to
> > expect KASLR to be done with a booted up kernel, because pointers to various
> > KASLR-ed objects are already widely spread out in memory.
> >
> > - But for some unfathomable reason the memory hotplug attribute of memory
> > regions is not part of the regular memory map but part of late-init ACPI data
> > structures.
> >
> > The right solution would be _not_ to fudge the KASLR location, but to provide
> > the memory hotplug information to early code, preferably via the primary memory
> > map. KASLR can then make use of it and avoid those regions, just like it avoids
> > other memory regions already.
> >
> > In addition to that hardware makers (including virtualized hardware) should also
> > fix their systems to provide memory hotplug information to early code.
>
> So my question: why don't we pass in the information that these are hotplug pages
> that should not be KASLR randomized into?
>
> If that attribute of memory regions was present then KASLR could simply skip the
> hotplug regions!

OK, I realize my description above was misleading because I didn't explain the background clearly. Let me fill it in.

Previously, FJ reported the movable_node issue: KASLR may put the kernel into a movable node, after which that node can no longer be hot-removed. So we planned to solve it by adding a new kernel parameter:

    kaslr_boot_mem=nn[KMG]@ss[KMG]

The idea is to let users specify the memory regions into which KASLR may randomize the kernel. The kernel must not be placed outside those regions, even where that memory is also available RAM. For the movable_node issue, the immovable regions can be passed via kaslr_boot_mem=nn[KMG]@ss[KMG].

While that hotplug fix was under review, Luiz's team reported this 1GB huge page regression. I reproduced the bug, found the root cause, and realized that the kaslr_boot_mem=nn[KMG]@ss[KMG] parameter could fix it too.
For example, on the KVM guest with 4GB RAM, which has one good 1GB huge page, we can add kaslr_boot_mem=1G@0 and kaslr_boot_mem=3G@2G to the kernel command line. The good 1GB region [1G, 2G) is then excluded from kernel physical randomization.

Later, you pointed out that the 'kaslr_boot_mem=' approach requires users to specify memory regions manually, which is not good, and suggested solving both problems by gathering the needed information in the KASLR boot code and handling it there.

So there are now two separate issues:

- For the movable_node issue, we need to get the hotplug information from the ACPI SRAT and then avoid those regions.
- For this 1GB huge page issue, we need to get the information from the kernel command line and then avoid those regions.

This patchset addresses the huge page issue only. Since FJ reported the hotplug issue and has assigned engineers to work on it, I would like to wait for them to post a fix along the lines of your suggestion.

I will add this background to the cover letter of the v2 posting.

Thanks,
Baoquan