Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751967AbdI1H4K (ORCPT ); Thu, 28 Sep 2017 03:56:10 -0400 Received: from mail-wr0-f195.google.com ([209.85.128.195]:32788 "EHLO mail-wr0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751511AbdI1H4J (ORCPT ); Thu, 28 Sep 2017 03:56:09 -0400 X-Google-Smtp-Source: AOwi7QDt1o9Ufv5h6y+M9KdKgqss5m7k3s+ODDDE3HExOnFfQ6hPtMlCkTqitcqynxWNERHaz5zD1w== Date: Thu, 28 Sep 2017 09:56:05 +0200 From: Ingo Molnar To: Baoquan He Cc: linux-kernel@vger.kernel.org, x86@kernel.org, mingo@redhat.com, tglx@linutronix.de, hpa@zytor.com, thgarnie@google.com, keescook@chromium.org, akpm@linux-foundation.org, yamada.masahiro@socionext.com, rja@hpe.com, frank.ramsay@hpe.com Subject: Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Message-ID: <20170928075605.g74zm5xeglosmvct@gmail.com> References: <1504770150-25456-1-git-send-email-bhe@redhat.com> <1504770150-25456-3-git-send-email-bhe@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1504770150-25456-3-git-send-email-bhe@redhat.com> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3380 Lines: 92 * Baoquan He wrote: > On SGI UV system, kernel often hangs when KASLR is enabled. Disabling > KASLR makes kernel work well. > > The back trace is: > > kernel BUG at arch/x86/mm/init_64.c:311! > invalid opcode: 0000 [#1] SMP > [...] > RIP: 0010:__init_extra_mapping+0x188/0x196 > [...] > Call Trace: > init_extra_mapping_uc+0x13/0x15 > map_high+0x67/0x75 > map_mmioh_high_uv3+0x20a/0x219 > uv_system_init_hub+0x12d9/0x1496 > uv_system_init+0x27/0x29 > native_smp_prepare_cpus+0x28d/0x2d8 > kernel_init_freeable+0xdd/0x253 > ? rest_init+0x80/0x80 > kernel_init+0xe/0x110 > ret_from_fork+0x2c/0x40 > > This is because the SGI UV system need map its MMIOH region to the direct > mapping section, and the mapping happens in rest_init() which is much > later than the calling of kernel_randomize_memory() to do mm KASLR. So > mm KASLR can't count in the size of the MMIOH region when caculate the > needed size of address space for the direct mapping section. > > When KASLR is disabled, there are 64TB address space for both system RAM > and the MMIOH regions to share. When KASLR is enabled, the current code > of mm KASLR only reserves the actual size of system RAM plus extra 10TB > for the direct mapping. Thus later the MMIOH mapping could go beyond > the upper bound of the direct mapping to step into VMALLOC or VMEMMAP area. > Then BUG_ON() in __init_extra_mapping() will be triggered. > > E.g on the SGI UV3 machine where this bug is reported , there are two MMIOH > regions: > > [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000 > [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000 > > They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G RAM are > spread out to 1TB regions. Then above two SGI MMIOH regions also will be > mapped into the direct mapping section. > > To fix it, we need check if it's SGI UV system by calling is_early_uv_system() > in kernel_randomize_memory(). If yes, do not adapt the size of the direct > mapping section, just keep it as 64TB. > > Signed-off-by: Baoquan He > Reviewed-by: Thomas Garnier > Acked-by: Mike Travis > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: x86@kernel.org > Cc: Thomas Garnier > Cc: Kees Cook > Cc: Andrew Morton > Cc: Masahiro Yamada > --- > arch/x86/mm/kaslr.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c > index af599167fe3c..4d68c08df82d 100644 > --- a/arch/x86/mm/kaslr.c > +++ b/arch/x86/mm/kaslr.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include "mm_internal.h" > > @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void) > CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; > > /* Adapt phyiscal memory region size based on available memory */ > - if (memory_tb < kaslr_regions[0].size_tb) > + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system()) > kaslr_regions[0].size_tb = memory_tb; This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be corrected instead of uglifying the code that uses it... Thanks, Ingo