Date: Fri, 11 Oct 2019 18:27:53 -0000
From: "tip-bot2 for Steve Wahl"
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/urgent] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Cc: Steve Wahl, Borislav Petkov, Dave Hansen, "Kirill A. Shutemov",
 Baoquan He, Brijesh Singh, dimitri.sivanich@hpe.com, Feng Tang,
 "H. Peter Anvin", Ingo Molnar, Jordan Borgner, Juergen Gross,
 mike.travis@hpe.com, russ.anderson@hpe.com, stable@vger.kernel.org,
 Thomas Gleixner, "x86-ml", Zhenzhong Duan, Ingo Molnar, Borislav Petkov,
 linux-kernel@vger.kernel.org
In-Reply-To: <9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com>
Message-ID: <157081847364.9978.9626612100722839628.tip-bot2@tip-bot2>

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     2aa85f246c181b1fa89f27e8e20c5636426be624
Gitweb:        https://git.kernel.org/tip/2aa85f246c181b1fa89f27e8e20c5636426be624
Author:        Steve Wahl
AuthorDate:    Tue, 24 Sep 2019 16:03:55 -05:00
Committer:     Borislav Petkov
CommitterDate: Fri, 11 Oct 2019 18:38:15 +02:00

x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.
Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR. But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses. So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.

 [ bp: Touchups. ]

Signed-off-by: Steve Wahl
Signed-off-by: Borislav Petkov
Acked-by: Dave Hansen
Acked-by: Kirill A. Shutemov
Cc: Baoquan He
Cc: Brijesh Singh
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jordan Borgner
Cc: Juergen Gross
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner
Cc: x86-ml
Cc: Zhenzhong Duan
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
---
 arch/x86/kernel/head64.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 29ffa49..206a4b6 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -222,13 +222,31 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
+	 *
+	 * Only the region occupied by the kernel image has so far
+	 * been checked against the table of usable memory regions
+	 * provided by the firmware, so invalidate pages outside that
+	 * region. A page table entry that maps to a reserved area of
+	 * memory would allow processor speculation into that area,
+	 * and on some hardware (particularly the UV platform) even
+	 * speculative access to some reserved areas is caught as an
+	 * error, causing the BIOS to halt the system.
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+
+	/* invalidate pages before the kernel image */
+	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/* fixup pages that are part of the kernel image */
+	for (; i <= pmd_index((unsigned long)_end); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
-	}
+
+	/* invalidate pages after the kernel image */
+	for (; i < PTRS_PER_PMD; i++)
+		pmd[i] &= ~_PAGE_PRESENT;
 
 	/*
 	 * Fixup phys_base - remove the memory encryption mask to obtain