From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Tony Luck, Andrew Morton,
    Andy Lutomirski, Borislav Petkov, Brian Gerst, Dave, Denys Vlasenko,
    Josh Poimboeuf, Linus Torvalds, Naoya Horiguchi, Peter Zijlstra,
    "Robert (Persistent Memory)", Thomas Gleixner, linux-mm@kvack.org,
    Ingo Molnar
Subject: [PATCH 4.14 152/167] x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
Date: Wed, 21 Feb 2018 13:49:23 +0100
Message-Id: <20180221124533.132182139@linuxfoundation.org>
X-Mailer: git-send-email 2.16.2
In-Reply-To: <20180221124524.639039577@linuxfoundation.org>
References: <20180221124524.639039577@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

4.14-stable review patch.  If anyone has any objections, please let me
know.

------------------

From: Tony Luck

commit fd0e786d9d09024f67bd71ec094b110237dc3840 upstream.

In the following commit:

  ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1
  mappings of poison pages")

... we added code to memory_failure() to unmap the page from the kernel
1:1 virtual address space, to avoid speculative access to the page
logging additional errors.

But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel.  This can happen if there
are too many corrected errors on a page and either mcelog(8) or
drivers/ras/cec.c asks to take a page offline.

Since we remove the 1:1 mapping early in memory_failure(), we can end up
with the page unmapped, but still in use.  On the next access the kernel
crashes :-(

There are also various debug paths that call memory_failure() to
simulate the occurrence of an error.  Since there is no actual error in
memory, we don't need to map out the page for those cases.

Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c.  Unmap the page only when:

	1) there is a real error
	2) memory_failure() succeeds.

All of this only applies to 64-bit systems.  A 32-bit kernel doesn't map
all of memory into kernel space.  It isn't worth adding the code to
unmap the piece that is mapped, because nobody would run a 32-bit kernel
on a machine that has recoverable machine checks.

Signed-off-by: Tony Luck
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave
Cc: Denys Vlasenko
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Naoya Horiguchi
Cc: Peter Zijlstra
Cc: Robert (Persistent Memory)
Cc: Thomas Gleixner
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org #v4.14
Fixes: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/include/asm/page_64.h            |    4 ----
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   15 +++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce.c          |   17 +++++++++++------
 include/linux/mm_inline.h                 |    6 ------
 mm/memory-failure.c                       |    2 --
 5 files changed, 26 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -52,10 +52,6 @@ static inline void clear_page(void *page
 
 void copy_page(void *to, void *from);
 
-#ifdef CONFIG_X86_MCE
-#define arch_unmap_kpfn arch_unmap_kpfn
-#endif
-
 #endif	/* !__ASSEMBLY__ */
 
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -115,4 +115,19 @@ static inline void mce_unregister_inject
 
 extern struct mca_config mca_cfg;
 
+#ifndef CONFIG_X86_64
+/*
+ * On 32-bit systems it would be difficult to safely unmap a poison page
+ * from the kernel 1:1 map because there are no non-canonical addresses that
+ * we can use to refer to the address without risking a speculative access.
+ * However, this isn't much of an issue because:
+ * 1) Few unmappable pages are in the 1:1 map. Most are in HIGHMEM which
+ *    are only mapped into the kernel as needed
+ * 2) Few people would run a 32-bit kernel on a machine that supports
+ *    recoverable errors because they have too much memory to boot 32-bit.
+ */
+static inline void mce_unmap_kpfn(unsigned long pfn) {}
+#define mce_unmap_kpfn mce_unmap_kpfn
+#endif
+
 #endif /* __X86_MCE_INTERNAL_H__ */
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -106,6 +106,10 @@ static struct irq_work mce_irq_work;
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
 
+#ifndef mce_unmap_kpfn
+static void mce_unmap_kpfn(unsigned long pfn);
+#endif
+
 /*
  * CPU/chipset specific EDAC code can register a notifier call here to print
  * MCE errors in a human-readable form.
@@ -582,7 +586,8 @@ static int srao_decode_notifier(struct n
 
 	if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
 		pfn = mce->addr >> PAGE_SHIFT;
-		memory_failure(pfn, MCE_VECTOR, 0);
+		if (memory_failure(pfn, MCE_VECTOR, 0))
+			mce_unmap_kpfn(pfn);
 	}
 
 	return NOTIFY_OK;
@@ -1049,12 +1054,13 @@ static int do_memory_failure(struct mce
 	ret = memory_failure(m->addr >> PAGE_SHIFT, MCE_VECTOR, flags);
 	if (ret)
 		pr_err("Memory error not recovered");
+	else
+		mce_unmap_kpfn(m->addr >> PAGE_SHIFT);
 	return ret;
 }
 
-#if defined(arch_unmap_kpfn) && defined(CONFIG_MEMORY_FAILURE)
-
-void arch_unmap_kpfn(unsigned long pfn)
+#ifndef mce_unmap_kpfn
+static void mce_unmap_kpfn(unsigned long pfn)
 {
 	unsigned long decoy_addr;
 
@@ -1065,7 +1071,7 @@ void arch_unmap_kpfn(unsigned long pfn)
 	 * We would like to just call:
 	 *	set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
 	 * but doing that would radically increase the odds of a
-	 * speculative access to the posion page because we'd have
+	 * speculative access to the poison page because we'd have
 	 * the virtual address of the kernel 1:1 mapping sitting
 	 * around in registers.
	 * Instead we get tricky. We create a non-canonical address
@@ -1090,7 +1096,6 @@ void arch_unmap_kpfn(unsigned long pfn)
 
 	if (set_memory_np(decoy_addr, 1))
 		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
-
 }
 #endif
 
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -127,10 +127,4 @@ static __always_inline enum lru_list pag
 
 #define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
 
-#ifdef arch_unmap_kpfn
-extern void arch_unmap_kpfn(unsigned long pfn);
-#else
-static __always_inline void arch_unmap_kpfn(unsigned long pfn) { }
-#endif
-
 #endif
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,8 +1146,6 @@ int memory_failure(unsigned long pfn, in
 		return 0;
 	}
 
-	arch_unmap_kpfn(pfn);
-
 	orig_head = hpage = compound_head(p);
 	num_poisoned_pages_inc();
 
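
The decoy-address trick that mce_unmap_kpfn() relies on is easy to miss
here, because the computation itself is unchanged context that doesn't
appear in the hunks above.  A minimal stand-alone sketch of the idea
follows; the PAGE_OFFSET value and the example pfn are assumptions for
illustration (the v4.14 x86-64 4-level default without KASLR), not the
exact kernel code:

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PAGE_OFFSET	0xffff880000000000UL	/* assumed: v4.14 4-level default */
	#define BIT(nr)		(1UL << (nr))

	int main(void)
	{
		unsigned long pfn = 0x12345;	/* example poisoned pfn */

		/* The canonical 1:1 address we must avoid materializing. */
		unsigned long kaddr = PAGE_OFFSET + (pfn << PAGE_SHIFT);

		/* Same address with bit 63 flipped: a non-canonical decoy. */
		unsigned long decoy = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));

		printf("1:1 addr: 0x%016lx\n", kaddr);	/* 0xffff880012345000 */
		printf("decoy   : 0x%016lx\n", decoy);	/* 0x7fff880012345000 */
		return 0;
	}

Passing the decoy to set_memory_np() works because the page-table walk
indexes each level with masked low-order address bits, so the flipped
bit 63 never matters, while the true 1:1 virtual address never sits in
a register where a mis-speculated load could dereference it.  Note also
how "#define mce_unmap_kpfn mce_unmap_kpfn" in mce-internal.h plays the
same role the removed arch_unmap_kpfn define did: it makes the empty
32-bit stub visible to the "#ifndef mce_unmap_kpfn" guards in mce.c, so
the real implementation is compiled only for 64-bit kernels.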