From: Sai Praneeth Prakhya
To: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ricardo.ner@intel.com, matt@codeblueprint.co.uk, Sai Praneeth,
 Lee Chun-Yi, Al Stone, Borislav Petkov, Ingo Molnar, Andy Lutomirski,
 Bhupesh Sharma, Peter Zijlstra, Ard Biesheuvel
Subject: [PATCH V1 4/6] x86/efi: Add efi page fault handler to fixup/recover
 from page faults caused by firmware
Date: Wed, 8 Aug 2018 21:31:15 -0700
Message-Id: <1533789077-16156-5-git-send-email-sai.praneeth.prakhya@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1533789077-16156-1-git-send-email-sai.praneeth.prakhya@intel.com>
References:
 <1533789077-16156-1-git-send-email-sai.praneeth.prakhya@intel.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sai Praneeth

EFI regions can broadly be divided into 3 types:
1. EFI_BOOT_SERVICES_ regions
2. EFI_RUNTIME_SERVICES_ regions
3. Other EFI regions like EFI_LOADER_ etc.

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory region except
EFI_RUNTIME_SERVICES_ regions are considered illegal. Buggy firmware
could trigger these illegal accesses during boot time or at runtime
(i.e. when the kernel is up and running).

Presently, the kernel can fix up illegal accesses to EFI_BOOT_SERVICES_
regions *only* during the kernel boot phase. If the firmware triggers
illegal accesses to *any* other EFI regions during kernel boot, the
kernel panics; if this happens during kernel runtime, the kernel hangs.
The kernel panics/hangs because the memory region requested by the
firmware isn't mapped, which causes a page fault in ring 0 that the
kernel fails to handle, leading to die().

To save the kernel from hanging, add an efi specific page fault handler
which detects illegal accesses by the firmware and
1. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi page
   fault handler fixes it up by mapping the requested region.
2. If it is any other region (Eg: EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_), the efi page fault handler freezes efi_rts_wq and
   schedules a new process.
3. If the access is to any other efi region as above but the efi
   runtime service is efi_reset_system(), the efi page fault handler
   reboots the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_ regions and to other regions are
dealt with differently in the efi page fault handler because,
*generally*, EFI_BOOT_SERVICES_ regions are smaller in size relative to
other efi regions and hence could be reserved and can be dynamically mapped.
But other EFI regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_
cannot be reserved as they are very large and reserving them would make
the kernel unbootable.

The efi specific page fault handler offers us two advantages:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy and that this is not a
   kernel bug.

Finally, this new mapping will not impact a reboot from kexec, as kexec
is only concerned about runtime memory regions.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Lee Chun-Yi
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h              |   7 ++
 arch/x86/mm/fault.c                     |   9 ++
 arch/x86/platform/efi/quirks.c          | 152 ++++++++++++++++++++++++++++++++
 drivers/firmware/efi/runtime-wrappers.c |   7 ++
 include/linux/efi.h                     |   1 +
 5 files changed, 176 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index c97f2e955cab..4942fa04d74b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm);
 
 #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
 extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr,
+				      struct pt_regs *regs);
 #else
 static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr,
+					     struct pt_regs *regs)
+{
+	return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
 
 struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..afd42e76058e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw */
 #include /* exception_enter(), ... */
 #include /* faulthandler_disabled() */
+#include /* fixup for buggy UEFI firmware*/
 
 #include /* boot_cpu_has, ... */
 #include /* dotraplinkage, ... */
@@ -24,6 +25,7 @@
 #include /* emulate_vsyscall */
 #include /* struct vm86 */
 #include /* vma_pkey() */
+#include /* fixup for buggy UEFI firmware*/
 
 #define CREATE_TRACE_POINTS
 #include
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		return;
 
 	/*
+	 * Buggy firmware could trigger illegal accesses to some EFI regions
+	 * which might page fault, try to fixup or recover from such faults.
+	 */
+	if (efi_illegal_accesses_fixup(address, regs))
+		return;
+
+	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to
 	 * terminate things with extreme prejudice:
 	 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 84b213a1460a..230924a35161 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #define EFI_MIN_RESERVE 5120
 
@@ -702,4 +703,155 @@ void __init efi_save_original_memmap(void)
 
 	original_memory_map_present = true;
 }
+
+/*
+ * From the original EFI memory map passed by the firmware, return a
+ * pointer to the memory descriptor that describes the given physical
+ * address. If not found, return NULL.
+ */
+static efi_memory_desc_t *efi_get_md(unsigned long phys_addr)
+{
+	efi_memory_desc_t *md;
+
+	for_each_efi_memory_desc_in_map(&original_memory_map, md) {
+		if (md->phys_addr <= phys_addr &&
+		    (phys_addr < (md->phys_addr +
+		    (md->num_pages << EFI_PAGE_SHIFT)))) {
+			return md;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * Detect illegal accesses by the firmware and
+ * 1. If the illegally accessed region is EFI_BOOT_SERVICES_,
+ *    fix it up by mapping the requested region.
+ * 2. If any other region (Eg: EFI_CONVENTIONAL_MEMORY or
+ *    EFI_LOADER_), then
+ *    a. Freeze efi_rts_wq.
+ *    b. Return error status to the efi caller process.
+ *    c. Disable EFI Runtime Services forever and
+ *    d. Schedule another process by explicitly calling scheduler.
+ *
+ * @return: Return 1, if the page fault is handled by mapping the
+ * requested region. Return 0 otherwise.
+ */
+int efi_illegal_accesses_fixup(unsigned long phys_addr, struct pt_regs *regs)
+{
+	char buf[64];
+	efi_memory_desc_t *md;
+	unsigned long long phys_addr_end, size_in_MB;
+
+	/* Fix page faults caused *only* by the firmware */
+	if (current->active_mm != &efi_mm)
+		return 0;
+
+	/*
+	 * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+	 * page faulting on these addresses isn't expected.
+	 */
+	if (phys_addr >= 0x0000 && phys_addr <= 0x0fff)
+		return 0;
+
+	/*
+	 * Original memory map is needed to retrieve the memory descriptor
+	 * that the firmware has faulted on. So, check if the kernel had
+	 * saved the original memory map passed by the firmware during boot.
+	 */
+	if (!original_memory_map_present) {
+		pr_info("Original memory map not found, aborting fixing illegal "
+			"access by firmware\n");
+		return 0;
+	}
+
+	/*
+	 * EFI Memory map could sometimes have holes, eg: SMRAM. So, make
+	 * sure that a valid memory descriptor is present for the physical
+	 * address that triggered page fault.
+	 */
+	md = efi_get_md(phys_addr);
+	if (!md) {
+		pr_info("Failed to find EFI memory descriptor for PA: 0x%lx\n",
+			phys_addr);
+		return 0;
+	}
+
+	/*
+	 * EFI_RUNTIME_SERVICES_ regions are mapped into efi_pgd
+	 * by the kernel during boot and hence accesses to these regions
+	 * should never page fault.
+	 */
+	if (md->type == EFI_RUNTIME_SERVICES_CODE ||
+	    md->type == EFI_RUNTIME_SERVICES_DATA) {
+		pr_info("Kernel shouldn't page fault on accesses to "
+			"EFI_RUNTIME_SERVICES_ regions\n");
+		return 0;
+	}
+
+	/*
+	 * Now it's clear that an illegal access by the firmware has caused
+	 * the page fault. Print stack trace and memory descriptor as it is
+	 * useful to know which EFI Runtime Service is buggy and what did it
+	 * try to access.
+	 */
+	phys_addr_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+	size_in_MB = md->num_pages >> (20 - EFI_PAGE_SHIFT);
+	WARN(1, FW_BUG "Detected illegal access by Firmware at PA: 0x%lx\n",
+	     phys_addr);
+	pr_info("EFI Memory Descriptor for offending PA is:\n");
+	pr_info("%s range=[0x%016llx-0x%016llx] (%lluMB)\n",
+		efi_md_typeattr_format(buf, sizeof(buf), md), md->phys_addr,
+		phys_addr_end, size_in_MB);
+
+	/*
+	 * Fix illegal accesses by firmware to EFI_BOOT_SERVICES_
+	 * regions by creating VA->PA mappings. Further accesses to these
+	 * regions will not page fault.
+	 */
+	if (md->type == EFI_BOOT_SERVICES_CODE ||
+	    md->type == EFI_BOOT_SERVICES_DATA) {
+		efi_map_region(md);
+		pr_info("Fixed illegal access at PA: 0x%lx\n", phys_addr);
+		return 1;
+	}
+
+	/*
+	 * Buggy efi_reset_system() is handled differently from other EFI
+	 * Runtime Services as it doesn't use efi_rts_wq. Although
+	 * native_machine_emergency_restart() says that machine_real_restart()
+	 * could fail, it's better not to complicate this fault handler
+	 * because this case occurs *very* rarely and hence could be improved
+	 * on an as-needed basis.
+	 */
+	if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
+		pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
+		machine_real_restart(MRR_BIOS);
+		return 0;
+	}
+
+	/*
+	 * Firmware didn't page fault on EFI_RUNTIME_SERVICES_ or
+	 * EFI_BOOT_SERVICES_ regions. This means that the
+	 * firmware has illegally accessed some other EFI region which can't
+	 * be fixed. Hence, freeze efi_rts_wq.
+	 */
+	set_current_state(TASK_UNINTERRUPTIBLE);
+
+	/*
+	 * Before calling EFI Runtime Service, the kernel has switched the
+	 * calling process to efi_mm. Hence, switch back to task_mm.
+	 */
+	arch_efi_call_virt_teardown();
+
+	/* Signal error status to the efi caller process */
+	efi_rts_work.status = EFI_ABORTED;
+	complete(&efi_rts_work.efi_rts_comp);
+
+	clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+	pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
+	schedule();
+
+	return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index b18b2d864c2c..5ca44ca22011 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -61,6 +61,11 @@ struct efi_runtime_work efi_rts_work;
 ({									\
 	efi_rts_work.status = EFI_ABORTED;				\
 									\
+	if (!efi_enabled(EFI_RUNTIME_SERVICES)) {			\
+		pr_err("Aborting! EFI Runtime Services disabled\n");	\
+		goto exit;						\
+	}								\
+									\
 	init_completion(&efi_rts_work.efi_rts_comp);			\
 	INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts);		\
 	efi_rts_work.arg1 = _arg1;					\
@@ -79,6 +84,7 @@ struct efi_runtime_work efi_rts_work;
 	else								\
 		pr_err("Failed to queue work to efi_rts_wq.\n");	\
 									\
+exit:									\
 	efi_rts_work.status;						\
 })
 
@@ -393,6 +399,7 @@ static void virt_efi_reset_system(int reset_type,
 			"could not get exclusive access to the firmware\n");
 		return;
 	}
+	efi_rts_work.efi_rts_id = RESET_SYSTEM;
 	__efi_call_virt(reset_system, reset_type, status, data_size, data);
 	up(&efi_runtime_lock);
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 169b5837c72c..bfa1537994f5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1682,6 +1682,7 @@ enum efi_rts_ids {
 	SET_VARIABLE,
 	QUERY_VARIABLE_INFO,
 	GET_NEXT_HIGH_MONO_COUNT,
+	RESET_SYSTEM,
 	UPDATE_CAPSULE,
 	QUERY_CAPSULE_CAPS,
 };
-- 
2.7.4
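
For readers who want to trace the decision order of the handler outside
the kernel, the following is a rough userspace sketch of it, not the
kernel code itself. All names in it (struct mem_desc, find_desc(),
classify_fault(), desc_size_mb(), the enum values) are hypothetical
stand-ins for efi_memory_desc_t, efi_get_md() and the branches of
efi_illegal_accesses_fixup(). It assumes 4 KiB EFI pages and leaves out
the efi_mm check, the original-memory-map check and all side effects
(mapping, reboot, freezing the workqueue).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: mirrors EFI_PAGE_SHIFT (4 KiB pages) */

/* Hypothetical, simplified region types and memory descriptor. */
enum region_type { BOOT_SERVICES, RUNTIME_SERVICES, CONVENTIONAL };

struct mem_desc {
	enum region_type type;
	uint64_t phys_addr;	/* start of the region */
	uint64_t num_pages;	/* length in 4 KiB pages */
};

/* What the handler would do; the real code performs these actions. */
enum fault_action { NOT_HANDLED, MAP_REGION, REBOOT_VIA_BIOS, FREEZE_WORKQUEUE };

/* Return the descriptor whose [start, end) range covers pa, or NULL. */
const struct mem_desc *find_desc(const struct mem_desc *map, size_t n,
				 uint64_t pa)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].phys_addr +
			       (map[i].num_pages << SKETCH_PAGE_SHIFT);

		if (map[i].phys_addr <= pa && pa < end)
			return &map[i];
	}
	return NULL;
}

/* Region size in MiB, computed the same way as size_in_MB in the patch. */
uint64_t desc_size_mb(const struct mem_desc *md)
{
	return md->num_pages >> (20 - SKETCH_PAGE_SHIFT);
}

/* Same ordering of checks as the fault handler in the patch. */
enum fault_action classify_fault(const struct mem_desc *map, size_t n,
				 uint64_t pa, int in_reset_system)
{
	const struct mem_desc *md;

	if (pa <= 0x0fff)		/* first page is always mapped */
		return NOT_HANDLED;

	md = find_desc(map, n, pa);
	if (!md)			/* hole in the memory map */
		return NOT_HANDLED;

	if (md->type == RUNTIME_SERVICES)	/* already mapped, unexpected */
		return NOT_HANDLED;

	if (md->type == BOOT_SERVICES)		/* fixable by mapping */
		return MAP_REGION;

	/* Any other region: cannot be fixed up. */
	return in_reset_system ? REBOOT_VIA_BIOS : FREEZE_WORKQUEUE;
}
```

With a three-entry map holding a boot services, a runtime services and a
conventional memory region, an address inside the boot services region
classifies as MAP_REGION, while one inside conventional memory
classifies as FREEZE_WORKQUEUE, or as REBOOT_VIA_BIOS when the faulting
service is the reset call.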