From: Sai Praneeth <[email protected]>
There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option that, when enabled,
detects and recovers from page faults caused by buggy firmware.
These illegal accesses trigger a page fault in ring 0, because the
firmware executes in ring 0, and if the fault is left unhandled the
kernel hangs.
Provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy, i.e. that this is not a kernel bug.
Upon detecting that the illegally accessed region is any region other
than EFI_RUNTIME_SERVICES_<CODE/DATA>, the efi page fault handler will
check if the access is by efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
schedule a new process.
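The decision described above can be modelled in plain C. This is a hypothetical userspace sketch, not code from the series: `efi_classify_fault()` and `enum efi_fault_action` are invented names, the boolean inputs stand in for the kernel's `current->active_mm == &efi_mm` test, the original-memory-map lookup and the `efi_rts_id == RESET_SYSTEM` check, and the real handler acts directly instead of returning a verdict. The memory type numbers are the UEFI `EFI_MEMORY_TYPE` values.

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stdint.h>

/* UEFI EFI_MEMORY_TYPE values for the two legal runtime regions. */
#define EFI_RUNTIME_SERVICES_CODE 5
#define EFI_RUNTIME_SERVICES_DATA 6

/* Hypothetical verdict; the patch performs the action instead. */
enum efi_fault_action {
	EFI_FAULT_NOT_HANDLED,	/* fall through to the normal oops path */
	EFI_FAULT_REBOOT_BIOS,	/* efi_reset_system() faulted: reboot via BIOS */
	EFI_FAULT_FREEZE_WQ,	/* any other service: freeze efi_rts_wq */
};

/*
 * in_efi_mm:       the fault happened while efi_mm was active
 * md_found:        the address is described by the original memory map
 * md_type:         type of that descriptor (ignored if !md_found)
 * is_reset_system: the faulting service is ResetSystem()
 */
static enum efi_fault_action
efi_classify_fault(bool in_efi_mm, bool md_found, uint32_t md_type,
		   bool is_reset_system)
{
	/* Only faults raised by firmware on a described region are handled. */
	if (!in_efi_mm || !md_found)
		return EFI_FAULT_NOT_HANDLED;

	/* Runtime regions are mapped in efi_pgd; a fault there is not a
	 * firmware bug, so leave it to the regular fault path. */
	if (md_type == EFI_RUNTIME_SERVICES_CODE ||
	    md_type == EFI_RUNTIME_SERVICES_DATA)
		return EFI_FAULT_NOT_HANDLED;

	return is_reset_system ? EFI_FAULT_REBOOT_BIOS : EFI_FAULT_FREEZE_WQ;
}
```

For example, a fault by GetNextHighMonotonicCount() on boot services data (type 4) yields `EFI_FAULT_FREEZE_WQ`, while the same fault from ResetSystem() yields `EFI_FAULT_REBOOT_BIOS`.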
This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.
Testing the patch set:
----------------------
1. Download the buggy firmware from [1].
2. Run a qemu instance with this buggy BIOS and boot a mainline kernel.
Add reboot=efi to the kernel command line arguments and, after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot a kernel with these patches applied; the
reboot should now work. Also, note the warning/error messages
printed by the kernel.
Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
use scheduling away from efi_rts_wq.
Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
change was in init/main.c. As that is architecture-agnostic code, the
change was moved to arch/x86/platform/efi/quirks.c.
Changes from V2 to V3:
----------------------
1. Drop treating illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> regions
separately from illegal accesses to other regions like
EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
regions were handled by mapping the requested region into efi_pgd, but
from V3 they are handled like illegal accesses to other regions, i.e. by
freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.
Note:
-----
This patch set is based on the "next" branch of the efi tree.
[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt
Sai Praneeth (5):
efi: Make efi_rts_work accessible to efi page fault handler
efi: Introduce __efi_init attribute
x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
x86/efi: Add efi page fault handler to recover from the page faults
caused by firmware
x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS
arch/x86/Kconfig | 17 +++
arch/x86/include/asm/efi.h | 11 ++
arch/x86/mm/fault.c | 9 ++
arch/x86/platform/efi/efi.c | 2 +
arch/x86/platform/efi/quirks.c | 188 ++++++++++++++++++++++++++++++++
drivers/firmware/efi/efi.c | 4 +-
drivers/firmware/efi/runtime-wrappers.c | 60 +++-------
include/linux/efi.h | 51 ++++++++-
8 files changed, 295 insertions(+), 47 deletions(-)
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
--
2.7.4
From: Sai Praneeth <[email protected]>
There may exist some buggy UEFI firmware implementations that access
efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after the
kernel has assumed control of the platform. This violates the UEFI
specification.
If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to recover
from the page fault triggered by the firmware so that the machine
doesn't hang.
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/Kconfig | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..7dc270c17d0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,23 @@ config EFI_MIXED
If unsure, say N.
+config EFI_WARN_ON_ILLEGAL_ACCESS
+ bool "Warn about illegal memory accesses by firmware" if EXPERT
+ depends on EFI
+ help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is any region other than
+ EFI_RUNTIME_SERVICES_<CODE/DATA>, then the kernel freezes
+ efi_rts_wq and schedules a new process. Also, it disables EFI
+ Runtime Services, so that it will never again call buggy firmware.
+ 2. If the illegal access is by efi_reset_system(), then the
+ platform is rebooted through BIOS.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
From: Sai Praneeth <[email protected]>
After the kernel has booted, if the firmware accesses *any* efi region
other than EFI_RUNTIME_SERVICES_<CODE/DATA>, the efi page fault handler
freezes efi_rts_wq and schedules a new process. To do this, the efi
page fault handler needs efi_rts_work. Hence, make it accessible.
Making it a global introduces no race conditions, because all calls to
efi runtime services are already serialized by efi_runtime_lock.
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 53 ++++++---------------------------
include/linux/efi.h | 36 ++++++++++++++++++++++
2 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
- GET_TIME,
- SET_TIME,
- GET_WAKEUP_TIME,
- SET_WAKEUP_TIME,
- GET_VARIABLE,
- GET_NEXT_VARIABLE,
- SET_VARIABLE,
- QUERY_VARIABLE_INFO,
- GET_NEXT_HIGH_MONO_COUNT,
- UPDATE_CAPSULE,
- QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work: Details of EFI Runtime Service work
- * @arg<1-5>: EFI Runtime Service function arguments
- * @status: Status of executing EFI Runtime Service
- * @efi_rts_id: EFI Runtime Service function identifier
- * @efi_rts_comp: Struct used for handling completions
- */
-struct efi_runtime_work {
- void *arg1;
- void *arg2;
- void *arg3;
- void *arg4;
- void *arg5;
- efi_status_t status;
- struct work_struct work;
- enum efi_rts_ids efi_rts_id;
- struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
/*
* efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
*/
#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
({ \
- struct efi_runtime_work efi_rts_work; \
efi_rts_work.status = EFI_ABORTED; \
\
init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
*/
static void efi_call_rts(struct work_struct *work)
{
- struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
- efi_rts_work = container_of(work, struct efi_runtime_work, work);
- arg1 = efi_rts_work->arg1;
- arg2 = efi_rts_work->arg2;
- arg3 = efi_rts_work->arg3;
- arg4 = efi_rts_work->arg4;
- arg5 = efi_rts_work->arg5;
+ arg1 = efi_rts_work.arg1;
+ arg2 = efi_rts_work.arg2;
+ arg3 = efi_rts_work.arg3;
+ arg4 = efi_rts_work.arg4;
+ arg5 = efi_rts_work.arg5;
- switch (efi_rts_work->efi_rts_id) {
+ switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
(efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
*/
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
- efi_rts_work->status = status;
- complete(&efi_rts_work->efi_rts_comp);
+ efi_rts_work.status = status;
+ complete(&efi_rts_work.efi_rts_comp);
}
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
extern int efi_tpm_eventlog_init(void);
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ QUERY_VARIABLE_INFO,
+ GET_NEXT_HIGH_MONO_COUNT,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
/* Workqueue to queue EFI Runtime Services */
extern struct workqueue_struct *efi_rts_wq;
+extern struct efi_runtime_work efi_rts_work;
+
#endif /* _LINUX_EFI_H */
From: Sai Praneeth <[email protected]>
The efi page fault handler that recovers from page faults caused by the
firmware needs the original memory map passed by the firmware. It looks
up this memory map to find the type of the memory region at which the
page fault occurred. Presently, the EFI subsystem discards the original
memory map passed by the firmware and replaces it with a new memory map
that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
accesses by firmware can occur in any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is enabled, create a backup of the
original memory map passed by the firmware, so that the efi page fault
handler can detect and recover from illegal accesses to *any* efi region.
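Saving the map and later finding the descriptor for a faulting address can be sketched in userspace C as follows. This is a hypothetical sketch, assuming a simplified `struct memmap_copy` in place of the kernel's `struct efi_memory_map` and `malloc()` in place of `efi_memmap_alloc()`/`memremap()`; `memmap_save()` and `memmap_find()` are invented names. The copy steps by `desc_size` rather than `sizeof(efi_memory_desc_t)` because firmware may report larger descriptors than the structure the kernel knows about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EFI_PAGE_SHIFT 12

/* Layout of the UEFI EFI_MEMORY_DESCRIPTOR (version 1). */
typedef struct {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
} efi_memory_desc_t;

/* Hypothetical, simplified stand-in for struct efi_memory_map. */
struct memmap_copy {
	void *map;
	size_t nr_map;
	size_t desc_size;	/* stride reported by the firmware */
};

/* Save nr_map descriptors laid out desc_size bytes apart; 0 on success. */
static int memmap_save(struct memmap_copy *out, const void *src,
		       size_t nr_map, size_t desc_size)
{
	const char *s = src;
	char *d;

	assert(desc_size >= sizeof(efi_memory_desc_t));
	d = malloc(nr_map * desc_size);
	if (!d)
		return -1;
	/* Copy whole descriptors, including any firmware-specific tail. */
	for (size_t i = 0; i < nr_map; i++)
		memcpy(d + i * desc_size, s + i * desc_size, desc_size);

	out->map = d;
	out->nr_map = nr_map;
	out->desc_size = desc_size;
	return 0;
}

/* Return the descriptor covering phys_addr, or NULL for holes (e.g. SMRAM). */
static const efi_memory_desc_t *memmap_find(const struct memmap_copy *m,
					    uint64_t phys_addr)
{
	for (size_t i = 0; i < m->nr_map; i++) {
		const efi_memory_desc_t *md = (const void *)
			((const char *)m->map + i * m->desc_size);
		uint64_t end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);

		if (md->phys_addr <= phys_addr && phys_addr < end)
			return md;
	}
	return NULL;
}
```

With 4 KiB EFI pages, a descriptor of 256 pages covers 256 << 12 bytes = 1 MiB, which is the value the fault handler later reports via `num_pages >> (20 - EFI_PAGE_SHIFT)`.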
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 6 ++++++
arch/x86/platform/efi/efi.c | 2 ++
arch/x86/platform/efi/quirks.c | 48 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..788ed4cbce22 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
+
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..7a3ea4cd5939 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
pa = __pa(new_memmap);
+ efi_save_original_memmap();
+
/*
* Unregister the early EFI memmap from efi_init() and install
* the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..36b0b042ba56 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,51 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
}
#endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The efi page fault handler that recovers from page faults caused by
+ * buggy firmware needs original memory map passed by firmware. Hence,
+ * build a new EFI memmap that has all entries and save it for later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+ efi_memory_desc_t *md;
+ void *remapped_phys, *new_md;
+ phys_addr_t new_phys, new_size;
+
+ new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+ new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+ if (!new_phys) {
+ pr_err("Failed to allocate new EFI memmap\n");
+ return;
+ }
+
+ remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+ if (!remapped_phys) {
+ pr_err("Failed to remap new EFI memmap\n");
+ __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
+ return;
+ }
+
+ new_md = remapped_phys;
+ for_each_efi_memory_desc(md) {
+ memcpy(new_md, md, efi.memmap.desc_size);
+ new_md += efi.memmap.desc_size;
+ }
+
+ original_memory_map.late = 1;
+ original_memory_map.phys_map = new_phys;
+ original_memory_map.map = remapped_phys;
+ original_memory_map.nr_map = efi.memmap.nr_map;
+ original_memory_map.desc_size = efi.memmap.desc_size;
+ original_memory_map.map_end = remapped_phys + new_size;
+ original_memory_map.desc_version = efi.memmap.desc_version;
+
+ original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
From: Sai Praneeth <[email protected]>
As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory region except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. Buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens while the kernel is up and
running, the kernel hangs.
The kernel hangs because the memory region requested by the firmware
isn't mapped in efi_pgd, which causes a page fault in ring 0 that the
kernel fails to handle, leading to die(). To keep the kernel from
hanging, add an efi specific page fault handler which detects illegal
accesses by the firmware and, if the access is to any region other than
EFI_RUNTIME_SERVICES_<CODE/DATA>, then:
1. For any efi runtime service other than efi_reset_system(), the efi
page fault handler freezes efi_rts_wq and schedules a new process.
2. If the efi runtime service is efi_reset_system(), the efi page
fault handler reboots the machine through BIOS and not through
efi_reset_system().
The efi specific page fault handler offers two advantages:
1. It recovers from potential hangs that could be caused by buggy firmware.
2. It shouts loudly that the firmware is buggy, i.e. that this is not a kernel bug.
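The effect on later callers (the EFI_ABORTED status and the EFI_RUNTIME_SERVICES gate added to efi_queue_work()) can be modelled single-threaded. This is a hypothetical sketch: `queue_rts()`, `call_firmware()` and the two globals stand in for `efi_queue_work()`, the firmware call, `efi.flags` and `efi_rts_work.status`. In the patch the faulting worker never returns (it is left in TASK_UNINTERRUPTIBLE); this model only captures the status and flag handling. `EFI_ABORTED` follows the UEFI encoding of error code 21 with the top bit set.

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <stdbool.h>

typedef unsigned long efi_status_t;

#define EFI_SUCCESS 0UL
/* UEFI error codes set the most significant bit; ABORTED is code 21. */
#define EFI_ABORTED ((1UL << (sizeof(unsigned long) * 8 - 1)) | 21UL)

/* Hypothetical stand-ins for efi.flags and efi_rts_work.status. */
static bool runtime_services_enabled = true;
static efi_status_t rts_status;

/* Hypothetical firmware call; 'fault' simulates an illegal access. */
static efi_status_t call_firmware(bool fault)
{
	if (fault) {
		/* What the fault handler does: report EFI_ABORTED to the
		 * waiter and disable runtime services for good. */
		runtime_services_enabled = false;
		return EFI_ABORTED;
	}
	return EFI_SUCCESS;
}

static efi_status_t queue_rts(bool fault)
{
	rts_status = EFI_ABORTED;	/* default, as in efi_queue_work() */
	if (!runtime_services_enabled)
		return rts_status;	/* never re-enter buggy firmware */
	rts_status = call_firmware(fault);
	return rts_status;
}
```

After one faulting call, every later call returns EFI_ABORTED without entering the firmware, which is what the `goto exit` path in efi_queue_work() achieves.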
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 5 ++
arch/x86/mm/fault.c | 9 ++
arch/x86/platform/efi/quirks.c | 140 ++++++++++++++++++++++++++++++++
drivers/firmware/efi/runtime-wrappers.c | 7 ++
include/linux/efi.h | 1 +
5 files changed, 162 insertions(+)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 788ed4cbce22..f3d9c3c2359e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,8 +143,13 @@ extern void efi_switch_mm(struct mm_struct *mm);
#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr);
#else
static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr)
+{
+ return 0;
+}
#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..4f6939d8e13f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
#include <linux/prefetch.h> /* prefetchw */
#include <linux/context_tracking.h> /* exception_enter(), ... */
#include <linux/uaccess.h> /* faulthandler_disabled() */
+#include <linux/efi.h> /* fixup for buggy UEFI firmware */
#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -24,6 +25,7 @@
#include <asm/vsyscall.h> /* emulate_vsyscall */
#include <asm/vm86.h> /* struct vm86 */
#include <asm/mmu_context.h> /* vma_pkey() */
+#include <asm/efi.h> /* fixup for buggy UEFI firmware */
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
/*
+ * Buggy firmware could illegally access some EFI regions, which
+ * might page fault; try to recover from such faults.
+ */
+ if (efi_illegal_accesses_fixup(address))
+ return;
+
+ /*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice:
*/
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36b0b042ba56..2aba28a90800 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
#include <asm/efi.h>
#include <asm/uv/uv.h>
#include <asm/cpu_device_id.h>
+#include <asm/reboot.h>
#define EFI_MIN_RESERVE 5120
@@ -701,4 +702,143 @@ void __init efi_save_original_memmap(void)
original_memory_map_present = true;
}
+
+/*
+ * From the original EFI memory map passed by the firmware, return a
+ * pointer to the memory descriptor that describes the given physical
+ * address. If not found, return NULL.
+ */
+static efi_memory_desc_t *efi_get_md(unsigned long phys_addr)
+{
+ efi_memory_desc_t *md;
+
+ for_each_efi_memory_desc_in_map(&original_memory_map, md) {
+ if (md->phys_addr <= phys_addr &&
+ (phys_addr < (md->phys_addr +
+ (md->num_pages << EFI_PAGE_SHIFT)))) {
+ return md;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * Detect illegal access by the firmware and if the illegally accessed
+ * region is any region described by efi memory map and other than
+ * EFI_RUNTIME_SERVICES_<CODE/DATA>, then
+ * 1. If the efi runtime service is efi_reset_system(), then reboot
+ * through BIOS.
+ * 2. If the efi runtime service is _not_ efi_reset_system(), then
+ * a. Freeze efi_rts_wq.
+ * b. Return error status to the efi caller process.
+ * c. Disable EFI Runtime Services forever and
+ * d. Schedule another process by explicitly calling scheduler.
+ *
+ * @return: Returns 0, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+int efi_illegal_accesses_fixup(unsigned long phys_addr)
+{
+ char buf[64];
+ efi_memory_desc_t *md;
+ unsigned long long phys_addr_end, size_in_MB;
+
+ /* Fix page faults caused *only* by the firmware */
+ if (current->active_mm != &efi_mm)
+ return 0;
+
+ /*
+ * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+ * page faulting on these addresses isn't expected.
+ */
+ if (phys_addr <= 0x0fff)
+ return 0;
+
+ /*
+ * Original memory map is needed to retrieve the memory descriptor
+ * that the firmware has faulted on. So, check if the kernel had
+ * saved the original memory map passed by the firmware during boot.
+ */
+ if (!original_memory_map_present) {
+ pr_info("Original memory map not found, abort recovering from "
+ "illegal access by firmware\n");
+ return 0;
+ }
+
+ /*
+ * EFI Memory map could sometimes have holes, eg: SMRAM. So, make
+ * sure that a valid memory descriptor is present for the physical
+ * address that triggered page fault.
+ */
+ md = efi_get_md(phys_addr);
+ if (!md) {
+ pr_info("Failed to find EFI memory descriptor for PA: 0x%lx\n",
+ phys_addr);
+ return 0;
+ }
+
+ /*
+ * EFI_RUNTIME_SERVICES_<CODE/DATA> regions are mapped into efi_pgd
+ * by the kernel during boot and hence accesses to these regions
+ * should never page fault.
+ */
+ if (md->type == EFI_RUNTIME_SERVICES_CODE ||
+ md->type == EFI_RUNTIME_SERVICES_DATA) {
+ pr_info("Kernel shouldn't page fault on accesses to "
+ "EFI_RUNTIME_SERVICES_<CODE/DATA> regions\n");
+ return 0;
+ }
+
+ /*
+ * Now it's clear that an illegal access by the firmware has caused
+ * the page fault. Print stack trace and memory descriptor as it is
+ * useful to know which EFI Runtime Service is buggy and what did it
+ * try to access.
+ */
+ phys_addr_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+ size_in_MB = md->num_pages >> (20 - EFI_PAGE_SHIFT);
+ WARN(1, FW_BUG "Detected illegal access by Firmware at PA: 0x%lx\n",
+ phys_addr);
+ pr_info("EFI Memory Descriptor for offending PA is:\n");
+ pr_info("%s range=[0x%016llx-0x%016llx] (%lluMB)\n",
+ efi_md_typeattr_format(buf, sizeof(buf), md), md->phys_addr,
+ phys_addr_end, size_in_MB);
+
+ /*
+ * Buggy efi_reset_system() is handled differently from other EFI
+ * Runtime Services as it doesn't use efi_rts_wq. Although
+ * native_machine_emergency_restart() notes that machine_real_restart()
+ * could fail, it's better not to complicate this fault handler,
+ * because this case occurs *very* rarely and can be improved
+ * on an as-needed basis.
+ */
+ if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
+ pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
+ machine_real_restart(MRR_BIOS);
+ return 0;
+ }
+
+ /*
+ * Firmware didn't page fault on EFI_RUNTIME_SERVICES_<CODE/DATA>.
+ * This means that the firmware has illegally accessed some other
+ * EFI region which can't be fixed. Hence, freeze efi_rts_wq.
+ */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ /*
+ * Before calling EFI Runtime Service, the kernel has switched the
+ * calling process to efi_mm. Hence, switch back to task_mm.
+ */
+ arch_efi_call_virt_teardown();
+
+ /* Signal error status to the efi caller process */
+ efi_rts_work.status = EFI_ABORTED;
+ complete(&efi_rts_work.efi_rts_comp);
+
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
+ schedule();
+
+ return 0;
+}
#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index b18b2d864c2c..5ca44ca22011 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -61,6 +61,11 @@ struct efi_runtime_work efi_rts_work;
({ \
efi_rts_work.status = EFI_ABORTED; \
\
+ if (!efi_enabled(EFI_RUNTIME_SERVICES)) { \
+ pr_err("Aborting! EFI Runtime Services disabled\n"); \
+ goto exit; \
+ } \
+ \
init_completion(&efi_rts_work.efi_rts_comp); \
INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
efi_rts_work.arg1 = _arg1; \
@@ -79,6 +84,7 @@ struct efi_runtime_work efi_rts_work;
else \
pr_err("Failed to queue work to efi_rts_wq.\n"); \
\
+exit: \
efi_rts_work.status; \
})
@@ -393,6 +399,7 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
+ efi_rts_work.efi_rts_id = RESET_SYSTEM;
__efi_call_virt(reset_system, reset_type, status, data_size, data);
up(&efi_runtime_lock);
}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6a07e3166fd1..aa64fb88d4c8 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1682,6 +1682,7 @@ enum efi_rts_ids {
SET_VARIABLE,
QUERY_VARIABLE_INFO,
GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
UPDATE_CAPSULE,
QUERY_CAPSULE_CAPS,
};
From: Sai Praneeth <[email protected]>
Buggy firmware could illegally access some efi regions even after the
kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault
handler will detect and recover from these illegal accesses.
efi_md_typeattr_format() and memory_type_name are used by the efi page
fault handler to print information about the memory descriptor that was
illegally accessed. Because the page fault handler runs during and after
kernel boot, it doesn't have the __init attribute, but
efi_md_typeattr_format() does, so the kernel build emits a "WARNING:
modpost: Found * section mismatch(es)" build warning. To fix it,
remove the __init attribute from efi_md_typeattr_format().
To avoid keeping efi_md_typeattr_format() and memory_type_name around
needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add
new __efi_init and __efi_initdata attributes whose values depend on
whether the config option is selected or not.
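How the attribute switches at compile time can be checked with the preprocessor alone. In this hypothetical userspace sketch `__init` is deliberately left undefined, so stringification shows which token `__efi_init` expands to; in the kernel, `__init` is a section attribute whose code is freed after boot. `efi_init_expansion()` and the `STR` helpers are invented for illustration.

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <string.h>

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Mirrors the patch; __init is intentionally undefined here. */
#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
#define __efi_init
#else
#define __efi_init __init
#endif

/* Returns the text __efi_init expands to in this build. */
static const char *efi_init_expansion(void)
{
	return STR(__efi_init);
}

/* Nonzero when the function would be discarded after boot. */
static int efi_init_is_discarded(void)
{
	return strcmp(efi_init_expansion(), "__init") == 0;
}
```

Built as-is, `efi_init_expansion()` returns "__init"; compiling with `-DCONFIG_EFI_WARN_ON_ILLEGAL_ACCESS` makes it return an empty string, i.e. the annotated function stays resident for the fault handler.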
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/efi.c | 4 ++--
include/linux/efi.h | 14 +++++++++++++-
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index d8a33a781a57..16571429b19c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -768,7 +768,7 @@ int __init efi_get_fdt_params(struct efi_fdt_params *params)
}
#endif /* CONFIG_EFI_PARAMS_FROM_FDT */
-static __initdata char memory_type_name[][20] = {
+static __efi_initdata char memory_type_name[][20] = {
"Reserved",
"Loader Code",
"Loader Data",
@@ -786,7 +786,7 @@ static __initdata char memory_type_name[][20] = {
"Persistent Memory",
};
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
const efi_memory_desc_t *md)
{
char *pos;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 855992b15269..6a07e3166fd1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1107,10 +1107,22 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
for_each_efi_memory_desc_in_map(&efi.memmap, md)
/*
+ * __efi_init - if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is enabled, remove __init
+ * modifier.
+ */
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+#define __efi_init
+#define __efi_initdata
+#else
+#define __efi_init __init
+#define __efi_initdata __initdata
+#endif
+
+/*
* Format an EFI memory descriptor's type and attributes to a user-provided
* character buffer, as per snprintf(), and return the buffer.
*/
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
const efi_memory_desc_t *md);
/**
Hi Boris and Ard,
> Buggy firmware could illegally access some efi regions even after the kernel has
> assumed control of the platform. When
> "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault
> handler will detect and recover from these illegal accesses.
> efi_md_typeattr_format() and memory_type_name are used by the efi page
> fault handler to print information about memory descriptor that was illegally
> accessed. As the page fault handler is present during/after kernel boot it doesn't
> have an __init attribute, but
> efi_md_typeattr_format() has it and thus during kernel build, "WARNING:
> modpost: Found * section mismatch(es)" build warning is observed. To fix it,
> remove __init attribute for efi_md_typeattr_format().
>
> In order to not keep efi_md_typeattr_format() and memory_type_name
> needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected,
> add a new __efi_init attribute whose value changes based on whether the config
> option is selected or not.
In previous versions (i.e. up to V2), where we handled EFI_BOOT_SERVICES_<CODE/DATA>
regions differently, it made sense to have a separate attribute like __efi_init because many
function definitions were modified. From V3 onwards, do you think it's still OK to have __efi_init,
or should I just remove the __init attribute (and drop __efi_init) for efi_md_typeattr_format()
and memory_type_name, since we are only modifying two definitions?
Regards,
Sai
On Tue, Sep 04, 2018 at 03:12:27PM -0700, Sai Praneeth Prakhya wrote:
> +void __init efi_save_original_memmap(void)
> +{
> + efi_memory_desc_t *md;
> + void *remapped_phys, *new_md;
> + phys_addr_t new_phys, new_size;
> +
> + new_size = efi.memmap.desc_size * efi.memmap.nr_map;
> + new_phys = efi_memmap_alloc(efi.memmap.nr_map);
> + if (!new_phys) {
> + pr_err("Failed to allocate new EFI memmap\n");
> + return;
> + }
> +
> + remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
> + if (!remapped_phys) {
> + pr_err("Failed to remap new EFI memmap\n");
> + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
> + return;
> + }
> +
> + new_md = remapped_phys;
> + for_each_efi_memory_desc(md) {
> + memcpy(new_md, md, efi.memmap.desc_size);
> + new_md += efi.memmap.desc_size;
> + }
Should we ioremap_prot(remapped_phys, new_size, PROT_NONE), here? Such
that nobody can accidentally use this thing?
> + original_memory_map.late = 1;
> + original_memory_map.phys_map = new_phys;
> + original_memory_map.map = remapped_phys;
> + original_memory_map.nr_map = efi.memmap.nr_map;
> + original_memory_map.desc_size = efi.memmap.desc_size;
> + original_memory_map.map_end = remapped_phys + new_size;
> + original_memory_map.desc_version = efi.memmap.desc_version;
> +
> + original_memory_map_present = true;
> +}
> +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
> --
> 2.7.4
>
On 5 September 2018 at 00:12, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> The efi page fault handler that recovers from page faults caused by the
> firmware needs the original memory map passed by the firmware. It looks
> up this memory map to find the type of the memory region at which the
> page fault occurred. Presently, the EFI subsystem discards the original
> memory map passed by the firmware and replaces it with a new memory map
> that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
> accesses by firmware can occur at any region. Hence, _only_ if
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the
> original memory map passed by the firmware, so that the efi page fault
> handler can detect/recover from illegal accesses to *any* efi region.
>
Why do we care about the memory map at all when a fault occurs during
the invocation of an EFI runtime service?
I think reasoning about what went wrong and why, and distinguishing
between allowable and non-allowable faults is a slippery slope, so
[taking Thomas's feedback into account], I think we can simplify this
series further and just block all subsequent EFI runtime services
calls if any permission or page fault occurs while executing them.
Would we still need to preserve the old memory map in that case?
> Suggested-by: Matt Fleming <[email protected]>
> Based-on-code-from: Ricardo Neri <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Al Stone <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/include/asm/efi.h | 6 ++++++
> arch/x86/platform/efi/efi.c | 2 ++
> arch/x86/platform/efi/quirks.c | 48 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 56 insertions(+)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index cec5fae23eb3..788ed4cbce22 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -141,6 +141,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
> extern void efi_delete_dummy_variable(void);
> extern void efi_switch_mm(struct mm_struct *mm);
>
> +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
> +extern void __init efi_save_original_memmap(void);
> +#else
> +static inline void __init efi_save_original_memmap(void) { }
> +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
> +
> struct efi_setup_data {
> u64 fw_vendor;
> u64 runtime;
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 9061babfbc83..7a3ea4cd5939 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
>
> pa = __pa(new_memmap);
>
> + efi_save_original_memmap();
> +
> /*
> * Unregister the early EFI memmap from efi_init() and install
> * the new EFI memory map that we are about to pass to the
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 844d31cb8a0c..36b0b042ba56 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -654,3 +654,51 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
> }
>
> #endif
> +
> +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
> +
> +static bool original_memory_map_present;
> +static struct efi_memory_map original_memory_map;
> +
> +/*
> + * The efi page fault handler that recovers from page faults caused by
> + * buggy firmware needs the original memory map passed by the firmware. Hence,
> + * build a new EFI memmap that has all entries and save it for later use.
> + */
> +void __init efi_save_original_memmap(void)
> +{
> + efi_memory_desc_t *md;
> + void *remapped_phys, *new_md;
> + phys_addr_t new_phys, new_size;
> +
> + new_size = efi.memmap.desc_size * efi.memmap.nr_map;
> + new_phys = efi_memmap_alloc(efi.memmap.nr_map);
> + if (!new_phys) {
> + pr_err("Failed to allocate new EFI memmap\n");
> + return;
> + }
> +
> + remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
> + if (!remapped_phys) {
> + pr_err("Failed to remap new EFI memmap\n");
> + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
> + return;
> + }
> +
> + new_md = remapped_phys;
> + for_each_efi_memory_desc(md) {
> + memcpy(new_md, md, efi.memmap.desc_size);
> + new_md += efi.memmap.desc_size;
> + }
> +
> + original_memory_map.late = 1;
> + original_memory_map.phys_map = new_phys;
> + original_memory_map.map = remapped_phys;
> + original_memory_map.nr_map = efi.memmap.nr_map;
> + original_memory_map.desc_size = efi.memmap.desc_size;
> + original_memory_map.map_end = remapped_phys + new_size;
> + original_memory_map.desc_version = efi.memmap.desc_version;
> +
> + original_memory_map_present = true;
> +}
> +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
> --
> 2.7.4
>
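Ard's alternative (block every later runtime services call once any page or permission fault occurs while one is executing) needs no memory map at all. A minimal user-space sketch, assuming a single shared flag; struct rts_gate, rts_gate_call(), rts_gate_on_fault(), the stub service and the status values are all hypothetical, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical status values; the kernel's efi_status_t encoding differs. */
enum rts_status { RTS_SUCCESS = 0, RTS_UNSUPPORTED = 3 };

/* Hypothetical state shared by every runtime services wrapper. */
struct rts_gate {
	bool disabled;          /* set once the firmware has faulted */
	unsigned int faults;    /* how many faults were seen */
};

/* Hypothetical stub standing in for a real runtime service. */
enum rts_status rts_stub_service(void)
{
	return RTS_SUCCESS;
}

/*
 * Fault path: called for any page or permission fault raised while a
 * runtime service is running. The faulting address is never inspected.
 */
void rts_gate_on_fault(struct rts_gate *g)
{
	g->faults++;
	g->disabled = true;
}

/* Every runtime services call goes through here. */
enum rts_status rts_gate_call(struct rts_gate *g, enum rts_status (*svc)(void))
{
	if (g->disabled)
		return RTS_UNSUPPORTED;   /* firmware already misbehaved: refuse */
	return svc();
}
```

Because the decision never looks at the faulting address, a fault on the stack or on a pointer argument is handled the same way as an access to a boot services region.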
On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
> On 5 September 2018 at 00:12, Sai Praneeth Prakhya
> <[email protected]> wrote:
> > From: Sai Praneeth <[email protected]>
> >
> > The efi page fault handler that recovers from page faults caused by the
> > firmware needs the original memory map passed by the firmware. It looks
> > up this memory map to find the type of the memory region at which the
> > page fault occurred. Presently, the EFI subsystem discards the original
> > memory map passed by the firmware and replaces it with a new memory map
> > that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
> > accesses by firmware can occur at any region. Hence, _only_ if
> > CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the
> > original memory map passed by the firmware, so that the efi page fault
> > handler can detect/recover from illegal accesses to *any* efi region.
> >
>
> Why do we care about the memory map at all when a fault occurs during
> the invocation of an EFI runtime service?
>
> I think reasoning about what went wrong and why, and distinguishing
> between allowable and non-allowable faults is a slippery slope, so
> [taking Thomas's feedback into account], I think we can simplify this
> series further and just block all subsequent EFI runtime services
> calls if any permission or page fault occurs while executing them.
>
> Would we still need to preserve the old memory map in that case?
I thought the reason for having this was being able to know the fault is
in an EFI area. But of course, I'm not well versed in this whole EFI
crapola.
On 5 September 2018 at 14:56, Peter Zijlstra <[email protected]> wrote:
> On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
>> On 5 September 2018 at 00:12, Sai Praneeth Prakhya
>> <[email protected]> wrote:
>> > From: Sai Praneeth <[email protected]>
>> >
>> > The efi page fault handler that recovers from page faults caused by the
>> > firmware needs the original memory map passed by the firmware. It looks
>> > up this memory map to find the type of the memory region at which the
>> > page fault occurred. Presently, the EFI subsystem discards the original
>> > memory map passed by the firmware and replaces it with a new memory map
>> > that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
>> > accesses by firmware can occur at any region. Hence, _only_ if
>> > CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the
>> > original memory map passed by the firmware, so that the efi page fault
>> > handler can detect/recover from illegal accesses to *any* efi region.
>> >
>>
>> Why do we care about the memory map at all when a fault occurs during
>> the invocation of an EFI runtime service?
>>
>> I think reasoning about what went wrong and why, and distinguishing
>> between allowable and non-allowable faults is a slippery slope, so
>> [taking Thomas's feedback into account], I think we can simplify this
>> series further and just block all subsequent EFI runtime services
>> calls if any permission or page fault occurs while executing them.
>>
>> Would we still need to preserve the old memory map in that case?
>
> I thought the reason for having this was being able to know the fault is
> in an EFI area. But of course, I'm not well versed in this whole EFI
> crapola.
I'm not entirely sure whether that really matters. The EFI services
access the stack and can access byref/pointer arguments which are not
covered by the EFI memory map as runtime services code/data, and so
they can trigger page faults by running off the vmapped stack or
writing to const byref arguments.
EFI runtime services using boot services regions after they are no
longer available are a known source of headaches, but I don't see why
we should restrict ourselves to such cases if we bother to wire up
fault handling specifically for EFI services calls.
So any page or permission fault occurring in the context of an EFI
runtime services invocation should be treated the same, I think.
On Wed, 5 Sep 2018, Ard Biesheuvel wrote:
> On 5 September 2018 at 14:56, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
> >> Would we still need to preserve the old memory map in that case?
> >
> > I thought the reason for having this was being able to know the fault is
> > in an EFI area. But of course, I'm not well versed in this whole EFI
> > crapola.
>
> I'm not entirely sure whether that really matters. The EFI services
> access the stack and can access byref/pointer arguments which are not
> covered by the EFI memory map as runtime services code/data, and so
> they can trigger page faults by running off the vmapped stack or
> writing to const byref arguments.
>
> EFI runtime services using boot services regions after they are no
> longer available are a known source of headaches, but I don't see why
> we should restrict ourselves to such cases if we bother to wire up
> fault handling specifically for EFI services calls.
>
> So any page or permission fault occurring in the context of an EFI
> runtime services invocation should be treated the same, I think.
I agree. Keep it simple. If the EFI crap fails, then assist with the reboot
and otherwise just kill it.
Thanks,
tglx
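The policy Thomas describes reduces to a decision on which service was running, not on where it faulted. A minimal sketch under that assumption; the enums, struct rts_state and efi_fault_policy() are hypothetical, and the actual recovery steps (rebooting by other means, which the cover letter proposes doing through BIOS, or aborting the pending call) are left to the caller.

```c
#include <stdbool.h>

/* Hypothetical IDs for the runtime service that was executing. */
enum rts_id { RTS_GET_TIME, RTS_SET_VARIABLE, RTS_RESET_SYSTEM };

/* Hypothetical decisions the fault handler can hand back to its caller. */
enum fault_action {
	ACTION_REBOOT_FALLBACK,  /* ResetSystem faulted: reboot some other way */
	ACTION_DISABLE_RTS,      /* anything else: stop calling the firmware */
};

struct rts_state {
	bool rts_enabled;        /* may runtime services still be called? */
};

/* "Assist with the reboot and otherwise just kill it." */
enum fault_action efi_fault_policy(struct rts_state *s, enum rts_id running)
{
	if (running == RTS_RESET_SYSTEM)
		return ACTION_REBOOT_FALLBACK;  /* the machine is going down anyway */
	s->rts_enabled = false;
	return ACTION_DISABLE_RTS;
}
```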
> On Wed, 5 Sep 2018, Ard Biesheuvel wrote:
> > On 5 September 2018 at 14:56, Peter Zijlstra <[email protected]> wrote:
> > > On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
> > >> Would we still need to preserve the old memory map in that case?
> > >
> > > I thought the reason for having this was being able to know the
> > > fault is in an EFI area. But of course, I'm not well versed in this
> > > whole EFI crapola.
> >
> > I'm not entirely sure whether that really matters. The EFI services
> > access the stack and can access byref/pointer arguments which are not
> > covered by the EFI memory map as runtime services code/data, and so
> > they can trigger page faults by running off the vmapped stack or
> > writing to const byref arguments.
> >
> > EFI runtime services using boot services regions after they are no
> > longer available are a known source of headaches, but I don't see why
> > we should restrict ourselves to such cases if we bother to wire up
> > fault handling specifically for EFI services calls.
> >
> > So any page or permission fault occurring in the context of an EFI
> > runtime services invocation should be treated the same, I think.
>
> I agree. Keep it simple. If the EFI crap fails, then assist with the reboot and
> otherwise just kill it.
The reasons for saving the old memory map are
(in my view, these are the less important ones because they are very unlikely to happen):
1. Make sure that a memory descriptor exists for the physical address that was
faulted on (the EFI memory map can sometimes have holes). If the physical address
that caused the page fault doesn't have a valid EFI memory descriptor, the EFI page
fault handler shouldn't take any action, because it hasn't triaged the problem yet.
2. Make sure that the faulting physical address is _not_ in an EFI runtime services code/data
region. EFI runtime services code/data regions are always mapped by the kernel in efi_pgd, and
accesses to these regions should _never_ page fault. If this ever happens, the EFI page fault
handler shouldn't take any action, because it's a kernel bug rather than an illegal access by
the firmware.
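The two checks above amount to a lookup of the faulting physical address in the saved map followed by a test of the region type. A minimal user-space sketch; find_desc(), classify_fault() and enum fault_class are hypothetical names, the struct mirrors efi_memory_desc_t, and the memory type values 0, 5 and 6 are EfiReservedMemoryType, EfiRuntimeServicesCode and EfiRuntimeServicesData from the UEFI specification.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12               /* UEFI pages are 4 KiB */
#define EFI_RESERVED_TYPE 0
#define EFI_RUNTIME_SERVICES_CODE 5
#define EFI_RUNTIME_SERVICES_DATA 6

/* Mirrors the layout of the kernel's efi_memory_desc_t. */
typedef struct {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
} efi_memory_desc_t;

enum fault_class {
	FAULT_NO_DESCRIPTOR,    /* reason 1: PA lies in a hole of the map */
	FAULT_IN_RUNTIME,       /* reason 2: should never fault, so a kernel bug */
	FAULT_ILLEGAL_ACCESS,   /* firmware touched a region it must not use */
};

/* Find the descriptor covering pa, or NULL if no entry covers it. */
const efi_memory_desc_t *find_desc(const void *map, size_t nr_map,
				   size_t desc_size, uint64_t pa)
{
	for (size_t i = 0; i < nr_map; i++) {
		const efi_memory_desc_t *md =
			(const void *)((const unsigned char *)map + i * desc_size);
		uint64_t size = md->num_pages << EFI_PAGE_SHIFT;

		/* Written as a subtraction so the end address cannot overflow. */
		if (pa >= md->phys_addr && pa - md->phys_addr < size)
			return md;
	}
	return NULL;
}

enum fault_class classify_fault(const void *map, size_t nr_map,
				size_t desc_size, uint64_t pa)
{
	const efi_memory_desc_t *md = find_desc(map, nr_map, desc_size, pa);

	if (!md)
		return FAULT_NO_DESCRIPTOR;
	if (md->type == EFI_RUNTIME_SERVICES_CODE ||
	    md->type == EFI_RUNTIME_SERVICES_DATA)
		return FAULT_IN_RUNTIME;
	return FAULT_ILLEGAL_ACCESS;
}
```

Only FAULT_ILLEGAL_ACCESS would lead the handler to act; the other two outcomes correspond to the "take no action" cases described above.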
Generally, the above two scenarios should never happen. I am just being paranoid and wanted
to make sure that the EFI page fault handler is fixing the right firmware bug (the one I came
across) and not something else. I also agree that we could make the patch set and the EFI page
fault handler much simpler by not saving the old memory map. So, I am OK with not checking for
the above two scenarios. If they turn out to be needed, we could add them later.
That said, a more important reason (in my view) is to print out the memory descriptor that
we faulted on. This is *proof* that buggy firmware caused the page fault and that it is not
a kernel bug. This proof is important because whenever a stack trace is printed with some EFI
function in it, the kernel is the usual suspect, so we need to show that it's not a kernel
fault. It could also help firmware engineers fix the bug easily.
dmesg would show something like this when a buggy efi_reset_system() accesses a reserved region:
[ 296.141511] efi: EFI Memory Descriptor for offending PA is:
[ 296.141844] efi: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000007e915000-0x000000007e933fff] (0MB)
[ 296.142522] efi: efi_reset_system() buggy! Reboot through BIOS
So, I would be concerned if we miss this proof.
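The range in the second log line follows from the descriptor's start address and page count: the end is start + (num_pages << 12) - 1, and the MB figure is the size shifted right by 20, so a 31-page region (0x1f000 bytes) prints as (0MB). A minimal sketch reproducing just that field; format_range() is a hypothetical helper, and its format string is modeled on the log line above rather than copied from the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EFI_PAGE_SHIFT 12   /* UEFI pages are 4 KiB */

/*
 * Format "range=[start-end] (sizeMB)" for a descriptor. The end address is
 * inclusive and the MB value is rounded down. Returns what snprintf returns.
 */
int format_range(char *buf, size_t len, uint64_t phys_addr, uint64_t num_pages)
{
	uint64_t size = num_pages << EFI_PAGE_SHIFT;

	return snprintf(buf, len, "range=[0x%016llx-0x%016llx] (%lluMB)",
			(unsigned long long)phys_addr,
			(unsigned long long)(phys_addr + size - 1),
			(unsigned long long)(size >> 20));
}
```

Called with phys_addr = 0x7e915000 and num_pages = 31, this yields the same range=[0x000000007e915000-0x000000007e933fff] (0MB) text as the log line.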
Regards,
Sai
> >> I agree. Keep it simple. If the EFI crap fails, then assist with the
> >> reboot and otherwise just kill it.
> >
> > The reasons for saving the old memory map are (in my view, these are
> > the less important ones because they are very unlikely to happen):
> >
> > 1. Make sure that a memory descriptor exists for the physical address
> > that was faulted on (the EFI memory map can sometimes have holes). If
> > the physical address that caused the page fault doesn't have a valid
> > EFI memory descriptor, the EFI page fault handler shouldn't take any
> > action, because it hasn't triaged the problem yet.
> >
> > 2. Make sure that the faulting physical address is _not_ in an EFI
> > runtime services code/data region. EFI runtime services code/data
> > regions are always mapped by the kernel in efi_pgd, and accesses to
> > these regions should _never_ page fault. If this ever happens, the EFI
> > page fault handler shouldn't take any action, because it's a kernel
> > bug rather than an illegal access by the firmware.
> >
>
> What about attempts to modify code regions or attempts to execute data
> regions? What kind of fault will that trigger, and are they being handled at the
> moment?
AFAIK, at least in the x86 world, attempts to write to read-only regions or attempts
to execute XP (execute protected) pages will result in a page fault, and I don't think
we are handling them.
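On x86 both cases Ard asks about (writing to a read-only mapping, fetching an instruction from a non-executable page) arrive as a page fault, and the error code the CPU pushes tells them apart from an access to an unmapped page. A minimal sketch decoding it; the bit positions are architectural and the macro names mirror the X86_PF_* names Linux uses, while enum pf_kind and classify_pf() are hypothetical. The instruction-fetch bit is only reported when NX (or SMEP) is enabled.

```c
/* Architectural x86 #PF error code bits (names mirror Linux's X86_PF_*). */
#define X86_PF_PROT  (1UL << 0)  /* 0: page not present, 1: protection violation */
#define X86_PF_WRITE (1UL << 1)  /* 0: read, 1: write */
#define X86_PF_USER  (1UL << 2)  /* 0: supervisor (firmware runs here), 1: user */
#define X86_PF_INSTR (1UL << 4)  /* access was an instruction fetch */

enum pf_kind {
	PF_NOT_PRESENT,          /* e.g. a region that is no longer mapped */
	PF_EXEC_NX,              /* executing from a non-executable (XP) page */
	PF_WRITE_READ_ONLY,      /* writing to a read-only page */
	PF_OTHER_PROT,           /* any other protection violation */
};

enum pf_kind classify_pf(unsigned long error_code)
{
	if (!(error_code & X86_PF_PROT))
		return PF_NOT_PRESENT;
	if (error_code & X86_PF_INSTR)
		return PF_EXEC_NX;
	if (error_code & X86_PF_WRITE)
		return PF_WRITE_READ_ONLY;
	return PF_OTHER_PROT;
}
```

Since every variant is a page fault, a policy that treats any fault during a runtime services call the same way covers them all; decoding the error code would only make the warning more specific.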
>
> As I pointed out, EFI runtime services code may legally access the stack or
> dereference pointer arguments, but could still contain bugs that result in out of
> bounds accesses or writes to read-only regions.
Yes, agreed. In fact, I did see these bugs.
> So I don't really care about the address of the illegal access, any fault that
> occurs while running in the firmware should be treated the same.
Ok.. makes sense.
> In fact, cross
> referencing the value of IP with RuntimeServicesCode regions may be more
> useful
This is to verify that the firmware was indeed executing code from RuntimeServicesCode
regions when it faulted. Is that correct? Or did you mean something else?
> > That said, a more important reason (in my view) is to print out the
> > memory descriptor that we faulted on. This is *proof* that buggy
> > firmware caused the page fault and that it is not a kernel bug. This
> > proof is important because whenever a stack trace is printed with some
> > EFI function in it, the kernel is the usual suspect, so we need to show
> > that it's not a kernel fault. It could also help firmware engineers fix
> > the bug easily.
> >
> > dmesg would show something like this when a buggy efi_reset_system()
> > accesses a reserved region:
> >
> > [ 296.141511] efi: EFI Memory Descriptor for offending PA is:
> > [ 296.141844] efi: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000007e915000-0x000000007e933fff] (0MB)
> > [ 296.142522] efi: efi_reset_system() buggy! Reboot through BIOS
> >
> > So, I would be concerned if we miss this proof.
> >
>
> You can dump the entire memory map by putting efi=debug on the kernel
> command line, so all we need to do is report the physical address, and you can
> easily figure out for yourself which memory map entry covers it.
That's true. In fact, that's how I debugged this issue, and hence I thought that it might be
useful to have all that info in one place (i.e. in the EFI page fault handler).
But, as you said, to keep the code simpler, I will roll out a V4 without saving the
original memory map.
Regards,
Sai
On 5 September 2018 at 19:53, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> On Wed, 5 Sep 2018, Ard Biesheuvel wrote:
>> > On 5 September 2018 at 14:56, Peter Zijlstra <[email protected]> wrote:
>> > > On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
>> > >> Would we still need to preserve the old memory map in that case?
>> > >
>> > > I thought the reason for having this was being able to know the
>> > > fault is in an EFI area. But of course, I'm not well versed in this
>> > > whole EFI crapola.
>> >
>> > I'm not entirely sure whether that really matters. The EFI services
>> > access the stack and can access byref/pointer arguments which are not
>> > covered by the EFI memory map as runtime services code/data, and so
>> > they can trigger page faults by running off the vmapped stack or
>> > writing to const byref arguments.
>> >
>> > EFI runtime services using boot services regions after they are no
>> > longer available are a known source of headaches, but I don't see why
>> > we should restrict ourselves to such cases if we bother to wire up
>> > fault handling specifically for EFI services calls.
>> >
>> > So any page or permission fault occurring in the context of an EFI
>> > runtime services invocation should be treated the same, I think.
>>
>> I agree. Keep it simple. If the EFI crap fails, then assist with the reboot and
>> otherwise just kill it.
>
> The reasons for saving the old memory map are
> (in my view, these are the less important ones because they are very unlikely to happen):
>
> 1. Make sure that a memory descriptor exists for the physical address that was
> faulted on (the EFI memory map can sometimes have holes). If the physical address
> that caused the page fault doesn't have a valid EFI memory descriptor, the EFI page
> fault handler shouldn't take any action, because it hasn't triaged the problem yet.
>
> 2. Make sure that the faulting physical address is _not_ in an EFI runtime services code/data
> region. EFI runtime services code/data regions are always mapped by the kernel in efi_pgd, and
> accesses to these regions should _never_ page fault. If this ever happens, the EFI page fault
> handler shouldn't take any action, because it's a kernel bug rather than an illegal access by
> the firmware.
>
What about attempts to modify code regions or attempts to execute data
regions? What kind of fault will that trigger, and are they being
handled at the moment?
As I pointed out, EFI runtime services code may legally access the
stack or dereference pointer arguments, but could still contain bugs
that result in out of bounds accesses or writes to read-only regions.
So I don't really care about the address of the illegal access, any
fault that occurs while running in the firmware should be treated the
same. In fact, cross referencing the value of IP with
RuntimeServicesCode regions may be more useful than trying to infer
whether the access itself was to a valid region.
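Ard's suggestion of cross-referencing the faulting instruction pointer with RuntimeServicesCode regions can be sketched as follows. Because the firmware executes at the virtual addresses installed by SetVirtualAddressMap(), the check compares against virt_addr, and as far as this sketch goes it only needs the runtime regions the kernel already keeps after entering virtual mode, not a backup of the original map. ip_in_runtime_code() is a hypothetical name and the struct mirrors efi_memory_desc_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12               /* UEFI pages are 4 KiB */
#define EFI_RUNTIME_SERVICES_CODE 5     /* EfiRuntimeServicesCode */

/* Mirrors the layout of the kernel's efi_memory_desc_t. */
typedef struct {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
} efi_memory_desc_t;

/*
 * True if ip falls inside a RuntimeServicesCode region of the map.
 * The comparison uses virt_addr: after SetVirtualAddressMap() the
 * firmware runs at the virtual addresses the kernel assigned.
 */
bool ip_in_runtime_code(const void *map, size_t nr_map, size_t desc_size,
			uint64_t ip)
{
	for (size_t i = 0; i < nr_map; i++) {
		const efi_memory_desc_t *md =
			(const void *)((const unsigned char *)map + i * desc_size);
		uint64_t size = md->num_pages << EFI_PAGE_SHIFT;

		if (md->type != EFI_RUNTIME_SERVICES_CODE)
			continue;
		if (ip >= md->virt_addr && ip - md->virt_addr < size)
			return true;
	}
	return false;
}
```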
> Generally, the above two scenarios should never happen. I am just being paranoid and wanted
> to make sure that the EFI page fault handler is fixing the right firmware bug (the one I came
> across) and not something else. I also agree that we could make the patch set and the EFI page
> fault handler much simpler by not saving the old memory map. So, I am OK with not checking for
> the above two scenarios. If they turn out to be needed, we could add them later.
>
> That said, a more important reason (in my view) is to print out the memory descriptor that
> we faulted on. This is *proof* that buggy firmware caused the page fault and that it is not
> a kernel bug. This proof is important because whenever a stack trace is printed with some EFI
> function in it, the kernel is the usual suspect, so we need to show that it's not a kernel
> fault. It could also help firmware engineers fix the bug easily.
>
> dmesg would show something like this when a buggy efi_reset_system() accesses a reserved region:
>
> [ 296.141511] efi: EFI Memory Descriptor for offending PA is:
> [ 296.141844] efi: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000007e915000-0x000000007e933fff] (0MB)
> [ 296.142522] efi: efi_reset_system() buggy! Reboot through BIOS
>
> So, I would be concerned if we miss this proof.
>
You can dump the entire memory map by putting efi=debug on the kernel
command line, so all we need to do is report the physical address, and
you can easily figure out for yourself which memory map entry covers
it.