2018-08-09 04:32:54

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 0/6] Add efi page fault handler to fix/recover from

From: Sai Praneeth <[email protected]>

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates UEFI
specification. Hence, provide a debug config option which when enabled
detects and fixes/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger page fault in ring 0 because
firmware executes at ring 0 and if unhandled it hangs the kernel. We
provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

Depending on the illegally accessed efi region, the efi page fault
handler handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
the efi page fault handler fixes it up by mapping the requested
region.
2. If any other region (Eg: EFI_CONVENTIONAL_MEMORY or
EFI_LOADER_<CODE/DATA>), then the efi page fault handler freezes
efi_rts_wq and schedules a new process.
3. If the access is to any other efi region like above but if the efi
runtime service is efi_reset_system(), then the efi page fault
handler will reboot the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> and to other regions
are dealt differently in efi page fault handler because, *generally*
EFI_BOOT_SERVICES_<CODE/DATA> regions are smaller in size relative to
other efi regions and hence could be reserved and can be dynamically
mapped. But other EFI regions like EFI_CONVENTIONAL_MEMORY and
EFI_LOADER_<CODE/DATA> cannot be reserved as they are very huge in size
and reserving them will make the kernel un-bootable.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
use scheduling away from efi_rts_wq.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (6):
efi: Make efi_rts_work accessible to efi page fault handler
x86/efi: Remove __init attribute from memory mapping functions
x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
x86/efi: Add efi page fault handler to fixup/recover from page faults
caused by firmware
x86/mm: If in_atomic(), allocate pages without sleeping
x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

arch/x86/Kconfig | 21 ++++
arch/x86/include/asm/efi.h | 24 +++-
arch/x86/mm/fault.c | 9 ++
arch/x86/mm/pageattr.c | 16 ++-
arch/x86/platform/efi/efi.c | 10 +-
arch/x86/platform/efi/efi_32.c | 2 +-
arch/x86/platform/efi/efi_64.c | 9 +-
arch/x86/platform/efi/quirks.c | 201 ++++++++++++++++++++++++++++++++
drivers/firmware/efi/efi.c | 6 +-
drivers/firmware/efi/runtime-wrappers.c | 60 +++-------
include/linux/efi.h | 53 ++++++++-
init/main.c | 3 +-
12 files changed, 350 insertions(+), 64 deletions(-)

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>

--
2.7.4



2018-08-09 04:33:02

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 1/6] efi: Make efi_rts_work accessible to efi page fault handler

From: Sai Praneeth <[email protected]>

If the firmware illegally accesses any efi regions other than
EFI_BOOT_SERVICES_<CODE/DATA>, the efi page fault handler would freeze
efi_rts_wq and schedules a new process. To do this, the efi page fault
handler needs efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure, because,
all the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 53 ++++++---------------------------
include/linux/efi.h | 36 ++++++++++++++++++++++
2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)

-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
- GET_TIME,
- SET_TIME,
- GET_WAKEUP_TIME,
- SET_WAKEUP_TIME,
- GET_VARIABLE,
- GET_NEXT_VARIABLE,
- SET_VARIABLE,
- QUERY_VARIABLE_INFO,
- GET_NEXT_HIGH_MONO_COUNT,
- UPDATE_CAPSULE,
- QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work: Details of EFI Runtime Service work
- * @arg<1-5>: EFI Runtime Service function arguments
- * @status: Status of executing EFI Runtime Service
- * @efi_rts_id: EFI Runtime Service function identifier
- * @efi_rts_comp: Struct used for handling completions
- */
-struct efi_runtime_work {
- void *arg1;
- void *arg2;
- void *arg3;
- void *arg4;
- void *arg5;
- efi_status_t status;
- struct work_struct work;
- enum efi_rts_ids efi_rts_id;
- struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;

/*
* efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
*/
#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
({ \
- struct efi_runtime_work efi_rts_work; \
efi_rts_work.status = EFI_ABORTED; \
\
init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
*/
static void efi_call_rts(struct work_struct *work)
{
- struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;

- efi_rts_work = container_of(work, struct efi_runtime_work, work);
- arg1 = efi_rts_work->arg1;
- arg2 = efi_rts_work->arg2;
- arg3 = efi_rts_work->arg3;
- arg4 = efi_rts_work->arg4;
- arg5 = efi_rts_work->arg5;
+ arg1 = efi_rts_work.arg1;
+ arg2 = efi_rts_work.arg2;
+ arg3 = efi_rts_work.arg3;
+ arg4 = efi_rts_work.arg4;
+ arg5 = efi_rts_work.arg5;

- switch (efi_rts_work->efi_rts_id) {
+ switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
(efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
*/
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
- efi_rts_work->status = status;
- complete(&efi_rts_work->efi_rts_comp);
+ efi_rts_work.status = status;
+ complete(&efi_rts_work.efi_rts_comp);
}

static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {

extern int efi_tpm_eventlog_init(void);

+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ QUERY_VARIABLE_INFO,
+ GET_NEXT_HIGH_MONO_COUNT,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
/* Workqueue to queue EFI Runtime Services */
extern struct workqueue_struct *efi_rts_wq;

+extern struct efi_runtime_work efi_rts_work;
+
#endif /* _LINUX_EFI_H */
--
2.7.4


2018-08-09 04:33:12

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 2/6] x86/efi: Remove __init attribute from memory mapping functions

From: Sai Praneeth <[email protected]>

Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA
regions even after the kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is enabled, the efi page fault
handler will detect/fixup these illegal accesses. The below modified
functions are used by the page fault handler to fixup illegal accesses
to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is
present during/after kernel boot it doesn't have an __init attribute,
but the below functions have it and thus during kernel build, "WARNING:
modpost: Found * section mismatch(es)" build warning is observed. To fix
it, remove __init attribute for all these functions.

In order to not keep these functions needlessly when
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is not selected, add a new
__efi_init_fixup attribute whose value changes based on whether the
config option is selected or not.

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 11 ++++++-----
arch/x86/platform/efi/efi.c | 4 ++--
arch/x86/platform/efi/efi_32.c | 2 +-
arch/x86/platform/efi/efi_64.c | 9 +++++----
drivers/firmware/efi/efi.c | 6 +++---
include/linux/efi.h | 16 ++++++++++++++--
6 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..9b70743400f3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -103,8 +103,9 @@ struct efi_scratch {
preempt_enable(); \
})

-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
- u32 type, u64 attribute);
+extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
+ unsigned long size, u32 type,
+ u64 attribute);

#ifdef CONFIG_KASAN
/*
@@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void);
extern pgd_t * __init efi_call_phys_prolog(void);
extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
-extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size);
+extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md);
extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
extern void efi_sync_low_kernel_mappings(void);
extern int __init efi_alloc_page_tables(void);
extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md);
extern void __init runtime_code_page_mkexec(void);
extern void __init efi_runtime_update_mappings(void);
extern void __init efi_dump_pagetable(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..439c2c40bf03 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void)
}
}

-void __init efi_memory_uc(u64 addr, unsigned long size)
+void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size)
{
unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
u64 npages;
@@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size)
set_memory_uc(addr, npages);
}

-void __init old_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup old_map_region(efi_memory_desc_t *md)
{
u64 start_pfn, end_pfn, end;
unsigned long size;
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 324b93328b37..8f31452bd204 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
return 0;
}

-void __init efi_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md)
{
old_map_region(md);
}
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 448267f1c073..a04298312fdd 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
return 0;
}

-static void __init __map_region(efi_memory_desc_t *md, u64 va)
+static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va)
{
unsigned long flags = _PAGE_RW;
unsigned long pfn;
@@ -426,7 +426,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
md->phys_addr, va);
}

-void __init efi_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md)
{
unsigned long size = md->num_pages << PAGE_SHIFT;
u64 pa = md->phys_addr;
@@ -488,8 +488,9 @@ void __init efi_map_region_fixed(efi_memory_desc_t *md)
__map_region(md, md->virt_addr);
}

-void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
- u32 type, u64 attribute)
+void __iomem *__efi_init_fixup efi_ioremap(unsigned long phys_addr,
+ unsigned long size, u32 type,
+ u64 attribute)
{
unsigned long last_map_pfn;

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index d8a33a781a57..7bc3fac7bfe0 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -768,7 +768,7 @@ int __init efi_get_fdt_params(struct efi_fdt_params *params)
}
#endif /* CONFIG_EFI_PARAMS_FROM_FDT */

-static __initdata char memory_type_name[][20] = {
+static __efi_initdata_fixup char memory_type_name[][20] = {
"Reserved",
"Loader Code",
"Loader Data",
@@ -786,8 +786,8 @@ static __initdata char memory_type_name[][20] = {
"Persistent Memory",
};

-char * __init efi_md_typeattr_format(char *buf, size_t size,
- const efi_memory_desc_t *md)
+char * __efi_init_fixup efi_md_typeattr_format(char *buf, size_t size,
+ const efi_memory_desc_t *md)
{
char *pos;
int type_len;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 855992b15269..169b5837c72c 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1107,11 +1107,23 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
for_each_efi_memory_desc_in_map(&efi.memmap, md)

/*
+ * __efi_init_fixup - if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is enabled,
+ * remove __init modifier.
+ */
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+#define __efi_init_fixup
+#define __efi_initdata_fixup
+#else
+#define __efi_init_fixup __init
+#define __efi_initdata_fixup __initdata
+#endif
+
+/*
* Format an EFI memory descriptor's type and attributes to a user-provided
* character buffer, as per snprintf(), and return the buffer.
*/
-char * __init efi_md_typeattr_format(char *buf, size_t size,
- const efi_memory_desc_t *md);
+char * __efi_init_fixup efi_md_typeattr_format(char *buf, size_t size,
+ const efi_memory_desc_t *md);

/**
* efi_range_is_wc - check the WC bit on an address range
--
2.7.4


2018-08-09 04:33:14

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 3/6] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

From: Sai Praneeth <[email protected]>

The efi page fault handler that fixes up page faults caused by the
firmware needs the original memory map passed by the firmware. It looks
up this memory map to find the type of the memory region at which the
page fault occurred. Presently, EFI subsystem discards the original
memory map passed by the firmware and replaces it with a new memory map
that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
accesses by firmware can occur at any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is defined, create a backup of the
original memory map passed by the firmware, so that efi page fault
handler could detect/fix illegal accesses to *any* efi region.

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 6 ++++++
arch/x86/platform/efi/efi.c | 2 ++
arch/x86/platform/efi/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 9b70743400f3..c97f2e955cab 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);

+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
+
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 439c2c40bf03..7d18b7ed5d41 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)

pa = __pa(new_memmap);

+ efi_save_original_memmap();
+
/*
* Unregister the early EFI memmap from efi_init() and install
* the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..84b213a1460a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
}

#endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The page fault handler that fixes up page faults caused by buggy
+ * firmware needs original memory map (memory map passed by firmware).
+ * Hence, build a new EFI memmap that has *all* entries and save it for
+ * later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+ efi_memory_desc_t *md;
+ void *remapped_phys, *new_md;
+ phys_addr_t new_phys, new_size;
+
+ new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+ new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+ if (!new_phys) {
+ pr_err("Failed to allocate new EFI memmap\n");
+ return;
+ }
+
+ remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+ if (!remapped_phys) {
+ pr_err("Failed to remap new EFI memmap\n");
+ __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
+ return;
+ }
+
+ new_md = remapped_phys;
+ for_each_efi_memory_desc(md) {
+ memcpy(new_md, md, efi.memmap.desc_size);
+ new_md += efi.memmap.desc_size;
+ }
+
+ original_memory_map.late = 1;
+ original_memory_map.phys_map = new_phys;
+ original_memory_map.map = remapped_phys;
+ original_memory_map.nr_map = efi.memmap.nr_map;
+ original_memory_map.desc_size = efi.memmap.desc_size;
+ original_memory_map.map_end = remapped_phys + new_size;
+ original_memory_map.desc_version = efi.memmap.desc_version;
+
+ original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
--
2.7.4


2018-08-09 04:33:18

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

From: Sai Praneeth <[email protected]>

There may exist some buggy UEFI firmware implementations that might
access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
after the kernel has assumed control of the platform. This violates UEFI
specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to
fixup/recover from the page fault triggered by the firmware so that the
machine doesn't hang.

To support this feature, two changes should be made to the existing efi
subsystem
1. Map EFI_BOOT_SERVICES_<CODE/DATA> regions only when
EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
Presently, the kernel maps EFI_BOOT_SERVICES_<CODE/DATA> regions as
a workaround for buggy firmware that accesses them even when they
shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled (and hence efi
page fault handler) kernel can now detect and handle such accesses
dynamically. Hence, rather than safely mapping
EFI_BOOT_SERVICES_<CODE/DATA> regions *all* the time, map them on
demand.

2. If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
efi_free_boot_services()
Presently, during early boot phase EFI_BOOT_SERVICES_<CODE/DATA>
regions are marked as reserved by kernel
(see efi_reserve_boot_services()) and are freed before entering
runtime (see efi_free_boot_services()). But, while dynamically
fixing page faults caused by the firmware, efi page fault handler
assumes that EFI_BOOT_SERVICES_<CODE/DATA> regions are still intact.
Hence, to make this assumption true, don't call
efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/Kconfig | 21 +++++++++++++++++++++
arch/x86/platform/efi/efi.c | 4 ++++
init/main.c | 3 ++-
3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..278e5820e8dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,27 @@ config EFI_MIXED

If unsure, say N.

+config EFI_WARN_ON_ILLEGAL_ACCESSES
+ bool "Warn about illegal memory accesses by firmware"
+ depends on EFI
+ help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
+ the kernel fixes it up by mapping the requested region.
+ 2. If the illegally accessed region is any other region (Eg:
+ EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>), then the
+ kernel freezes efi_rts_wq and schedules a new process. Also, it
+ disables EFI Runtime Services, so that it will never again call
+ buggy firmware.
+ 3. If the access is to any other efi region like above but if the
+ buggy efi runtime service is efi_reset_system(), then the
+ platform is rebooted through BIOS.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7d18b7ed5d41..0ddb22a03d88 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
/*
* Map boot services regions as a workaround for buggy
* firmware that accesses them even when they shouldn't.
+ * (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled)
*
* See efi_{reserve,free}_boot_services().
*/
+ if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES))
+ return false;
+
if (md->type == EFI_BOOT_SERVICES_CODE ||
md->type == EFI_BOOT_SERVICES_DATA)
return true;
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..dce0520861a1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();

- if (efi_enabled(EFI_RUNTIME_SERVICES)) {
+ if (efi_enabled(EFI_RUNTIME_SERVICES) &&
+ !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
efi_free_boot_services();
}

--
2.7.4


2018-08-09 04:33:27

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 5/6] x86/mm: If in_atomic(), allocate pages without sleeping

From: Sai Praneeth <[email protected]>

A page fault occurs when any EFI Runtime Service tries to reference a
memory region which it shouldn't. If the illegally accessed region
is EFI_BOOT_SERVICES_<CODE/DATA>, the efi specific page fault handler
fixes it up by dynamically creating VA->PA mappings using
efi_map_region().

Originally, efi_map_region() and hence the functionality of creating
mappings for efi regions was intended to be used *only* during boot time
(please note __init modifier) and hence when called during runtime (i.e.
from efi page fault handler), the page allocators complain. Calling
efi_map_region() during runtime complains because "gfp_allowed_mask"
value changes from boot time to runtime (GFP_BOOT_MASK to
__GFP_BITS_MASK). During boot, even though efi_map_region() calls
alloc_<pte/pmd>_page with GFP_KERNEL, the page allocator doesn't
complain because "__GFP_RECLAIM" flag is cleared by "gfp_allowed_mask",
but during runtime it isn't cleared and hence prints below stack trace.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last enabled at (45713): [<ffffffff81c00a54>] restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [<ffffffff81c0112c>] error_entry+0x7c/0x100
softirqs last enabled at (44732): [<ffffffff81e00387>] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [<ffffffff8106fabb>] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack+0x5e/0x8b
___might_sleep+0x20c/0x240
__alloc_pages_nodemask+0xc2/0x330
get_zeroed_page+0x12/0x40
alloc_pmd_page+0x13/0x50
populate_pmd+0xc0/0x2e0
? __lock_acquire+0x439/0x740
__cpa_process_fault+0x2e1/0x5d0
__change_page_attr_set_clr+0x7c3/0xcd0
? console_unlock+0x34d/0x660
? kernel_map_pages_in_pgd+0x8c/0x160
kernel_map_pages_in_pgd+0x8c/0x160
? printk+0x43/0x4b
? __map_region+0x3c/0x60
__map_region+0x3c/0x60
efi_map_region+0x83/0xd0
efi_illegal_accesses_fixup+0x1ca/0x1e0
no_context+0x112/0x390
__do_page_fault+0xc7/0x4f0
page_fault+0x1e/0x30
RIP: 0010:0xfffffffeffc7ccf1
RSP: 0018:ffffc9000075bbf0 EFLAGS: 00010282
RAX: 0000000000000048 RBX: ffffc9000075be10 RCX: ffffc9000075bad0
RDX: 00000000000003f8 RSI: ffffc9000075be10 RDI: fffffffeffc7cccf
RBP: ffffc9000075bdc8 R08: 0000000000000048 R09: 0000000000000048
R10: 00000000000003fd R11: 00000000000003f8 R12: ffff880032a92d80
R13: 0000000000000003 R14: 00007ffcf1eb9d50 R15: 0000000000000000
? efi_call+0xd1/0x160
? __lock_acquire+0x439/0x740
? _raw_spin_unlock+0x24/0x30
? virt_efi_get_next_high_mono_count+0x77/0xf0
? efi_test_ioctl+0x1ab/0xc20
? selinux_file_ioctl+0x122/0x1c0
? do_vfs_ioctl+0x92/0x6b0
? do_vfs_ioctl+0x92/0x6b0
? security_file_ioctl+0x3c/0x50
? selinux_capable+0x20/0x20
? ksys_ioctl+0x66/0x70
? __x64_sys_ioctl+0x16/0x20
? do_syscall_64+0x50/0x170
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the above warning by conditionally changing the allocation from
GFP_KERNEL to GFP_ATOMIC, so that efi page fault handler could use
efi_map_region() during runtime. This change shouldn't effect any other
generic page allocations because this allocation is used only by efi
functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

if (cpa->pgd) {
/*
* Right now, we only execute this code path when mapping
* the EFI virtual memory map regions, no other users
* provide a ->pgd value. This may change in the future.
*/
return populate_pgd(cpa, vaddr);
}

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/mm/pageattr.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..1b28a333c8ce 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end)

static int alloc_pte_page(pmd_t *pmd)
{
- pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+ pte_t *pte;
+
+ if (in_atomic())
+ pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
+ else
+ pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pte)
return -1;

@@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd)

static int alloc_pmd_page(pud_t *pud)
{
- pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+ pmd_t *pmd;
+
+ if (in_atomic())
+ pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC);
+ else
+ pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pmd)
return -1;

--
2.7.4


2018-08-09 04:33:49

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V1 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware

From: Sai Praneeth <[email protected]>

EFI regions could briefly be divided into 3 types.
1. EFI_BOOT_SERVICES_<CODE/DATA> regions
2. EFI_RUNTIME_SERVICES_<CODE/DATA> regions
3. Other EFI regions like EFI_LOADER_<CODE/DATA> etc.

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory region except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions is considered illegal. A buggy
firmware could trigger these illegal accesses during boot time or at
runtime (i.e. when the kernel is up and running). Presently, the kernel
can fix up illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> regions
*only* during kernel boot phase. If the firmware triggers illegal
accesses to *any* other EFI regions during kernel boot, the kernel
panics or if this happens during kernel runtime then the kernel hangs.

Kernel panics/hangs because the memory region requested by the firmware
isn't mapped, which causes a page fault in ring 0 and the kernel fails
to handle it, leading to die(). To save kernel from hanging, add an efi
specific page fault handler which detects illegal accesses by the
firmware and
1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
the efi page fault handler fixes it up by mapping the requested
region.
2. If any other region (Eg: EFI_CONVENTIONAL_MEMORY or
EFI_LOADER_<CODE/DATA>), then the efi page fault handler freezes
efi_rts_wq and schedules a new process.
3. If the access is to any other efi region like above but if the efi
runtime service is efi_reset_system(), then the efi page fault
handler will reboot the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> and to other regions
are dealt differently in efi page fault handler because, *generally*
EFI_BOOT_SERVICES_<CODE/DATA> regions are smaller in size relative to
other efi regions and hence could be reserved and can be dynamically
mapped. But other EFI regions like EFI_CONVENTIONAL_MEMORY and
EFI_LOADER_<CODE/DATA> cannot be reserved as they are very huge in size
and reserving them will make the kernel un-bootable.

The efi specific page fault handler offers us two advantages:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

Finally, this new mapping will not impact a reboot from kexec, as kexec
is only concerned about runtime memory regions.

Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 7 ++
arch/x86/mm/fault.c | 9 ++
arch/x86/platform/efi/quirks.c | 152 ++++++++++++++++++++++++++++++++
drivers/firmware/efi/runtime-wrappers.c | 7 ++
include/linux/efi.h | 1 +
5 files changed, 176 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index c97f2e955cab..4942fa04d74b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm);

#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr,
+ struct pt_regs *regs);
#else
static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr,
+ struct pt_regs *regs)
+{
+ return 0;
+}
#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */

struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..afd42e76058e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
#include <linux/prefetch.h> /* prefetchw */
#include <linux/context_tracking.h> /* exception_enter(), ... */
#include <linux/uaccess.h> /* faulthandler_disabled() */
+#include <linux/efi.h> /* fixup for buggy UEFI firmware*/

#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -24,6 +25,7 @@
#include <asm/vsyscall.h> /* emulate_vsyscall */
#include <asm/vm86.h> /* struct vm86 */
#include <asm/mmu_context.h> /* vma_pkey() */
+#include <asm/efi.h> /* fixup for buggy UEFI firmware*/

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;

/*
+ * Buggy firmware could trigger illegal accesses to some EFI regions
+ * which might page fault, try to fixup or recover from such faults.
+ */
+ if (efi_illegal_accesses_fixup(address, regs))
+ return;
+
+ /*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice:
*/
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 84b213a1460a..230924a35161 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
#include <asm/efi.h>
#include <asm/uv/uv.h>
#include <asm/cpu_device_id.h>
+#include <asm/reboot.h>

#define EFI_MIN_RESERVE 5120

@@ -702,4 +703,155 @@ void __init efi_save_original_memmap(void)

original_memory_map_present = true;
}
+
+/*
+ * From the original EFI memory map passed by the firmware, return a
+ * pointer to the memory descriptor that describes the given physical
+ * address. If not found, return NULL.
+ */
+static efi_memory_desc_t *efi_get_md(unsigned long phys_addr)
+{
+ efi_memory_desc_t *md;
+
+ for_each_efi_memory_desc_in_map(&original_memory_map, md) {
+ if (md->phys_addr <= phys_addr &&
+ (phys_addr < (md->phys_addr +
+ (md->num_pages << EFI_PAGE_SHIFT)))) {
+ return md;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * Detect illegal accesses by the firmware and
+ * 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
+ * fix it up by mapping the requested region.
+ * 2. If any other region (Eg: EFI_CONVENTIONAL_MEMORY or
+ * EFI_LOADER_<CODE/DATA>), then
+ * a. Freeze efi_rts_wq.
+ * b. Return error status to the efi caller process.
+ * c. Disable EFI Runtime Services forever and
+ * d. Schedule another process by explicitly calling scheduler.
+ *
+ * @return: Return 1, if the page fault is handled by mapping the
+ * requested region. Return 0 otherwise.
+ */
+int efi_illegal_accesses_fixup(unsigned long phys_addr, struct pt_regs *regs)
+{
+ char buf[64];
+ efi_memory_desc_t *md;
+ unsigned long long phys_addr_end, size_in_MB;
+
+ /* Fix page faults caused *only* by the firmware */
+ if (current->active_mm != &efi_mm)
+ return 0;
+
+ /*
+ * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+ * page faulting on these addresses isn't expected.
+ */
+ if (phys_addr >= 0x0000 && phys_addr <= 0x0fff)
+ return 0;
+
+ /*
+ * Original memory map is needed to retrieve the memory descriptor
+ * that the firmware has faulted on. So, check if the kernel had
+ * saved the original memory map passed by the firmware during boot.
+ */
+ if (!original_memory_map_present) {
+ pr_info("Original memory map not found, aborting fixing illegal "
+ "access by firmware\n");
+ return 0;
+ }
+
+ /*
+ * EFI Memory map could sometimes have holes, eg: SMRAM. So, make
+ * sure that a valid memory descriptor is present for the physical
+ * address that triggered page fault.
+ */
+ md = efi_get_md(phys_addr);
+ if (!md) {
+ pr_info("Failed to find EFI memory descriptor for PA: 0x%lx\n",
+ phys_addr);
+ return 0;
+ }
+
+ /*
+ * EFI_RUNTIME_SERVICES_<CODE/DATA> regions are mapped into efi_pgd
+ * by the kernel during boot and hence accesses to these regions
+ * should never page fault.
+ */
+ if (md->type == EFI_RUNTIME_SERVICES_CODE ||
+ md->type == EFI_RUNTIME_SERVICES_DATA) {
+ pr_info("Kernel shouldn't page fault on accesses to "
+ "EFI_RUNTIME_SERVICES_<CODE/DATA> regions\n");
+ return 0;
+ }
+
+ /*
+ * Now it's clear that an illegal access by the firmware has caused
+ * the page fault. Print stack trace and memory descriptor as it is
+ * useful to know which EFI Runtime Service is buggy and what did it
+ * try to access.
+ */
+ phys_addr_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+ size_in_MB = md->num_pages >> (20 - EFI_PAGE_SHIFT);
+ WARN(1, FW_BUG "Detected illegal access by Firmware at PA: 0x%lx\n",
+ phys_addr);
+ pr_info("EFI Memory Descriptor for offending PA is:\n");
+ pr_info("%s range=[0x%016llx-0x%016llx] (%lluMB)\n",
+ efi_md_typeattr_format(buf, sizeof(buf), md), md->phys_addr,
+ phys_addr_end, size_in_MB);
+
+ /*
+ * Fix illegal accesses by firmware to EFI_BOOT_SERVICES_<CODE/DATA>
+ * regions by creating VA->PA mappings. Further accesses to these
+ * regions will not page fault.
+ */
+ if (md->type == EFI_BOOT_SERVICES_CODE ||
+ md->type == EFI_BOOT_SERVICES_DATA) {
+ efi_map_region(md);
+ pr_info("Fixed illegal access at PA: 0x%lx\n", phys_addr);
+ return 1;
+ }
+
+ /*
+ * Buggy efi_reset_system() is handled differently from other EFI
+ * Runtime Services as it doesn't use efi_rts_wq. Although,
+ * native_machine_emergency_restart() says that machine_real_restart()
+ * could fail, it's better not to compilcate this fault handler
+ * because this case occurs *very* rarely and hence could be improved
+ * on a need by basis.
+ */
+ if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
+ pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
+ machine_real_restart(MRR_BIOS);
+ return 0;
+ }
+
+ /*
+ * Firmware didn't page fault on EFI_RUNTIME_SERVICES_<CODE/DATA> or
+ * EFI_BOOT_SERVICES_<CODE/DATA> regions. This means that the
+ * firmware has illegally accessed some other EFI region which can't
+ * be fixed. Hence, freeze efi_rts_wq.
+ */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ /*
+ * Before calling EFI Runtime Service, the kernel has switched the
+ * calling process to efi_mm. Hence, switch back to task_mm.
+ */
+ arch_efi_call_virt_teardown();
+
+ /* Signal error status to the efi caller process */
+ efi_rts_work.status = EFI_ABORTED;
+ complete(&efi_rts_work.efi_rts_comp);
+
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
+ schedule();
+
+ return 0;
+}
#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index b18b2d864c2c..5ca44ca22011 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -61,6 +61,11 @@ struct efi_runtime_work efi_rts_work;
({ \
efi_rts_work.status = EFI_ABORTED; \
\
+ if (!efi_enabled(EFI_RUNTIME_SERVICES)) { \
+ pr_err("Aborting! EFI Runtime Services disabled\n"); \
+ goto exit; \
+ } \
+ \
init_completion(&efi_rts_work.efi_rts_comp); \
INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
efi_rts_work.arg1 = _arg1; \
@@ -79,6 +84,7 @@ struct efi_runtime_work efi_rts_work;
else \
pr_err("Failed to queue work to efi_rts_wq.\n"); \
\
+exit: \
efi_rts_work.status; \
})

@@ -393,6 +399,7 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
+ efi_rts_work.efi_rts_id = RESET_SYSTEM;
__efi_call_virt(reset_system, reset_type, status, data_size, data);
up(&efi_runtime_lock);
}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 169b5837c72c..bfa1537994f5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1682,6 +1682,7 @@ enum efi_rts_ids {
SET_VARIABLE,
QUERY_VARIABLE_INFO,
GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
UPDATE_CAPSULE,
QUERY_CAPSULE_CAPS,
};
--
2.7.4


2018-08-10 11:54:07

by Bhupesh Sharma

[permalink] [raw]
Subject: Re: [PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

Hello Sai,

On Thu, Aug 9, 2018 at 10:01 AM, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> There may exist some buggy UEFI firmware implementations that might
> access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
> after the kernel has assumed control of the platform. This violates UEFI
> specification.
>
> If selected, this debug option will print a warning message if the UEFI
> firmware tries to access any memory region which it shouldn't. Along
> with the warning, the efi page fault handler will also try to
> fixup/recover from the page fault triggered by the firmware so that the
> machine doesn't hang.
>
> To support this feature, two changes should be made to the existing efi
> subsystem
> 1. Map EFI_BOOT_SERVICES_<CODE/DATA> regions only when
> EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
> Presently, the kernel maps EFI_BOOT_SERVICES_<CODE/DATA> regions as
> a workaround for buggy firmware that accesses them even when they
> shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled (and hence efi
> page fault handler) kernel can now detect and handle such accesses
> dynamically. Hence, rather than safely mapping
> EFI_BOOT_SERVICES_<CODE/DATA> regions *all* the time, map them on
> demand.
>
> 2. If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
> efi_free_boot_services()
> Presently, during early boot phase EFI_BOOT_SERVICES_<CODE/DATA>
> regions are marked as reserved by kernel
> (see efi_reserve_boot_services()) and are freed before entering
> runtime (see efi_free_boot_services()). But, while dynamically
> fixing page faults caused by the firmware, efi page fault handler
> assumes that EFI_BOOT_SERVICES_<CODE/DATA> regions are still intact.
> Hence, to make this assumption true, don't call
> efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.
>
> Suggested-by: Matt Fleming <[email protected]>
> Based-on-code-from: Ricardo Neri <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Lee Chun-Yi <[email protected]>
> Cc: Al Stone <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/Kconfig | 21 +++++++++++++++++++++
> arch/x86/platform/efi/efi.c | 4 ++++
> init/main.c | 3 ++-
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index f1dbb4ee19d7..278e5820e8dd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1957,6 +1957,27 @@ config EFI_MIXED
>
> If unsure, say N.
>
> +config EFI_WARN_ON_ILLEGAL_ACCESSES

How about shortening the name to just 'EFI_WARN_ON_ILLEGAL_ACCESS'?

> + bool "Warn about illegal memory accesses by firmware"
> + depends on EFI

From a distribution p-o-v, I would suggest that we set this CONFIG
option only if we are in the EXPERT mode, as this need more thrashing
with the behaviour we see on old, buggy EFI firmwares available on
very old x86 machines. So something like:
bool "Warn about illegal memory accesses by firmware" if EXPERT

> + help
> + Enable this debug feature so that the kernel can detect illegal
> + memory accesses by firmware and issue a warning. Also,
> + 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
> + the kernel fixes it up by mapping the requested region.
> + 2. If the illegally accessed region is any other region (Eg:
> + EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>), then the
> + kernel freezes efi_rts_wq and schedules a new process. Also, it
> + disables EFI Runtime Services, so that it will never again call
> + buggy firmware.
> + 3. If the access is to any other efi region like above but if the
> + buggy efi runtime service is efi_reset_system(), then the
> + platform is rebooted through BIOS.
> + Please see the UEFI specification for details on the expectations
> + of memory usage.
> +
> + If unsure, say N.
> +
> config SECCOMP
> def_bool y
> prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 7d18b7ed5d41..0ddb22a03d88 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
> /*
> * Map boot services regions as a workaround for buggy
> * firmware that accesses them even when they shouldn't.
> + * (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled)
> *
> * See efi_{reserve,free}_boot_services().
> */
> + if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES))
> + return false;
> +
> if (md->type == EFI_BOOT_SERVICES_CODE ||
> md->type == EFI_BOOT_SERVICES_DATA)
> return true;
> diff --git a/init/main.c b/init/main.c
> index 3b4ada11ed52..dce0520861a1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
> arch_post_acpi_subsys_init();
> sfi_init_late();
>
> - if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> + if (efi_enabled(EFI_RUNTIME_SERVICES) &&
> + !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {

Since this is an arch agnostic code leg, do we really want to
introduce a generic check for CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
here, without checking for whether we are actually running this on a
x86 hardware, or alternatively we can consider moving
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES out of 'arch/x86/Kconfig' to
'arch/Kconfig' so that later it can be used on other archs like ARM64
as well.

I think the later would be more useful..

Thanks,
Bhupesh

> efi_free_boot_services();
> }
>
> --
> 2.7.4
>

2018-08-10 18:06:11

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

> > +config EFI_WARN_ON_ILLEGAL_ACCESSES
>
> How about shortening the name to just 'EFI_WARN_ON_ILLEGAL_ACCESS'?
>

Makes sense.. I will shorten it.

> > + bool "Warn about illegal memory accesses by firmware"
> > + depends on EFI
>
> From a distribution p-o-v, I would suggest that we set this CONFIG option only if
> we are in the EXPERT mode, as this need more thrashing with the behaviour we
> see on old, buggy EFI firmwares available on very old x86 machines. So
> something like:
> bool "Warn about illegal memory accesses by firmware" if EXPERT
>

Agreed. Although the feature is intended to warn about all these buggy firmware's
instead of silently working around (as we are doing presently), it needs more testing.
Hence, as you said, I think it's safe to have it as an EXPERT config option.

> > + help
> > + Enable this debug feature so that the kernel can detect illegal
> > + memory accesses by firmware and issue a warning. Also,
> > + 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
> > + the kernel fixes it up by mapping the requested region.

[....]

> > diff --git a/init/main.c b/init/main.c index
> > 3b4ada11ed52..dce0520861a1 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
> > arch_post_acpi_subsys_init();
> > sfi_init_late();
> >
> > - if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> > + if (efi_enabled(EFI_RUNTIME_SERVICES) &&
> > + !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
>
> Since this is an arch agnostic code leg, do we really want to introduce a generic
> check for CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
> here, without checking for whether we are actually running this on a
> x86 hardware, or alternatively we can consider moving
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES out of 'arch/x86/Kconfig' to
> 'arch/Kconfig' so that later it can be used on other archs like ARM64 as well.
>
> I think the later would be more useful..

Thanks for bringing this up. I see your point. I will refrain from polluting architecture
agnostic code. As we don't need efi_free_boot_services() at all, if EFI_WARN_ON_ILLEGAL_ACCESSES
is enabled, probably making changes to include/linux/efi.h would be better, I guess..

Moving this to arch/Kconfig could be too futuristic.. I guess.. because I am not sure
if this problem exists on ARM machines, if so, probably Ard could suggest something here.

Regards,
Sai

2018-08-10 18:32:29

by Bhupesh Sharma

[permalink] [raw]
Subject: Re: [PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

On Fri, Aug 10, 2018 at 11:04 PM, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > +config EFI_WARN_ON_ILLEGAL_ACCESSES
>>
>> How about shortening the name to just 'EFI_WARN_ON_ILLEGAL_ACCESS'?
>>
>
> Makes sense.. I will shorten it.
>
>> > + bool "Warn about illegal memory accesses by firmware"
>> > + depends on EFI
>>
>> From a distribution p-o-v, I would suggest that we set this CONFIG option only if
>> we are in the EXPERT mode, as this need more thrashing with the behaviour we
>> see on old, buggy EFI firmwares available on very old x86 machines. So
>> something like:
>> bool "Warn about illegal memory accesses by firmware" if EXPERT
>>
>
> Agreed. Although the feature is intended to warn about all these buggy firmware's
> instead of silently working around (as we are doing presently), it needs more testing.
> Hence, as you said, I think it's safe to have it as an EXPERT config option.
>
>> > + help
>> > + Enable this debug feature so that the kernel can detect illegal
>> > + memory accesses by firmware and issue a warning. Also,
>> > + 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
>> > + the kernel fixes it up by mapping the requested region.
>
> [....]
>
>> > diff --git a/init/main.c b/init/main.c index
>> > 3b4ada11ed52..dce0520861a1 100644
>> > --- a/init/main.c
>> > +++ b/init/main.c
>> > @@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
>> > arch_post_acpi_subsys_init();
>> > sfi_init_late();
>> >
>> > - if (efi_enabled(EFI_RUNTIME_SERVICES)) {
>> > + if (efi_enabled(EFI_RUNTIME_SERVICES) &&
>> > + !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
>>
>> Since this is an arch agnostic code leg, do we really want to introduce a generic
>> check for CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
>> here, without checking for whether we are actually running this on a
>> x86 hardware, or alternatively we can consider moving
>> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES out of 'arch/x86/Kconfig' to
>> 'arch/Kconfig' so that later it can be used on other archs like ARM64 as well.
>>
>> I think the later would be more useful..
>
> Thanks for bringing this up. I see your point. I will refrain from polluting architecture
> agnostic code. As we don't need efi_free_boot_services() at all, if EFI_WARN_ON_ILLEGAL_ACCESSES
> is enabled, probably making changes to include/linux/efi.h would be better, I guess..
>
> Moving this to arch/Kconfig could be too futuristic.. I guess.. because I am not sure
> if this problem exists on ARM machines, if so, probably Ard could suggest something here.

I think Ard can comment better but let me chime in with my limited
experience with Fedora arm64 implementations - we are already seeing
EFI firmware implementations which are broken/are orphaned (not well
supported anymore) on arm64 even though the hardware is still in-use.
With arm64 EFI firmware implementations still catching up with ARM
server SBBR specifications, we expect more x86 like broken EFI
implementations on arm64 as well. So, I don't see that as a distant
problem, in-fact this is something which has been on my Fedora to-do
list for arm64 for some time.

If there is a framework on x86 available, I can take a stab and
extrapolate it for arm64 as well.

Thanks,
Bhupesh