From: Sai Praneeth <[email protected]>
There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option which, when enabled,
recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled,
hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy and that this is hence not a
   kernel bug.

The efi page fault handler will check whether the access is by
efi_reset_system().
1. If so, the efi page fault handler will reboot the machine through
   BIOS and not through efi_reset_system().
2. If not, the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.
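
The decision described above can be modeled in a few lines of user-space C.
This is only a sketch for illustration, not the kernel code: the enum names,
choose_recovery() and recovery_example() are hypothetical, and the real
handler looks at current->active_mm and efi_rts_work.efi_rts_id instead of
taking arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery decision; not kernel code. */
enum rts_id { RTS_GET_TIME, RTS_RESET_SYSTEM };	/* illustrative subset */
enum recovery { NOT_HANDLED, REBOOT_VIA_BIOS, PARK_WORKER };

/*
 * fault_in_efi_mm: true when the fault happened while running firmware code.
 * id: the runtime service that was executing at the time of the fault.
 */
enum recovery choose_recovery(bool fault_in_efi_mm, enum rts_id id)
{
	if (!fault_in_efi_mm)
		return NOT_HANDLED;	/* ordinary kernel fault */
	if (id == RTS_RESET_SYSTEM)
		return REBOOT_VIA_BIOS;	/* reset path: reboot through BIOS */
	return PARK_WORKER;		/* any other service: park efi_rts_wq */
}

/* Usage example covering the three outcomes. */
void recovery_example(void)
{
	assert(choose_recovery(false, RTS_GET_TIME) == NOT_HANDLED);
	assert(choose_recovery(true, RTS_RESET_SYSTEM) == REBOOT_VIA_BIOS);
	assert(choose_recovery(true, RTS_GET_TIME) == PARK_WORKER);
}
```

Faults that do not originate from firmware fall through untouched, so the
existing oops path keeps handling them.
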
This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.
Testing the patch set:
----------------------
1. Download the buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot the mainline kernel.
Add reboot=efi to the kernel command line arguments and, after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot the kernel after applying the patches; the
reboot should work fine. Also note the warning/error messages printed by
the kernel.
Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
use scheduling away from efi_rts_wq.
Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
change was in init/main.c. As that file is architecture-agnostic code,
the change was moved to arch/x86/platform/efi/quirks.c.
Changes from V2 to V3:
----------------------
1. Drop treating illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> regions
separately from illegal accesses to other regions like
EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
regions were handled by mapping the requested region into efi_pgd, but
from V3 they are handled like illegal accesses to other regions, i.e. by
freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.
Changes from V3 to V4:
----------------------
1. Drop saving the original memory map passed by the kernel. It also means
fewer checks in the efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
functionality more appropriately.
Note:
-----
Patch set based on "next" branch in efi tree.
[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt
Sai Praneeth (3):
efi: Make efi_rts_work accessible to efi page fault handler
x86/efi: Add efi page fault handler to recover from page faults caused
by the firmware
x86/efi: Introduce EFI_PAGE_FAULT_HANDLER
arch/x86/Kconfig | 18 +++++++++
arch/x86/include/asm/efi.h | 9 +++++
arch/x86/mm/fault.c | 9 +++++
arch/x86/platform/efi/quirks.c | 70 +++++++++++++++++++++++++++++++++
drivers/firmware/efi/runtime-wrappers.c | 60 ++++++++--------------------
include/linux/efi.h | 37 +++++++++++++++++
6 files changed, 159 insertions(+), 44 deletions(-)
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
--
2.7.4
From: Sai Praneeth <[email protected]>
As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions other than
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. Buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens when the kernel is up and
running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 that the kernel
fails to handle, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. It recovers from potential hangs that could be caused by buggy firmware.
2. It shouts loudly that the firmware is buggy and that this is hence not
   a kernel bug.
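
Besides parking the worker, the patch clears EFI_RUNTIME_SERVICES and makes
efi_queue_work() bail out early once that bit is gone. A minimal user-space
sketch of that guard follows. Everything in it is a stand-in chosen for
illustration (efi_flags_model, the status values, call_firmware() and the
other function names); the kernel uses efi.flags, efi_enabled() and
efi_status_t instead.

```c
#include <assert.h>

/* Illustrative stand-ins; the kernel's real flag and status values differ. */
#define RUNTIME_SERVICES_BIT	(1UL << 0)
enum status { STATUS_SUCCESS = 0, STATUS_ABORTED = 1 };

static unsigned long efi_flags_model = RUNTIME_SERVICES_BIT;
static int firmware_calls;

/* Stand-in for a firmware call that always succeeds. */
static enum status call_firmware(void)
{
	firmware_calls++;
	return STATUS_SUCCESS;
}

/* Models the guard added to efi_queue_work(): abort before touching firmware. */
enum status queue_rts_call(void)
{
	if (!(efi_flags_model & RUNTIME_SERVICES_BIT))
		return STATUS_ABORTED;
	return call_firmware();
}

/* Models what the fault handler does: disable all later calls. */
void disable_runtime_services(void)
{
	efi_flags_model &= ~RUNTIME_SERVICES_BIT;
}

/* Usage example: one successful call, then every call aborts. */
void early_abort_example(void)
{
	assert(queue_rts_call() == STATUS_SUCCESS);
	disable_runtime_services();
	assert(queue_rts_call() == STATUS_ABORTED);
	assert(firmware_calls == 1);	/* firmware was not entered again */
}
```

The point of the guard is that a caller sees an error status instead of
re-entering firmware that has already faulted once.
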
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 9 +++++
arch/x86/mm/fault.c | 9 +++++
arch/x86/platform/efi/quirks.c | 70 +++++++++++++++++++++++++++++++++
drivers/firmware/efi/runtime-wrappers.c | 7 ++++
include/linux/efi.h | 1 +
5 files changed, 96 insertions(+)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..afb1c80182f2 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,15 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);
+#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER
+extern int efi_recover_from_page_fault(unsigned long phys_addr);
+#else
+static inline int efi_recover_from_page_fault(unsigned long phys_addr)
+{
+ return 0;
+}
+#endif /* CONFIG_EFI_PAGE_FAULT_HANDLER */
+
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..cc2a2e3a4095 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
#include <linux/prefetch.h> /* prefetchw */
#include <linux/context_tracking.h> /* exception_enter(), ... */
#include <linux/uaccess.h> /* faulthandler_disabled() */
+#include <linux/efi.h> /* efi_recover_from_page_fault()*/
#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -24,6 +25,7 @@
#include <asm/vsyscall.h> /* emulate_vsyscall */
#include <asm/vm86.h> /* struct vm86 */
#include <asm/mmu_context.h> /* vma_pkey() */
+#include <asm/efi.h> /* efi_recover_from_page_fault()*/
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
/*
+ * Buggy firmware could access regions which might page fault, try to
+ * recover from such faults.
+ */
+ if (efi_recover_from_page_fault(address))
+ return;
+
+ /*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice:
*/
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..853742aba209 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
#include <asm/efi.h>
#include <asm/uv/uv.h>
#include <asm/cpu_device_id.h>
+#include <asm/reboot.h>
#define EFI_MIN_RESERVE 5120
@@ -654,3 +655,72 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
}
#endif
+
+#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER
+
+/*
+ * If any access by any efi runtime service causes a page fault, then,
+ * 1. If it's efi_reset_system(), reboot through BIOS.
+ * 2. If any other efi runtime service, then
+ * a. Freeze efi_rts_wq.
+ * b. Return error status to the efi caller process.
+ * c. Disable EFI Runtime Services forever and
+ * d. Schedule another process by explicitly calling scheduler.
+ *
+ * @return: Returns 0, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+int efi_recover_from_page_fault(unsigned long phys_addr)
+{
+ /* Recover from page faults caused *only* by the firmware */
+ if (current->active_mm != &efi_mm)
+ return 0;
+
+ /*
+ * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+ * page faulting on these addresses isn't expected.
+ */
+ if (phys_addr >= 0x0000 && phys_addr <= 0x0fff)
+ return 0;
+
+ /*
+ * Print stack trace as it might be useful to know which EFI Runtime
+ * Service is buggy.
+ */
+ WARN(1, FW_BUG "Page fault caused by firmware at PA: 0x%lx\n",
+ phys_addr);
+
+ /*
+ * Buggy efi_reset_system() is handled differently from other EFI
+ * Runtime Services as it doesn't use efi_rts_wq. Although,
+ * native_machine_emergency_restart() says that machine_real_restart()
+ * could fail, it's better not to complicate this fault handler
+ * because this case occurs *very* rarely and hence could be improved
+ * on an as-needed basis.
+ */
+ if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
+ pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
+ machine_real_restart(MRR_BIOS);
+ return 0;
+ }
+
+ /* Firmware has caused page fault, hence, freeze efi_rts_wq. */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ /*
+ * Before calling EFI Runtime Service, the kernel has switched the
+ * calling process to efi_mm. Hence, switch back to task_mm.
+ */
+ arch_efi_call_virt_teardown();
+
+ /* Signal error status to the efi caller process */
+ efi_rts_work.status = EFI_ABORTED;
+ complete(&efi_rts_work.efi_rts_comp);
+
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
+ schedule();
+
+ return 0;
+}
+#endif /* CONFIG_EFI_PAGE_FAULT_HANDLER */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index b18b2d864c2c..de061bcad098 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -61,6 +61,11 @@ struct efi_runtime_work efi_rts_work;
({ \
efi_rts_work.status = EFI_ABORTED; \
\
+ if (!efi_enabled(EFI_RUNTIME_SERVICES)) { \
+ pr_info("Aborting! EFI Runtime Services disabled\n"); \
+ goto exit; \
+ } \
+ \
init_completion(&efi_rts_work.efi_rts_comp); \
INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
efi_rts_work.arg1 = _arg1; \
@@ -79,6 +84,7 @@ struct efi_runtime_work efi_rts_work;
else \
pr_err("Failed to queue work to efi_rts_wq.\n"); \
\
+exit: \
efi_rts_work.status; \
})
@@ -393,6 +399,7 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
+ efi_rts_work.efi_rts_id = RESET_SYSTEM;
__efi_call_virt(reset_system, reset_type, status, data_size, data);
up(&efi_runtime_lock);
}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 855992b15269..80433b6bd2c5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1670,6 +1670,7 @@ enum efi_rts_ids {
SET_VARIABLE,
QUERY_VARIABLE_INFO,
GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
UPDATE_CAPSULE,
QUERY_CAPSULE_CAPS,
};
--
2.7.4
From: Sai Praneeth <[email protected]>
There may exist some buggy UEFI firmware implementations that
access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
after the kernel has assumed control of the platform. This violates the
UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to recover
from the page fault triggered by the firmware so that the machine
doesn't hang.
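
For readers unfamiliar with how such an option is usually wired up, the
sketch below shows the common "inline stub" pattern in plain user-space C:
when the option is off, the handler collapses to a stub that reports "not
handled", so the caller needs no #ifdef. MODEL_EFI_PAGE_FAULT_HANDLER,
recover_from_page_fault() and handle_kernel_fault() are hypothetical names
for this illustration; with the macro defined, a real implementation would
have to be supplied elsewhere.

```c
#include <assert.h>

/*
 * MODEL_EFI_PAGE_FAULT_HANDLER is a hypothetical build-time switch.
 * When it is not defined, the handler is an inline stub returning 0.
 */
#ifdef MODEL_EFI_PAGE_FAULT_HANDLER
int recover_from_page_fault(unsigned long addr);
#else
static inline int recover_from_page_fault(unsigned long addr)
{
	(void)addr;
	return 0;	/* not handled: caller continues as before */
}
#endif

/* Caller stays free of #ifdefs. Returns 1 if recovered, 0 otherwise. */
int handle_kernel_fault(unsigned long addr)
{
	if (recover_from_page_fault(addr))
		return 1;
	return 0;	/* would continue to the normal oops path */
}

/* Usage example with the option disabled. */
void stub_example(void)
{
	assert(handle_kernel_fault(0x1000UL) == 0);
}
```

With the option disabled the compiler can drop the call entirely, so the
fault path costs nothing extra.
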
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/Kconfig | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..cc840710ae3e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,24 @@ config EFI_MIXED
If unsure, say N.
+config EFI_PAGE_FAULT_HANDLER
+ bool "EFI page fault handler support" if EXPERT
+ depends on EFI
+ help
+ Enable this debug feature so that the kernel can recover from page
+ faults caused by buggy firmware. Also,
+ 1. If the page fault is caused by efi_reset_system(), then the
+ platform is rebooted through BIOS.
+ 2. If the page fault is caused by any other efi runtime service,
+ then the kernel freezes efi_rts_wq (work queue that runs efi
+ runtime services) and schedules a new process. Also, it disables
+ EFI Runtime Services, so that it will never again call buggy
+ firmware.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
--
2.7.4
From: Sai Praneeth <[email protected]>
After the kernel has booted, if any access by firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules
a new process. To do this, the efi page fault handler needs
efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure, because
all the calls to efi runtime services are already serialized.
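
To illustrate why a single global work item is safe only under
serialization, here is a small user-space model. It is a sketch under
assumptions: a C11 atomic_flag stands in for efi_runtime_lock (the kernel
uses a semaphore that can sleep), and struct work_model, submit() and the
lock helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single shared work item, like the now-global efi_rts_work. */
struct work_model {
	int id;
	int status;
};
static struct work_model shared_work;

/* Stand-in for efi_runtime_lock; a real semaphore could sleep instead. */
static atomic_flag runtime_lock = ATOMIC_FLAG_INIT;

/* Returns 0 when the lock was taken, 1 when it is busy. */
int runtime_trylock(void)
{
	return atomic_flag_test_and_set(&runtime_lock) ? 1 : 0;
}

void runtime_unlock(void)
{
	atomic_flag_clear(&runtime_lock);
}

/* Only the lock holder may fill in the shared work item. */
int submit(int id)
{
	if (runtime_trylock())
		return -1;	/* busy: shared_work is left untouched */
	shared_work.id = id;
	shared_work.status = 0;
	runtime_unlock();
	return 0;
}

/* Usage example: a second submitter cannot clobber work in flight. */
void serialization_example(void)
{
	assert(submit(3) == 0);
	assert(runtime_trylock() == 0);		/* another caller holds the lock */
	assert(submit(4) == -1 && shared_work.id == 3);
	runtime_unlock();
}
```

Because every writer goes through the same lock, the fault handler can read
the shared item knowing it describes the call that is currently in flight.
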
Suggested-by: Matt Fleming <[email protected]>
Based-on-code-from: Ricardo Neri <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Al Stone <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 53 ++++++---------------------------
include/linux/efi.h | 36 ++++++++++++++++++++++
2 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
- GET_TIME,
- SET_TIME,
- GET_WAKEUP_TIME,
- SET_WAKEUP_TIME,
- GET_VARIABLE,
- GET_NEXT_VARIABLE,
- SET_VARIABLE,
- QUERY_VARIABLE_INFO,
- GET_NEXT_HIGH_MONO_COUNT,
- UPDATE_CAPSULE,
- QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work: Details of EFI Runtime Service work
- * @arg<1-5>: EFI Runtime Service function arguments
- * @status: Status of executing EFI Runtime Service
- * @efi_rts_id: EFI Runtime Service function identifier
- * @efi_rts_comp: Struct used for handling completions
- */
-struct efi_runtime_work {
- void *arg1;
- void *arg2;
- void *arg3;
- void *arg4;
- void *arg5;
- efi_status_t status;
- struct work_struct work;
- enum efi_rts_ids efi_rts_id;
- struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
/*
* efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
*/
#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
({ \
- struct efi_runtime_work efi_rts_work; \
efi_rts_work.status = EFI_ABORTED; \
\
init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
*/
static void efi_call_rts(struct work_struct *work)
{
- struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
- efi_rts_work = container_of(work, struct efi_runtime_work, work);
- arg1 = efi_rts_work->arg1;
- arg2 = efi_rts_work->arg2;
- arg3 = efi_rts_work->arg3;
- arg4 = efi_rts_work->arg4;
- arg5 = efi_rts_work->arg5;
+ arg1 = efi_rts_work.arg1;
+ arg2 = efi_rts_work.arg2;
+ arg3 = efi_rts_work.arg3;
+ arg4 = efi_rts_work.arg4;
+ arg5 = efi_rts_work.arg5;
- switch (efi_rts_work->efi_rts_id) {
+ switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
(efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
*/
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
- efi_rts_work->status = status;
- complete(&efi_rts_work->efi_rts_comp);
+ efi_rts_work.status = status;
+ complete(&efi_rts_work.efi_rts_comp);
}
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
extern int efi_tpm_eventlog_init(void);
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ QUERY_VARIABLE_INFO,
+ GET_NEXT_HIGH_MONO_COUNT,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
/* Workqueue to queue EFI Runtime Services */
extern struct workqueue_struct *efi_rts_wq;
+extern struct efi_runtime_work efi_rts_work;
+
#endif /* _LINUX_EFI_H */
--
2.7.4
On 7 September 2018 at 01:27, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> There may exist some buggy UEFI firmware implementations that access efi
> memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
> the kernel has assumed control of the platform. This violates UEFI
> specification. Hence, provide a debug config option which when enabled
> recovers from page faults caused by buggy firmware.
>
> Page faults triggered by firmware happen at ring 0 and if unhandled,
> hangs the kernel. So, provide an efi specific page fault handler to:
> 1. Avoid panics/hangs caused by buggy firmware.
> 2. Shout loud that the firmware is buggy and hence is not a kernel bug.
>
> The efi page fault handler will check if the access is by
> efi_reset_system().
> 1. If so, then the efi page fault handler will reboot the machine
> through BIOS and not through efi_reset_system().
> 2. If not, then the efi page fault handler will freeze efi_rts_wq and
> schedules a new process.
>
Thanks Sai! I am pretty happy with how this patch set turned out. It still
requires the blessing of the x86 maintainers, of course, but from my
pov, this is good to go (but I will fold patch #3 into #2).

Thomas, Ingo, Peter, Andy, Boris: any remaining concerns?
> This issue was reported by Al Stone when he saw that reboot via EFI hangs
> the machine. Upon debugging, I found that it's efi_reset_system() that's
> touching memory regions which it shouldn't. To reproduce the same
> behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
> with efi_reset_system(), I have also modified get_next_high_mono_count()
> and set_virtual_address_map(). They illegally access both boot time and
> other efi regions.
>
> Testing the patch set:
> ----------------------
> 1. Download buggy firmware from here [1].
> 2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
> Add reboot=efi to the kernel command line arguments and after the kernel
> is up and running, type "reboot". The kernel should hang while rebooting.
> 3. With the same setup, boot kernel after applying patches and the
> reboot should work fine. Also please notice warning/error messages
> printed by kernel.
>
> Changes from RFC to V1:
> -----------------------
> 1. Drop "long jump" technique of dealing with illegal access and instead
> use scheduling away from efi_rts_wq.
>
> Changes from V1 to V2:
> ----------------------
> 1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
> 2. Made the config option available only to expert users.
> 3. efi_free_boot_services() should be called only when
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
> was part of init/main.c file. As it is an architecture agnostic code,
> moved the change to arch/x86/platform/efi/quirks.c file.
>
> Changes from V2 to V3:
> ----------------------
> 1. Drop treating illegal access to EFI_BOOT_SERVICES_<CODE/DATA> regions
> separately from illegal accesses to other regions like
> EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
> In previous versions, illegal access to EFI_BOOT_SERVICES_<CODE/DATA>
> regions were handled by mapping requested region to efi_pgd but from
> V3 they are handled similar to illegal access to other regions i.e by
> freezing efi_rts_wq and scheduling new process.
> 2. Change __efi_init_fixup attribute to __efi_init.
>
> Changes from V3 to V4:
> ----------------------
> 1. Drop saving original memory map passed by kernel. It also means less
> checks in efi page fault handler.
> 2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
> functionality more appropriately.
>
> Note:
> -----
> Patch set based on "next" branch in efi tree.
>
> [1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt
>
> Sai Praneeth (3):
> efi: Make efi_rts_work accessible to efi page fault handler
> x86/efi: Add efi page fault handler to recover from page faults caused
> by the firmware
> x86/efi: Introduce EFI_PAGE_FAULT_HANDLER
>
> arch/x86/Kconfig | 18 +++++++++
> arch/x86/include/asm/efi.h | 9 +++++
> arch/x86/mm/fault.c | 9 +++++
> arch/x86/platform/efi/quirks.c | 70 +++++++++++++++++++++++++++++++++
> drivers/firmware/efi/runtime-wrappers.c | 60 ++++++++--------------------
> include/linux/efi.h | 37 +++++++++++++++++
> 6 files changed, 159 insertions(+), 44 deletions(-)
>
> Suggested-by: Matt Fleming <[email protected]>
> Based-on-code-from: Ricardo Neri <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Al Stone <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
>
> --
> 2.7.4
>
On Thu, Sep 06, 2018 at 04:27:47PM -0700, Sai Praneeth Prakhya wrote:
> @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> return;
>
> /*
> + * Buggy firmware could access regions which might page fault, try to
> + * recover from such faults.
> + */
> + if (efi_recover_from_page_fault(address))
> + return;
> +
> + /*
> * Oops. The kernel tried to access some bad page. We'll have to
> * terminate things with extreme prejudice:
> */
> +int efi_recover_from_page_fault(unsigned long phys_addr)
> +{
> + /* Recover from page faults caused *only* by the firmware */
> + if (current->active_mm != &efi_mm)
> + return 0;
> +
> + /*
> + * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
> + * page faulting on these addresses isn't expected.
> + */
> + if (phys_addr >= 0x0000 && phys_addr <= 0x0fff)
> + return 0;
> +
> + /*
> + * Print stack trace as it might be useful to know which EFI Runtime
> + * Service is buggy.
> + */
> + WARN(1, FW_BUG "Page fault caused by firmware at PA: 0x%lx\n",
> + phys_addr);
> +
> + /*
> + * Buggy efi_reset_system() is handled differently from other EFI
> + * Runtime Services as it doesn't use efi_rts_wq. Although,
> + * native_machine_emergency_restart() says that machine_real_restart()
> + * could fail, it's better not to complicate this fault handler
> + * because this case occurs *very* rarely and hence could be improved
> + * on an as-needed basis.
> + */
> + if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
> + pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
> + machine_real_restart(MRR_BIOS);
> + return 0;
> + }
> +
> + /* Firmware has caused page fault, hence, freeze efi_rts_wq. */
> + set_current_state(TASK_UNINTERRUPTIBLE);
This doesn't freeze it, as such, it just sets the state.
> +
> + /*
> + * Before calling EFI Runtime Service, the kernel has switched the
> + * calling process to efi_mm. Hence, switch back to task_mm.
> + */
> + arch_efi_call_virt_teardown();
> +
> + /* Signal error status to the efi caller process */
> + efi_rts_work.status = EFI_ABORTED;
> + complete(&efi_rts_work.efi_rts_comp);
> +
> + clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> + pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
> + schedule();
So what happens when we get a spurious wakeup and return from this?
Quite possibly you want something like:
for (;;) {
set_current_state(TASK_IDLE);
schedule();
}
here. The TASK_UNINTERRUPTIBLE thing will cause the load-avg to spike;
is that what you want?
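
The load-avg remark can be made concrete with a small user-space model. The
bit values and helper names below are illustrative assumptions modeled on
the kernel's task states (TASK_IDLE being TASK_UNINTERRUPTIBLE combined with
TASK_NOLOAD); this is a sketch, not the scheduler's actual accounting code.

```c
#include <assert.h>

/* Illustrative bit values modeled on the kernel's task states (assumptions). */
#define M_TASK_UNINTERRUPTIBLE	0x0002u
#define M_TASK_NOLOAD		0x0400u
#define M_TASK_IDLE		(M_TASK_UNINTERRUPTIBLE | M_TASK_NOLOAD)

/* A sleeper counts toward load-avg only if uninterruptible and not NOLOAD. */
int contributes_to_load(unsigned int state)
{
	return (state & M_TASK_UNINTERRUPTIBLE) && !(state & M_TASK_NOLOAD);
}

/* Number of entries in states[] that would add to the load average. */
int count_load(const unsigned int *states, int n)
{
	int i, load = 0;

	for (i = 0; i < n; i++)
		load += contributes_to_load(states[i]);
	return load;
}

/* Usage example: one worker parked in each state. */
void load_example(void)
{
	const unsigned int parked[] = { M_TASK_UNINTERRUPTIBLE, M_TASK_IDLE };

	assert(count_load(parked, 2) == 1);	/* only the first one counts */
}
```

In this model a worker parked forever in the plain uninterruptible state
adds one to the load permanently, while one parked in the idle state adds
nothing.
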
> +
> + return 0;
> +}
On Thu, Sep 06, 2018 at 04:27:48PM -0700, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth <[email protected]>
>
> There may exist some buggy UEFI firmware implementations that might
> access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
> after the kernel has assumed control of the platform. This violates UEFI
> specification.
>
> If selected, this debug option will print a warning message if the UEFI
> firmware tries to access any memory region which it shouldn't. Along
> with the warning, the efi page fault handler will also try to recover
> from the page fault triggered by the firmware so that the machine
> doesn't hang.
Why make this optional?
> > There may exist some buggy UEFI firmware implementations that might
> > access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
> > after the kernel has assumed control of the platform. This violates
> > UEFI specification.
> >
> > If selected, this debug option will print a warning message if the
> > UEFI firmware tries to access any memory region which it shouldn't.
> > Along with the warning, the efi page fault handler will also try to
> > recover from the page fault triggered by the firmware so that the
> > machine doesn't hang.
>
> Why make this optional?
I made it a config option in the RFC because the page fault handler was
complicated and touched many parts (it had lots of code changes and I didn't
want to break any existing functionality). Now that it's simple, I don't think
we need the config option.

Without the efi page fault handler, any page fault caused by firmware would
panic the kernel, so with this patch I think we are just improving the
existing condition (ideally).

So, if Thomas, Ingo, Andy, Ard and Boris are ok, I will make it the default
(i.e. without a config option).
Regards,
Sai
> > + if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
> > + pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
> > + machine_real_restart(MRR_BIOS);
> > + return 0;
> > + }
> > +
> > + /* Firmware has caused page fault, hence, freeze efi_rts_wq. */
> > + set_current_state(TASK_UNINTERRUPTIBLE);
>
> This doesn't freeze it, as such, it just sets the state.
True! Thanks for pointing it out. I will update the comment.
> > +
> > + /*
> > + * Before calling EFI Runtime Service, the kernel has switched the
> > + * calling process to efi_mm. Hence, switch back to task_mm.
> > + */
> > + arch_efi_call_virt_teardown();
> > +
> > + /* Signal error status to the efi caller process */
> > + efi_rts_work.status = EFI_ABORTED;
> > + complete(&efi_rts_work.efi_rts_comp);
> > +
> > + clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> > + pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
>
> > + schedule();
>
> So what happens when we get a spurious wakeup and return from this?
>
> Quite possibly you want something like:
>
> for (;;) {
> set_current_state(TASK_IDLE);
> schedule();
> }
>
> here. The TASK_UNINTERRUPTIBLE thing will cause the load-avg to spike; is that
> what you want?
Yes, makes sense. TASK_IDLE seems more appropriate. I will change it.
Regards,
Sai
> > The efi page fault handler will check if the access is by
> > efi_reset_system().
> > 1. If so, then the efi page fault handler will reboot the machine
> > through BIOS and not through efi_reset_system().
> > 2. If not, then the efi page fault handler will freeze efi_rts_wq and
> > schedules a new process.
> >
>
> Thanks Sai! I am pretty happy how this patch set turned out. It still requires the
> blessing of the x86 maintainers, of course, but from my pov, this is good to go
> (but I will fold patch #3 into #2)
Hopefully, patch #3 goes away too in V5.
Regards,
Sai
On 09/07/2018 11:53 PM, Prakhya, Sai Praneeth wrote:
> >> There may exist some buggy UEFI firmware implementations that might
> >> access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
> >> after the kernel has assumed control of the platform. This violates
> >> UEFI specification.
> >>
> >> If selected, this debug option will print a warning message if the
> >> UEFI firmware tries to access any memory region which it shouldn't.
> >> Along with the warning, the efi page fault handler will also try to
> >> recover from the page fault triggered by the firmware so that the
> >> machine doesn't hang.
> >
> > Why make this optional?
>
> I made it as a config option in RFC because the page fault handler was
> complicated and touching many parts (it had lots of code change and I didn't want
> to break any existing functionality). Now that it's simple, I don't think we need
> the config option.
>
> Without efi page fault handler, any page fault caused by firmware should panic
> kernel but with this patch I think we are just improving existing condition (ideally).
>
> So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as default (i.e. without
> config).
>
> Regards,
> Sai
>
Also, some distributions already have specific ways to handle buggy firmware,
which can at times depend on the underlying hardware and the firmware
versions.

I would suggest that we enable this under a CONFIG option for the first round.
Once it has been tested on a wider variety of x86 machines that have buggy or
orphaned firmware, and Linux (and reboot) works fine on them, we can drop the
CONFIG option and enable this by default.
Regards,
Bhupesh
On Fri, Sep 7, 2018 at 4:57 AM, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> There may exist some buggy UEFI firmware implementations that access efi
> memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
> the kernel has assumed control of the platform. This violates UEFI
> specification. Hence, provide a debug config option which when enabled
> recovers from page faults caused by buggy firmware.
>
> Page faults triggered by firmware happen at ring 0 and, if unhandled,
> hang the kernel. So, provide an efi specific page fault handler to:
> 1. Avoid panics/hangs caused by buggy firmware.
> 2. Shout loud that the firmware is buggy and hence is not a kernel bug.
>
> The efi page fault handler will check if the access is by
> efi_reset_system().
> 1. If so, then the efi page fault handler will reboot the machine
> through BIOS and not through efi_reset_system().
> 2. If not, then the efi page fault handler will freeze efi_rts_wq and
> schedule a new process.
>
> This issue was reported by Al Stone when he saw that reboot via EFI hangs
> the machine. Upon debugging, I found that it's efi_reset_system() that's
> touching memory regions which it shouldn't. To reproduce the same
> behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
> with efi_reset_system(), I have also modified get_next_high_mono_count()
> and set_virtual_address_map(). They illegally access both boot time and
> other efi regions.
>
> Testing the patch set:
> ----------------------
> 1. Download buggy firmware from here [1].
> 2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
> Add reboot=efi to the kernel command line arguments and after the kernel
> is up and running, type "reboot". The kernel should hang while rebooting.
> 3. With the same setup, boot kernel after applying patches and the
> reboot should work fine. Also please notice warning/error messages
> printed by kernel.
>
> Changes from RFC to V1:
> -----------------------
> 1. Drop "long jump" technique of dealing with illegal access and instead
> use scheduling away from efi_rts_wq.
>
> Changes from V1 to V2:
> ----------------------
> 1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
> 2. Made the config option available only to expert users.
> 3. efi_free_boot_services() should be called only when
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
> was part of init/main.c file. As it is an architecture agnostic code,
> moved the change to arch/x86/platform/efi/quirks.c file.
>
> Changes from V2 to V3:
> ----------------------
> 1. Drop treating illegal access to EFI_BOOT_SERVICES_<CODE/DATA> regions
> separately from illegal accesses to other regions like
> EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
> In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
> regions were handled by mapping the requested region to efi_pgd but from
> V3 they are handled similarly to illegal accesses to other regions, i.e. by
> freezing efi_rts_wq and scheduling a new process.
> 2. Change __efi_init_fixup attribute to __efi_init.
>
> Changes from V3 to V4:
> ----------------------
> 1. Drop saving original memory map passed by kernel. It also means fewer
> checks in efi page fault handler.
> 2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
> functionality more appropriately.
>
> Note:
> -----
> Patch set based on "next" branch in efi tree.
>
> [1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt
>
> Sai Praneeth (3):
> efi: Make efi_rts_work accessible to efi page fault handler
> x86/efi: Add efi page fault handler to recover from page faults caused
> by the firmware
> x86/efi: Introduce EFI_PAGE_FAULT_HANDLER
>
> arch/x86/Kconfig | 18 +++++++++
> arch/x86/include/asm/efi.h | 9 +++++
> arch/x86/mm/fault.c | 9 +++++
> arch/x86/platform/efi/quirks.c | 70 +++++++++++++++++++++++++++++++++
> drivers/firmware/efi/runtime-wrappers.c | 60 ++++++++--------------------
> include/linux/efi.h | 37 +++++++++++++++++
> 6 files changed, 159 insertions(+), 44 deletions(-)
>
> Suggested-by: Matt Fleming <[email protected]>
> Based-on-code-from: Ricardo Neri <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Al Stone <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
>
> --
> 2.7.4
>
Thanks Sai for this work. I think this is a step in the right direction.
I tested this on qemu x86_64 with OVMF firmware modified to access
some random address in the EFI_Reserved_Region. I was able to reboot
the qemu instance successfully with the patches (see logs below) while
without the patchset, reboot earlier used to get stuck.
So, feel free to add:
Tested-by: Bhupesh Sharma <[email protected]>
Qemu Console Logs:
---------------------------
# reboot
<snip..>
[ 11.400004] ------------[ cut here ]------------
[ 11.400137] [Firmware Bug]: Page fault caused by firmware at PA: 0x7e924100
[ 11.400484] WARNING: CPU: 0 PID: 1111 at
arch/x86/platform/efi/quirks.c:691
efi_recover_from_page_fault+0x3b/0xf0
[ 11.400751] Modules linked in:
[ 11.400992] CPU: 0 PID: 1111 Comm: init Not tainted 4.18.0-rc5+ #1
[ 11.401146] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 0.0.0 02/06/2015
[ 11.401397] RIP: 0010:efi_recover_from_page_fault+0x3b/0xf0
[ 11.401547] Code: e0 03 00 00 e0 6e 8d 91 0f 85 9e 00 00 00 48 81
ff ff 0f 00 00 0f 86 91 00 00 00 48 89 fe 48 c7 c7 b8 e6 5d 91 e8 65
41 00 00 <0f> 0b 83 3d dc 19 8a 01 09 0f 84 89 00 00 00 48 c7 04 24 02
00 00
[ 11.402185] RSP: 0018:ffffb91080d6ba70 EFLAGS: 00000086
[ 11.402330] RAX: 0000000000000000 RBX: ffff98b53e34c980 RCX: ffffffff91845d38
[ 11.402502] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff91e8986c
[ 11.402706] RBP: ffffb91080d6bb58 R08: 7269662079622064 R09: 00000000000001fe
[ 11.402881] R10: 0000000000000000 R11: 3030313432396537 R12: ffff98b53e34c980
[ 11.403051] R13: 0000000000000002 R14: 000000000000000b R15: 0000000000000001
[ 11.403259] FS: 00007f7d510fe700(0000) GS:ffff98b53f600000(0000)
knlGS:0000000000000000
[ 11.403452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.403602] CR2: 000000007e924100 CR3: 000000007ec9c000 CR4: 00000000000006f0
[ 11.403823] Call Trace:
[ 11.404368] no_context+0x130/0x3a0
[ 11.404509] __do_page_fault+0x39a/0x4b0
[ 11.404623] page_fault+0x1e/0x30
[ 11.404811] RIP: 0010:0xfffffffeffbba977
[ 11.404908] Code: 89 d5 56 53 4d 89 c4 89 cb 48 83 ec 48 e8 cb 05
00 00 84 c0 41 88 c6 74 11 48 8d 15 3e 15 00 00 b9 00 00 00 80 e8 f8
07 00 00 <48> c7 04 25 00 41 92 7e 0a 00 00 00 48 83 3d c5 29 00 00 00
75 30
[ 11.405544] RSP: 0018:ffffb91080d6bc00 EFLAGS: 00000082
[ 11.405683] RAX: 0000000000000041 RBX: 0000000000000000 RCX: ffffb91080d6bae0
[ 11.405849] RDX: 00000000000003f8 RSI: 0000000000000000 RDI: fffffffeffbba93f
[ 11.406016] RBP: 0000000000000000 R08: 0000000000000041 R09: 0000000000000041
[ 11.406184] R10: 00000000000003fd R11: 00000000000003f8 R12: 0000000000000000
[ 11.406369] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 11.406593] ? serial8250_console_putchar+0x11/0x20
[ 11.406725] ? efi_call+0x58/0x90
[ 11.406815] ? msg_print_text+0x9c/0x100
[ 11.406927] ? virt_efi_reset_system+0x81/0x100
[ 11.407042] ? efi_reboot+0x85/0xe0
[ 11.407131] ? native_machine_emergency_restart+0x17f/0x260
[ 11.407267] ? clear_local_APIC.part.13+0x1e3/0x220
[ 11.407394] ? __do_sys_reboot+0x1ee/0x210
[ 11.407501] ? __switch_to_asm+0x40/0x70
[ 11.407613] ? __switch_to_asm+0x34/0x70
[ 11.407716] ? __switch_to_asm+0x40/0x70
[ 11.407817] ? __switch_to_asm+0x34/0x70
[ 11.407916] ? __switch_to_asm+0x40/0x70
[ 11.408017] ? __switch_to_asm+0x34/0x70
[ 11.408117] ? __switch_to_asm+0x40/0x70
[ 11.408217] ? __switch_to_asm+0x34/0x70
[ 11.408317] ? __switch_to_asm+0x40/0x70
[ 11.408417] ? __switch_to_asm+0x34/0x70
[ 11.408515] ? __switch_to_asm+0x40/0x70
[ 11.408620] ? __switch_to_asm+0x34/0x70
[ 11.408718] ? __switch_to_asm+0x40/0x70
[ 11.408814] ? __switch_to_asm+0x34/0x70
[ 11.408909] ? __switch_to_asm+0x40/0x70
[ 11.409005] ? __switch_to_asm+0x34/0x70
[ 11.409113] ? __switch_to_asm+0x40/0x70
[ 11.409209] ? __switch_to_asm+0x34/0x70
[ 11.409303] ? __switch_to_asm+0x40/0x70
[ 11.409396] ? __switch_to_asm+0x34/0x70
[ 11.409491] ? __switch_to_asm+0x40/0x70
[ 11.409589] ? __switch_to_asm+0x34/0x70
[ 11.409685] ? __switch_to_asm+0x40/0x70
[ 11.409781] ? __switch_to_asm+0x34/0x70
[ 11.409879] ? __switch_to_asm+0x40/0x70
[ 11.409980] ? __switch_to_asm+0x34/0x70
[ 11.410079] ? __switch_to_asm+0x40/0x70
[ 11.410178] ? __switch_to_asm+0x34/0x70
[ 11.410281] ? do_syscall_64+0x39/0xe0
[ 11.410378] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.410554] ---[ end trace ad3d0a220a88a45b ]---
[ 11.410742] efi: efi_reset_system() buggy! Reboot through BIOS
<snip..>
Thanks,
Bhupesh
> > > Why make this optional?
> >
> > I made it as a config option in RFC because the page fault handler was
> > complicated and touching many parts (it had lots of code change and I
> > didn't want to break any existing functionality). Now that it's
> > simple, I don't think we need the config option.
> >
> > Without efi page fault handler, any page fault caused by firmware
> > should panic kernel but with this patch I think we are just improving existing
> condition (ideally).
> >
> > So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as
> > default (i.e. without config).
> >
> > Regards,
> > Sai
> >
> Also, some distributions already have specific ways to handle buggy firmwares
> which can be at times dependent on the underlying hardware and the firmware
> versions.
>
> I would suggest that we enable this under a CONFIG for the first round and once
> it is tested with wider variety of x86 machines which have buggy or orphaned
> firmware and linux (and reboot) works fine with them, we can drop the CONFIG
> option in future and enable this by default.
Sounds fair to me, but I would like to wait for someone experienced to make
the final call.
Regards,
Sai
> Thanks Sai for this work. I think this is a step in the right direction.
> I tested this on qemu x86_64 with OVMF firmware modified to access some
> random address in the EFI_Reserved_Region. I was able to reboot the qemu
> instance successfully with the patches (see logs below) while without the
> patchset, reboot earlier used to get stuck.
>
> So, feel free to add:
> Tested-by: Bhupesh Sharma <[email protected]>
>
Thanks a lot Bhupesh, for trying the patches and as you said, the patches need a lot
more testing on real machines.
> Qemu Console Logs:
> ---------------------------
>
> # reboot
>
> <snip..>
>
> [ 11.400004] ------------[ cut here ]------------
> [ 11.400137] [Firmware Bug]: Page fault caused by firmware at PA: 0x7e924100
> [ 11.400484] WARNING: CPU: 0 PID: 1111 at
> arch/x86/platform/efi/quirks.c:691
> efi_recover_from_page_fault+0x3b/0xf0
> [ 11.400751] Modules linked in:
> [ 11.400992] CPU: 0 PID: 1111 Comm: init Not tainted 4.18.0-rc5+ #1
> [ 11.401146] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [ 11.401397] RIP: 0010:efi_recover_from_page_fault+0x3b/0xf0
[snipped stack trace]
> [ 11.410378] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 11.410554] ---[ end trace ad3d0a220a88a45b ]---
> [ 11.410742] efi: efi_reset_system() buggy! Reboot through BIOS
Thanks for the log, it looks good to me.
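
For readers following along, the decision the log above reflects (a fault
during efi_reset_system() falls back to a BIOS reboot, any other faulting
runtime service gets its worker frozen) can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the code from the
patch: the enum names and choose_recovery() are hypothetical, and the real
handler has to perform further checks in kernel context.

```c
#include <assert.h>

/* Hypothetical ids for the EFI runtime service that was executing. */
enum rts_id {
	RTS_NONE,			/* no runtime service in flight */
	RTS_GET_VARIABLE,
	RTS_GET_NEXT_HIGH_MONO_COUNT,
	RTS_RESET_SYSTEM,
	RTS_COUNT			/* number of ids, not a real service */
};

/* Hypothetical outcomes of the simulated EFI page fault handler. */
enum recovery {
	RECOVERY_NOT_FIRMWARE,		/* not an EFI fault: normal fault path */
	RECOVERY_REBOOT_BIOS,		/* reset_system faulted: reboot via BIOS */
	RECOVERY_FREEZE_WORKER		/* freeze the EFI worker, schedule away */
};

/*
 * Pick a recovery action from the runtime service that faulted.
 * Mirrors the two cases in the cover letter; everything else about
 * the real handler is intentionally left out.
 */
enum recovery choose_recovery(enum rts_id in_flight)
{
	assert(in_flight < RTS_COUNT);

	if (in_flight == RTS_NONE)
		return RECOVERY_NOT_FIRMWARE;

	if (in_flight == RTS_RESET_SYSTEM)
		return RECOVERY_REBOOT_BIOS;

	return RECOVERY_FREEZE_WORKER;
}
```

With this sketch, choose_recovery(RTS_RESET_SYSTEM) yields
RECOVERY_REBOOT_BIOS, which corresponds to the last line of the log.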
On Fri, 7 Sep 2018, Prakhya, Sai Praneeth wrote:
> > > So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as
> > > default (i.e. without config).
Yes, that's the right thing to do.
> > Also, some distributions already have specific ways to handle buggy
> > firmwares which can be at times dependent on the underlying hardware
> > and the firmware versions.
If the distro patched their kernel to deal with buggy firmware, then:
1) why did they not upstream it ?
2) why should we worry about that ?
> > I would suggest that we enable this under a CONFIG for the first round
> > and once it is tested with wider variety of x86 machines which have
> > buggy or orphaned firmware and linux (and reboot) works fine with them,
> > we can drop the CONFIG option in future and enable this by default.
Sure and then nobody enables it and the affected machines still crash or
hang on reboot. The whole thing is simple enough now to make it
unconditional.
> Sounds fair to me, but, I would like to wait for someone experienced to
> make the final call.
Please get rid of that config knob. Buggy firmware exists and we better
deal with it by default.
Thanks,
tglx
On Sat, Sep 8, 2018 at 12:52 AM, Thomas Gleixner <[email protected]> wrote:
> On Fri, 7 Sep 2018, Prakhya, Sai Praneeth wrote:
>> > > So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as
>> > > default (i.e. without config).
>
> Yes, that's the right thing to do.
>
>> > Also, some distributions already have specific ways to handle buggy
>> > firmwares which can be at times dependent on the underlying hardware
>> > and the firmware versions.
>
> If the distro patched their kernel to deal with buggy firmware, then:
>
> 1) why did they not upstream it ?
Because some of the kernel fixes are (for such cases), well to be
honest, ugly.. and probably not suitable for placement in quirks
files/common code present upstream as they introduce lots of #ifdef
jugglery.
Also the x86 machines with such buggy BIOS firmwares are too old (but
still used in some production environment) and the OS workarounds are
suggested by the vendors themselves and they historically had issues
getting the quirks upstream.
> 2) why should we worry about that ?
As this allows one to still promote upgrading such machines to
upstream kernel versions and keep the kernel running on them as close
as possible to mainline.
>> > I would suggest that we enable this under a CONFIG for the first round
>> > and once it is tested with wider variety of x86 machines which have
>> > buggy or orphaned firmware and linux (and reboot) works fine with them,
>> > we can drop the CONFIG option in future and enable this by default.
>
> Sure and then nobody enables it and the affected machines still crash or
> hang on reboot. The whole thing is simple enough now to make it
> unconditional.
Instead, why not make the CONFIG option default to Y? At least it
gives us an opportunity to turn it off if needed for backported/distro
kernels on such broken platforms which might need more testing with
the EFI page fault approach.
That should serve both the purposes. Just my 2 cents.
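
To make the "default to Y" idea concrete, here is a rough userspace sketch of
a knob that is on unless the build opts out. It is an assumption-laden
illustration, not the patch: the opt-out macro
EFI_PAGE_FAULT_HANDLER_DISABLED, efi_try_recover() and handle_kernel_fault()
are hypothetical names, and a real kernel option would live in Kconfig rather
than in a preprocessor default.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a Kconfig symbol with "default y":
 * the handler is built in unless the build passes
 * -DEFI_PAGE_FAULT_HANDLER_DISABLED.
 */
#ifndef EFI_PAGE_FAULT_HANDLER_DISABLED
#define EFI_PAGE_FAULT_HANDLER 1
#else
#define EFI_PAGE_FAULT_HANDLER 0
#endif

enum fault_result { FAULT_RECOVERED, FAULT_FATAL };

#if EFI_PAGE_FAULT_HANDLER
/* Handler present: a firmware-caused fault is treated as recoverable. */
bool efi_try_recover(bool caused_by_firmware)
{
	return caused_by_firmware;
}
#else
/* Handler compiled out: the stub never recovers anything. */
bool efi_try_recover(bool caused_by_firmware)
{
	(void)caused_by_firmware;
	return false;
}
#endif

/* Simulated tail of the kernel fault path. */
enum fault_result handle_kernel_fault(bool caused_by_firmware)
{
	if (efi_try_recover(caused_by_firmware))
		return FAULT_RECOVERED;

	return FAULT_FATAL;
}
```

In the default build a firmware-caused fault is recovered; when the opt-out
macro is defined, the same fault ends up in the fatal path again, which is
the behaviour without the patches.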
Thanks,
Bhupesh
>> Sounds fair to me, but, I would like to wait for someone experienced to
>> make the final call.
>
> Please get rid of that config knob. Buggy firmware exists and we better
> deal with it by default.
>
> Thanks,
>
> tglx
>
On Sat, 8 Sep 2018, Bhupesh Sharma wrote:
> On Sat, Sep 8, 2018 at 12:52 AM, Thomas Gleixner <[email protected]> wrote:
> > If the distro patched their kernel to deal with buggy firmware, then:
> >
> > 1) why did they not upstream it ?
>
> Because some of the kernel fixes are (for such cases), well to be
> honest, ugly.. and probably not suitable for placement in quirks
> files/common code present upstream as they introduce lots of #ifdef
> jugglery.
>
> Also the x86 machines with such buggy BIOS firmwares are too old (but
> still used in some production environment) and the OS workarounds are
> suggested by the vendors themselves and they historically had issues
> getting the quirks upstream.
Right, because they did not even try. And the distros just integrated that
mess instead of telling them to fix the crappy firmware. And that can be
fixed. I know for sure because the RT qualification program at RH has made
vendors fix their crap in order to get the rubber stamp.
> > 2) why should we worry about that ?
>
> As this allows one to still promote upgrading such machines to
> upstream kernel versions and keep the kernel running on them as close
> as possible to mainline.
If the firmware is buggy and trips faults in the EFI mess then mainline
will just crash and burn.
So how are you going to upgrade these machines to a mainline kernel with
that fault handler disabled? Not at all.
If you need to patch the kernel in order to make it work on mainline, then
that EFI fault handler still has ZERO impact. Because those patches need to
prevent the firmware from faulting in the first place. If they have magic
fault handlers for this crap themselves, then the patches will conflict
anyway, config switch or not.
You have to come up with something more convincing.
Thanks,
tglx