From: Sai Praneeth <[email protected]>
Problem statement:
------------------
Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
perf code via an NMI) will have incorrect user space mappings[1]. This
could lead to otherwise unexpected access errors and, worse, unauthorized
access to firmware code and data.
Detailed discussion of problem statement:
-----------------------------------------
As this switch is not propagated to other kernel subsystems; they will
wrongly assume that swapper_pgd is still in use and it can lead to
following issues:
1. If kernel code tries to access user space addresses while in efi_pgd,
it could lead to unauthorized accesses to firmware code/data.
(e.g: <__>/copy_from_user_nmi()).
[This could also be disastrous if the frame pointer happens to point at
MMIO in the EFI runtime mappings] - Mark Rutland.
An example of a subsystem that could touch user space while in efi_pgd is
perf. Assume that we are in efi_pgd, a user could use perf to profile
some user data and depending on the address the user is trying to
profile, two things could happen.
1. If the mappings are absent, perf fails to profile.
2. If efi_pgd does have mappings for the requested address (these
mappings are erroneous), perf profiles firmware code/data. If the
address is MMIO'ed, perf could have potentially changed some device state.
The culprit in both the cases is, EFI subsystem swapping out pgd and not
perf. Because, EFI subsystem has broken the *general assumption* that
all other subsystems rely on - "user space might be valid and nobody has
switched %cr3".
Solutions:
----------
There are two ways to fix this issue:
1. Educate about pgd change to *all* the subsystems that could
potentially access user space while in efi_pgd.
On x86, AFAIK, it could happen only when some one touches user space
from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
showed perf and oprofile). On arm, it could happen from multiple
places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
please comment on this as I might be wrong); whereas x86 runs
efi_runtime_services() interrupts disabled.
I think, this solution isn't holistic because
a. Any other subsystem might well do the same, if not now, in future.
b. Also, this solution looks simpler on x86 but not true if it's the
same for arm (ARM folks, please comment on this as I might be wrong).
c. This solution looks like a work around rather than addressing the issue.
2. Running efi_runtime_services() in kthread context.
This makes sense because efi_pgd doesn't have user space and kthread
by definition means that user space is not valid. Any kernel code that
tries to touch user space while in kthread is buggy in itself. If so,
it should be an easy fix in the other subsystem. This also take us one
step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
What does this patch set do?
----------------------------
Introduce efi_rts_wq (EFI runtime services work queue).
When a user process requests the kernel to execute any efi_runtime_service(),
kernel queues the work to efi_rts_wq, a kthread comes along, switches to
efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
patch set adds support to the EFI subsystem to handle all calls to
efi_runtime_services() using a work queue (which in turn uses kthread).
How running efi_runtime_services() in kthread fixes above discussed issues?
---------------------------------------------------------------------------
If we run efi_runtime_services() in kthread context and if perf
checks for it, we could get both the above scenarios correct by perf
aborting the profiling. Not only perf, but any subsystem that tries to
touch user space should first check for kthread context and if so,
should abort.
Q. If we still need check for kthread context in other subsystems that
access user space, what does this patch set fix?
A. This patch set makes sure that EFI subsystem is not at fault.
Without this patch set the blame is upon EFI subsystem, because it's the
one that changed pgd and hasn't communicated this change to everyone and
hence broke the general assumption. Running efi_runtime_services() in
kthread means explicitly communicating that user space is invalid, now
it's the responsibility of other subsystem to make sure that it's
running in right context.
Testing:
--------
Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
(qemu only). Will appreciate the effort if someone could test the
patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation
Credits:
--------
Thanks to Ricardo, Dan, Miguel, Mark, Peter Z and Ard for reviews and
suggestions. Thanks to Boris and Andy for making me think through/help
on what I am addressing with this patch set.
Please feel free to pour in your comments and concerns.
Note:
-----
Patches are based on Linus's kernel v4.17-rc7
[1] Backup: Detailing efi_pgd:
------------------------------
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.
[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland
Changes from V4 to V5:
----------------------
1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
Non-blocking variants are supposed to not block and using workqueue
exactly does the opposite, hence refrain from using it.
2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
blocking variants means that we have to call efi_delete_dummy_variable()
after efi_rts_wq has been created.
3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
Any caller wishing to use set_variable() and query_variable_info() in
atomic context should use their non-blocking variants.
Changes from V3 to V4:
----------------------
1. As suggested by Peter, use completions instead of flush_work() as the
former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
wasn't able to find a better alternative to keep this change local to
arch/x86.
Changes from V2 to V3:
----------------------
1. Rewrite the cover letter to clearly state the problem. What we are
fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for depreciation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.
Sai Praneeth (3):
x86/efi: Make efi_delete_dummy_variable() use
set_variable_nonblocking() instead of set_variable()
efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
efi: Use efi_rts_wq to invoke EFI Runtime Services
arch/x86/platform/efi/quirks.c | 11 +-
drivers/firmware/efi/efi.c | 14 ++
drivers/firmware/efi/runtime-wrappers.c | 218 +++++++++++++++++++++++++++++---
include/linux/efi.h | 3 +
4 files changed, 224 insertions(+), 22 deletions(-)
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
--
2.7.4
From: Sai Praneeth <[email protected]>
Presently, efi_delete_dummy_variable() uses set_variable() which might
block and hence kernel prints stack trace with a warning "bad:
scheduling from the idle thread!". So, make efi_delete_dummy_variable()
use set_variable_nonblocking(), which, as the name suggests doesn't
block.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
arch/x86/platform/efi/quirks.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..6af39dc40325 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -105,12 +105,11 @@ early_param("efi_no_storage_paranoia", setup_storage_paranoia);
*/
void efi_delete_dummy_variable(void)
{
- efi.set_variable((efi_char16_t *)efi_dummy_name,
- &EFI_DUMMY_GUID,
- EFI_VARIABLE_NON_VOLATILE |
- EFI_VARIABLE_BOOTSERVICE_ACCESS |
- EFI_VARIABLE_RUNTIME_ACCESS,
- 0, NULL);
+ efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
+ &EFI_DUMMY_GUID,
+ EFI_VARIABLE_NON_VOLATILE |
+ EFI_VARIABLE_BOOTSERVICE_ACCESS |
+ EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
}
/*
--
2.7.4
From: Sai Praneeth <[email protected]>
When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.
Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. Populates efi_runtime_work
b. Queues work onto efi_rts_wq and
c. Waits until worker thread completes
The caller thread has to wait until the worker thread completes, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() takes a pointer to buffer as an argument and
fills up the buffer with requested data. For instance,
efi_get_variable() and efi_get_next_variable(). Hence, caller process
cannot just post the work and get going.
Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or less arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
drivers/firmware/efi/efi.c | 14 ++++++
drivers/firmware/efi/runtime-wrappers.c | 83 +++++++++++++++++++++++++++++++++
include/linux/efi.h | 3 ++
3 files changed, 100 insertions(+)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1379a375dfa8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
};
+struct workqueue_struct *efi_rts_wq;
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -337,6 +339,18 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
+ /*
+ * Since we process only one efi_runtime_service() at a time, an
+ * ordered workqueue (which creates only one execution context)
+ * should suffice all our needs.
+ */
+ efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+ if (!efi_rts_wq) {
+ pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ return 0;
+ }
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..cf3bae42a752 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
/*
* runtime-wrappers.c - Runtime Services function call wrappers
*
+ * Implementation summary:
+ * -----------------------
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
* Copyright (C) 2014 Linaro Ltd. <[email protected]>
*
* Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
#include <linux/mutex.h>
#include <linux/semaphore.h>
#include <linux/stringify.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+
#include <asm/efi.h>
/*
@@ -33,6 +45,77 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ QUERY_VARIABLE_INFO,
+ GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts: efi_runtime_service() function identifier
+ * @rts_arg<1-5>: efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the caller
+ * thread waits for completion.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
+({ \
+ struct efi_runtime_work efi_rts_work; \
+ efi_rts_work.status = EFI_ABORTED; \
+ \
+ init_completion(&efi_rts_work.efi_rts_comp); \
+ INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
+ efi_rts_work.arg1 = _arg1; \
+ efi_rts_work.arg2 = _arg2; \
+ efi_rts_work.arg3 = _arg3; \
+ efi_rts_work.arg4 = _arg4; \
+ efi_rts_work.arg5 = _arg5; \
+ efi_rts_work.efi_rts_id = _rts; \
+ \
+ /* \
+ * queue_work() returns 0 if work was already on queue, \
+ * _ideally_ this should never happen. \
+ */ \
+ if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
+ wait_for_completion(&efi_rts_work.efi_rts_comp); \
+ else \
+ pr_err("Failed to queue work to efi_rts_wq.\n"); \
+ \
+ efi_rts_work.status; \
+})
+
void efi_call_virt_check_flags(unsigned long flags, const char *call)
{
unsigned long cur_flags, mismatch;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3016d8c456bc..116689c56441 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1651,4 +1651,7 @@ struct linux_efi_tpm_eventlog {
extern int efi_tpm_eventlog_init(void);
+/* Workqueue to queue EFI Runtime Services */
+extern struct workqueue_struct *efi_rts_wq;
+
#endif /* _LINUX_EFI_H */
--
2.7.4
From: Sai Praneeth <[email protected]>
Presently, when a user process requests the kernel to execute any
efi_runtime_service(), kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think, user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.
A solution for this issue is to use kthread to run
efi_runtime_service(). When a user process requests the kernel to
execute any efi_runtime_service(), kernel queues the work to efi_rts_wq,
a kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.
Implementation summary:
-----------------------
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().
Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.
Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments
Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
The non-blocking variants of set_variable() and query_variable_info()
should be used while in atomic context. Use of blocking variants like
set_variable() and query_variable_info() while in atomic will issue a
warning ("scheduling wile in atomic") and prints stack trace. Presently,
pstore uses non-blocking variants and hence works fine.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 135 ++++++++++++++++++++++++++++----
1 file changed, 119 insertions(+), 16 deletions(-)
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index cf3bae42a752..127d4de00403 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -173,13 +173,104 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
*/
static DEFINE_SEMAPHORE(efi_runtime_lock);
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+ struct efi_runtime_work *efi_rts_work;
+ void *arg1, *arg2, *arg3, *arg4, *arg5;
+ efi_status_t status = EFI_NOT_FOUND;
+
+ efi_rts_work = container_of(work, struct efi_runtime_work, work);
+ arg1 = efi_rts_work->arg1;
+ arg2 = efi_rts_work->arg2;
+ arg3 = efi_rts_work->arg3;
+ arg4 = efi_rts_work->arg4;
+ arg5 = efi_rts_work->arg5;
+
+ switch (efi_rts_work->efi_rts_id) {
+ case GET_TIME:
+ status = efi_call_virt(get_time, (efi_time_t *)arg1,
+ (efi_time_cap_t *)arg2);
+ break;
+ case SET_TIME:
+ status = efi_call_virt(set_time, (efi_time_t *)arg1);
+ break;
+ case GET_WAKEUP_TIME:
+ status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+ (efi_bool_t *)arg2, (efi_time_t *)arg3);
+ break;
+ case SET_WAKEUP_TIME:
+ status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+ (efi_time_t *)arg2);
+ break;
+ case GET_VARIABLE:
+ status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, (u32 *)arg3,
+ (unsigned long *)arg4, (void *)arg5);
+ break;
+ case GET_NEXT_VARIABLE:
+ status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+ (efi_char16_t *)arg2,
+ (efi_guid_t *)arg3);
+ break;
+ case SET_VARIABLE:
+ status = efi_call_virt(set_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, *(u32 *)arg3,
+ *(unsigned long *)arg4, (void *)arg5);
+ break;
+ case QUERY_VARIABLE_INFO:
+ status = efi_call_virt(query_variable_info, *(u32 *)arg1,
+ (u64 *)arg2, (u64 *)arg3, (u64 *)arg4);
+ break;
+ case GET_NEXT_HIGH_MONO_COUNT:
+ status = efi_call_virt(get_next_high_mono_count, (u32 *)arg1);
+ break;
+ case RESET_SYSTEM:
+ __efi_call_virt(reset_system, *(int *)arg1,
+ *(efi_status_t *)arg2,
+ *(unsigned long *)arg3,
+ (efi_char16_t *)arg4);
+ break;
+ case UPDATE_CAPSULE:
+ status = efi_call_virt(update_capsule,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2,
+ *(unsigned long *)arg3);
+ break;
+ case QUERY_CAPSULE_CAPS:
+ status = efi_call_virt(query_capsule_caps,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2, (u64 *)arg3,
+ (int *)arg4);
+ break;
+ default:
+ /*
+ * Ideally, we should never reach here because a caller of this
+ * function should have put the right efi_runtime_service()
+ * function identifier into efi_rts_work->efi_rts_id
+ */
+ pr_err("Requested executing invalid EFI Runtime Service.\n");
+ }
+ efi_rts_work->status = status;
+ complete(&efi_rts_work->efi_rts_comp);
+}
+
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
efi_status_t status;
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_time, tm, tc);
+ status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -190,7 +281,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_time, tm);
+ status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -203,7 +294,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+ status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -214,7 +306,8 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_wakeup_time, enabled, tm);
+ status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -229,8 +322,8 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_variable, name, vendor, attr, data_size,
- data);
+ status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size,
+ data);
up(&efi_runtime_lock);
return status;
}
@@ -243,7 +336,8 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_variable, name_size, name, vendor);
+ status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -258,8 +352,10 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+
+ status = efi_queue_work(SET_VARIABLE, name, vendor, &attr, &data_size,
+ data);
+
up(&efi_runtime_lock);
return status;
}
@@ -276,6 +372,7 @@ virt_efi_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor,
status = efi_call_virt(set_variable, name, vendor, attr, data_size,
data);
+
up(&efi_runtime_lock);
return status;
}
@@ -293,8 +390,10 @@ static efi_status_t virt_efi_query_variable_info(u32 attr,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+
+ status = efi_queue_work(QUERY_VARIABLE_INFO, &attr, storage_space,
+ remaining_space, max_variable_size, NULL);
+
up(&efi_runtime_lock);
return status;
}
@@ -315,6 +414,7 @@ virt_efi_query_variable_info_nonblocking(u32 attr,
status = efi_call_virt(query_variable_info, attr, storage_space,
remaining_space, max_variable_size);
+
up(&efi_runtime_lock);
return status;
}
@@ -325,7 +425,8 @@ static efi_status_t virt_efi_get_next_high_mono_count(u32 *count)
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_high_mono_count, count);
+ status = efi_queue_work(GET_NEXT_HIGH_MONO_COUNT, count, NULL, NULL,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -340,7 +441,8 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
- __efi_call_virt(reset_system, reset_type, status, data_size, data);
+ efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
+ NULL);
up(&efi_runtime_lock);
}
@@ -355,7 +457,8 @@ static efi_status_t virt_efi_update_capsule(efi_capsule_header_t **capsules,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(update_capsule, capsules, count, sg_list);
+ status = efi_queue_work(UPDATE_CAPSULE, capsules, &count, &sg_list,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -372,8 +475,8 @@ static efi_status_t virt_efi_query_capsule_caps(efi_capsule_header_t **capsules,
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_capsule_caps, capsules, count, max_size,
- reset_type);
+ status = efi_queue_work(QUERY_CAPSULE_CAPS, capsules, &count,
+ max_size, reset_type, NULL);
up(&efi_runtime_lock);
return status;
}
--
2.7.4
On 29 May 2018 at 04:21, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Problem statement:
> ------------------
> Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
> to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
> perf code via an NMI) will have incorrect user space mappings[1]. This
> could lead to otherwise unexpected access errors and, worse, unauthorized
> access to firmware code and data.
>
> Detailed discussion of problem statement:
> -----------------------------------------
> As this switch is not propagated to other kernel subsystems; they will
> wrongly assume that swapper_pgd is still in use and it can lead to
> following issues:
>
> 1. If kernel code tries to access user space addresses while in efi_pgd,
> it could lead to unauthorized accesses to firmware code/data.
> (e.g: <__>/copy_from_user_nmi()).
> [This could also be disastrous if the frame pointer happens to point at
> MMIO in the EFI runtime mappings] - Mark Rutland.
>
> An example of a subsystem that could touch user space while in efi_pgd is
> perf. Assume that we are in efi_pgd, a user could use perf to profile
> some user data and depending on the address the user is trying to
> profile, two things could happen.
> 1. If the mappings are absent, perf fails to profile.
> 2. If efi_pgd does have mappings for the requested address (these
> mappings are erroneous), perf profiles firmware code/data. If the
> address is MMIO'ed, perf could have potentially changed some device state.
>
> The culprit in both the cases is, EFI subsystem swapping out pgd and not
> perf. Because, EFI subsystem has broken the *general assumption* that
> all other subsystems rely on - "user space might be valid and nobody has
> switched %cr3".
>
> Solutions:
> ----------
> There are two ways to fix this issue:
> 1. Educate about pgd change to *all* the subsystems that could
> potentially access user space while in efi_pgd.
> On x86, AFAIK, it could happen only when some one touches user space
> from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
> showed perf and oprofile). On arm, it could happen from multiple
> places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
> please comment on this as I might be wrong); whereas x86 runs
> efi_runtime_services() interrupts disabled.
>
> I think, this solution isn't holistic because
> a. Any other subsystem might well do the same, if not now, in future.
> b. Also, this solution looks simpler on x86 but not true if it's the
> same for arm (ARM folks, please comment on this as I might be wrong).
> c. This solution looks like a work around rather than addressing the issue.
>
> 2. Running efi_runtime_services() in kthread context.
> This makes sense because efi_pgd doesn't have user space and kthread
> by definition means that user space is not valid. Any kernel code that
> tries to touch user space while in kthread is buggy in itself. If so,
> it should be an easy fix in the other subsystem. This also take us one
> step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
>
> What does this patch set do?
> ----------------------------
> Introduce efi_rts_wq (EFI runtime services work queue).
> When a user process requests the kernel to execute any efi_runtime_service(),
> kernel queues the work to efi_rts_wq, a kthread comes along, switches to
> efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
> patch set adds support to the EFI subsystem to handle all calls to
> efi_runtime_services() using a work queue (which in turn uses kthread).
>
> How running efi_runtime_services() in kthread fixes above discussed issues?
> ---------------------------------------------------------------------------
> If we run efi_runtime_services() in kthread context and if perf
> checks for it, we could get both the above scenarios correct by perf
> aborting the profiling. Not only perf, but any subsystem that tries to
> touch user space should first check for kthread context and if so,
> should abort.
>
> Q. If we still need check for kthread context in other subsystems that
> access user space, what does this patch set fix?
> A. This patch set makes sure that EFI subsystem is not at fault.
> Without this patch set the blame is upon EFI subsystem, because it's the
> one that changed pgd and hasn't communicated this change to everyone and
> hence broke the general assumption. Running efi_runtime_services() in
> kthread means explicitly communicating that user space is invalid, now
> it's the responsibility of other subsystem to make sure that it's
> running in right context.
>
> Testing:
> --------
> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> (qemu only). Will appreciate the effort if someone could test the
> patches on real ARM/ARM64 machines.
> LUV: https://01.org/linux-uefi-validation
>
> Credits:
> --------
> Thanks to Ricardo, Dan, Miguel, Mark, Peter Z and Ard for reviews and
> suggestions. Thanks to Boris and Andy for making me think through/help
> on what I am addressing with this patch set.
>
> Please feel free to pour in your comments and concerns.
>
> Note:
> -----
> Patches are based on Linus's kernel v4.17-rc7
>
> [1] Backup: Detailing efi_pgd:
> ------------------------------
> efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
> Code/Data) regions. Due to the nature of these mappings, they fall
> in user space address ranges and they are not the same as swapper.
>
> [On arm64, the EFI mappings are in the VA range usually used for user
> space. The two halves of the address space are managed by separate
> tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
> user space or EFI runtime mappings in TTBR0.] - Mark Rutland
>
> Changes from V4 to V5:
> ----------------------
> 1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
> Non-blocking variants are supposed to not block and using workqueue
> exactly does the opposite, hence refrain from using it.
> 2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
> blocking variants means that we have to call efi_delete_dummy_variable()
> after efi_rts_wq has been created.
> 3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
> Any caller wishing to use set_variable() and query_variable_info() in
> atomic context should use their non-blocking variants.
>
> Changes from V3 to V4:
> ----------------------
> 1. As suggested by Peter, use completions instead of flush_work() as the
> former is cheaper
> 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
> wasn't able to find a better alternative to keep this change local to
> arch/x86.
>
> Changes from V2 to V3:
> ----------------------
> 1. Rewrite the cover letter to clearly state the problem. What we are
> fixing and what we are not fixing.
> 2. Make efi_delete_dummy_variable() change local to x86.
> 3. Avoid using BUG(), instead, print error message and exit gracefully.
> 4. Move struct efi_runtime_work to runtime-wrappers.c file.
> 5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
> 6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
>
> Changes from V1 to V2:
> ----------------------
> 1. Remove unnecessary include of asm/efi.h file - Fixes build error on
> ia64, reported by 0-day
> 2. Use enum to identify efi_runtime_services()
> 3. Use alloc_ordered_workqueue() to create efi_rts_wq as
> create_workqueue() is scheduled for depreciation.
> 4. Make efi_call_rts() static, as it has no callers outside
> runtime-wrappers.c
> 5. Use BUG(), when we are unable to queue work or unable to identify
> requested efi_runtime_service() - Because these two situations should
> *never* happen.
>
> Sai Praneeth (3):
> x86/efi: Make efi_delete_dummy_variable() use
> set_variable_nonblocking() instead of set_variable()
> efi: Create efi_rts_wq and efi_queue_work() to invoke all
> efi_runtime_services()
> efi: Use efi_rts_wq to invoke EFI Runtime Services
>
This version looks good to me, and works as expected on SynQuacer (arm64).
Tested-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
I will give others the opportunity to review this code before queuing it though.
Thanks,
Ard.
> arch/x86/platform/efi/quirks.c | 11 +-
> drivers/firmware/efi/efi.c | 14 ++
> drivers/firmware/efi/runtime-wrappers.c | 218 +++++++++++++++++++++++++++++---
> include/linux/efi.h | 3 +
> 4 files changed, 224 insertions(+), 22 deletions(-)
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Naresh Bhat <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Miguel Ojeda <[email protected]>
>
> --
> 2.7.4
>
Hi Sai, Ard,
On Tue, May 29, 2018 at 4:39 PM, Ard Biesheuvel
<[email protected]> wrote:
> On 29 May 2018 at 04:21, Sai Praneeth Prakhya
> <[email protected]> wrote:
>> From: Sai Praneeth <[email protected]>
>>
>> Problem statement:
>> ------------------
>> Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
>> to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
>> perf code via an NMI) will have incorrect user space mappings[1]. This
>> could lead to otherwise unexpected access errors and, worse, unauthorized
>> access to firmware code and data.
>>
>> Detailed discussion of problem statement:
>> -----------------------------------------
>> As this switch is not propagated to other kernel subsystems; they will
>> wrongly assume that swapper_pgd is still in use and it can lead to
>> following issues:
>>
>> 1. If kernel code tries to access user space addresses while in efi_pgd,
>> it could lead to unauthorized accesses to firmware code/data.
>> (e.g: <__>/copy_from_user_nmi()).
>> [This could also be disastrous if the frame pointer happens to point at
>> MMIO in the EFI runtime mappings] - Mark Rutland.
>>
>> An example of a subsystem that could touch user space while in efi_pgd is
>> perf. Assume that we are in efi_pgd, a user could use perf to profile
>> some user data and depending on the address the user is trying to
>> profile, two things could happen.
>> 1. If the mappings are absent, perf fails to profile.
>> 2. If efi_pgd does have mappings for the requested address (these
>> mappings are erroneous), perf profiles firmware code/data. If the
>> address is MMIO'ed, perf could have potentially changed some device state.
>>
>> The culprit in both the cases is, EFI subsystem swapping out pgd and not
>> perf. Because, EFI subsystem has broken the *general assumption* that
>> all other subsystems rely on - "user space might be valid and nobody has
>> switched %cr3".
>>
>> Solutions:
>> ----------
>> There are two ways to fix this issue:
>> 1. Educate about pgd change to *all* the subsystems that could
>> potentially access user space while in efi_pgd.
>> On x86, AFAIK, it could happen only when some one touches user space
>> from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
>> showed perf and oprofile). On arm, it could happen from multiple
>> places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
>> please comment on this as I might be wrong); whereas x86 runs
>> efi_runtime_services() interrupts disabled.
>>
>> I think, this solution isn't holistic because
>> a. Any other subsystem might well do the same, if not now, in future.
>> b. Also, this solution looks simpler on x86 but not true if it's the
>> same for arm (ARM folks, please comment on this as I might be wrong).
>> c. This solution looks like a work around rather than addressing the issue.
>>
>> 2. Running efi_runtime_services() in kthread context.
>> This makes sense because efi_pgd doesn't have user space and kthread
>> by definition means that user space is not valid. Any kernel code that
>> tries to touch user space while in kthread is buggy in itself. If so,
>> it should be an easy fix in the other subsystem. This also take us one
>> step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
>>
>> What does this patch set do?
>> ----------------------------
>> Introduce efi_rts_wq (EFI runtime services work queue).
>> When a user process requests the kernel to execute any efi_runtime_service(),
>> kernel queues the work to efi_rts_wq, a kthread comes along, switches to
>> efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
>> patch set adds support to the EFI subsystem to handle all calls to
>> efi_runtime_services() using a work queue (which in turn uses kthread).
>>
>> How running efi_runtime_services() in kthread fixes above discussed issues?
>> ---------------------------------------------------------------------------
>> If we run efi_runtime_services() in kthread context and if perf
>> checks for it, we could get both the above scenarios correct by perf
>> aborting the profiling. Not only perf, but any subsystem that tries to
>> touch user space should first check for kthread context and if so,
>> should abort.
>>
>> Q. If we still need check for kthread context in other subsystems that
>> access user space, what does this patch set fix?
>> A. This patch set makes sure that EFI subsystem is not at fault.
>> Without this patch set the blame is upon EFI subsystem, because it's the
>> one that changed pgd and hasn't communicated this change to everyone and
>> hence broke the general assumption. Running efi_runtime_services() in
>> kthread means explicitly communicating that user space is invalid, now
>> it's the responsibility of other subsystem to make sure that it's
>> running in right context.
>>
>> Testing:
>> --------
>> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
>> (qemu only). Will appreciate the effort if someone could test the
>> patches on real ARM/ARM64 machines.
I would give the latest v5 a try on my ARM64 qualcomm board as well.
WIll get back with the test results soon.
Thanks,
Bhupesh
>> LUV: https://01.org/linux-uefi-validation
>>
>> Credits:
>> --------
>> Thanks to Ricardo, Dan, Miguel, Mark, Peter Z and Ard for reviews and
>> suggestions. Thanks to Boris and Andy for making me think through/help
>> on what I am addressing with this patch set.
>>
>> Please feel free to pour in your comments and concerns.
>>
>> Note:
>> -----
>> Patches are based on Linus's kernel v4.17-rc7
>>
>> [1] Backup: Detailing efi_pgd:
>> ------------------------------
>> efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
>> Code/Data) regions. Due to the nature of these mappings, they fall
>> in user space address ranges and they are not the same as swapper.
>>
>> [On arm64, the EFI mappings are in the VA range usually used for user
>> space. The two halves of the address space are managed by separate
>> tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
>> user space or EFI runtime mappings in TTBR0.] - Mark Rutland
>>
>> Changes from V4 to V5:
>> ----------------------
>> 1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
>> Non-blocking variants are supposed to not block and using workqueue
>> exactly does the opposite, hence refrain from using it.
>> 2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
>> blocking variants means that we have to call efi_delete_dummy_variable()
>> after efi_rts_wq has been created.
>> 3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
>> Any caller wishing to use set_variable() and query_variable_info() in
>> atomic context should use their non-blocking variants.
>>
>> Changes from V3 to V4:
>> ----------------------
>> 1. As suggested by Peter, use completions instead of flush_work() as the
>> former is cheaper
>> 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
>> wasn't able to find a better alternative to keep this change local to
>> arch/x86.
>>
>> Changes from V2 to V3:
>> ----------------------
>> 1. Rewrite the cover letter to clearly state the problem. What we are
>> fixing and what we are not fixing.
>> 2. Make efi_delete_dummy_variable() change local to x86.
>> 3. Avoid using BUG(), instead, print error message and exit gracefully.
>> 4. Move struct efi_runtime_work to runtime-wrappers.c file.
>> 5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
>> 6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
>>
>> Changes from V1 to V2:
>> ----------------------
>> 1. Remove unnecessary include of asm/efi.h file - Fixes build error on
>> ia64, reported by 0-day
>> 2. Use enum to identify efi_runtime_services()
>> 3. Use alloc_ordered_workqueue() to create efi_rts_wq as
>> create_workqueue() is scheduled for depreciation.
>> 4. Make efi_call_rts() static, as it has no callers outside
>> runtime-wrappers.c
>> 5. Use BUG(), when we are unable to queue work or unable to identify
>> requested efi_runtime_service() - Because these two situations should
>> *never* happen.
>>
>> Sai Praneeth (3):
>> x86/efi: Make efi_delete_dummy_variable() use
>> set_variable_nonblocking() instead of set_variable()
>> efi: Create efi_rts_wq and efi_queue_work() to invoke all
>> efi_runtime_services()
>> efi: Use efi_rts_wq to invoke EFI Runtime Services
>>
>
> This version looks good to me, and works as expected on SynQuacer (arm64).
>
> Tested-by: Ard Biesheuvel <[email protected]>
> Reviewed-by: Ard Biesheuvel <[email protected]>
>
> I will give others the opportunity to review this code before queuing it though.
>
> Thanks,
> Ard.
>
>
>> arch/x86/platform/efi/quirks.c | 11 +-
>> drivers/firmware/efi/efi.c | 14 ++
>> drivers/firmware/efi/runtime-wrappers.c | 218 +++++++++++++++++++++++++++++---
>> include/linux/efi.h | 3 +
>> 4 files changed, 224 insertions(+), 22 deletions(-)
>>
>> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
>> Suggested-by: Andy Lutomirski <[email protected]>
>> Cc: Lee Chun-Yi <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Tony Luck <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: Bhupesh Sharma <[email protected]>
>> Cc: Naresh Bhat <[email protected]>
>> Cc: Ricardo Neri <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Ravi Shankar <[email protected]>
>> Cc: Matt Fleming <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Miguel Ojeda <[email protected]>
>>
>> --
>> 2.7.4
>>
> >> Testing:
> >> --------
> >> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> >> (qemu only). Will appreciate the effort if someone could test the
> >> patches on real ARM/ARM64 machines.
>
> I would give the latest v5 a try on my ARM64 qualcomm board as well.
> WIll get back with the test results soon.
>
Sure! Testing is always welcomed :)
and if you get a chance please try on UV300 machine too.. (probably it might find issues
as happened with mm_struct patches).
Regards,
Sai
On 29 May 2018 at 07:51, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Problem statement:
> ------------------
> Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
> to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
> perf code via an NMI) will have incorrect user space mappings[1]. This
> could lead to otherwise unexpected access errors and, worse, unauthorized
> access to firmware code and data.
>
> Detailed discussion of problem statement:
> -----------------------------------------
> As this switch is not propagated to other kernel subsystems; they will
> wrongly assume that swapper_pgd is still in use and it can lead to
> following issues:
>
> 1. If kernel code tries to access user space addresses while in efi_pgd,
> it could lead to unauthorized accesses to firmware code/data.
> (e.g: <__>/copy_from_user_nmi()).
> [This could also be disastrous if the frame pointer happens to point at
> MMIO in the EFI runtime mappings] - Mark Rutland.
>
> An example of a subsystem that could touch user space while in efi_pgd is
> perf. Assume that we are in efi_pgd, a user could use perf to profile
> some user data and depending on the address the user is trying to
> profile, two things could happen.
> 1. If the mappings are absent, perf fails to profile.
> 2. If efi_pgd does have mappings for the requested address (these
> mappings are erroneous), perf profiles firmware code/data. If the
> address is MMIO'ed, perf could have potentially changed some device state.
>
> The culprit in both the cases is, EFI subsystem swapping out pgd and not
> perf. Because, EFI subsystem has broken the *general assumption* that
> all other subsystems rely on - "user space might be valid and nobody has
> switched %cr3".
>
> Solutions:
> ----------
> There are two ways to fix this issue:
> 1. Educate about pgd change to *all* the subsystems that could
> potentially access user space while in efi_pgd.
> On x86, AFAIK, it could happen only when some one touches user space
> from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
> showed perf and oprofile). On arm, it could happen from multiple
> places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
> please comment on this as I might be wrong); whereas x86 runs
> efi_runtime_services() interrupts disabled.
>
> I think, this solution isn't holistic because
> a. Any other subsystem might well do the same, if not now, in future.
> b. Also, this solution looks simpler on x86 but not true if it's the
> same for arm (ARM folks, please comment on this as I might be wrong).
> c. This solution looks like a work around rather than addressing the issue.
>
> 2. Running efi_runtime_services() in kthread context.
> This makes sense because efi_pgd doesn't have user space and kthread
> by definition means that user space is not valid. Any kernel code that
> tries to touch user space while in kthread is buggy in itself. If so,
> it should be an easy fix in the other subsystem. This also take us one
> step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
>
> What does this patch set do?
> ----------------------------
> Introduce efi_rts_wq (EFI runtime services work queue).
> When a user process requests the kernel to execute any efi_runtime_service(),
> kernel queues the work to efi_rts_wq, a kthread comes along, switches to
> efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
> patch set adds support to the EFI subsystem to handle all calls to
> efi_runtime_services() using a work queue (which in turn uses kthread).
>
> How running efi_runtime_services() in kthread fixes above discussed issues?
> ---------------------------------------------------------------------------
> If we run efi_runtime_services() in kthread context and if perf
> checks for it, we could get both the above scenarios correct by perf
> aborting the profiling. Not only perf, but any subsystem that tries to
> touch user space should first check for kthread context and if so,
> should abort.
>
> Q. If we still need check for kthread context in other subsystems that
> access user space, what does this patch set fix?
> A. This patch set makes sure that EFI subsystem is not at fault.
> Without this patch set the blame is upon EFI subsystem, because it's the
> one that changed pgd and hasn't communicated this change to everyone and
> hence broke the general assumption. Running efi_runtime_services() in
> kthread means explicitly communicating that user space is invalid, now
> it's the responsibility of other subsystem to make sure that it's
> running in right context.
>
> Testing:
> --------
> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> (qemu only). Will appreciate the effort if someone could test the
> patches on real ARM/ARM64 machines.
> LUV: https://01.org/linux-uefi-validation
I have tested V4 series on ThunderX2 Val1 and Saber platform by
integrating the patches with LUV project.
I will test V5 version soon and let you know the outcome.
>
> Credits:
> --------
> Thanks to Ricardo, Dan, Miguel, Mark, Peter Z and Ard for reviews and
> suggestions. Thanks to Boris and Andy for making me think through/help
> on what I am addressing with this patch set.
>
> Please feel free to pour in your comments and concerns.
>
> Note:
> -----
> Patches are based on Linus's kernel v4.17-rc7
>
> [1] Backup: Detailing efi_pgd:
> ------------------------------
> efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
> Code/Data) regions. Due to the nature of these mappings, they fall
> in user space address ranges and they are not the same as swapper.
>
> [On arm64, the EFI mappings are in the VA range usually used for user
> space. The two halves of the address space are managed by separate
> tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
> user space or EFI runtime mappings in TTBR0.] - Mark Rutland
>
> Changes from V4 to V5:
> ----------------------
> 1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
> Non-blocking variants are supposed to not block and using workqueue
> exactly does the opposite, hence refrain from using it.
> 2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
> blocking variants means that we have to call efi_delete_dummy_variable()
> after efi_rts_wq has been created.
> 3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
> Any caller wishing to use set_variable() and query_variable_info() in
> atomic context should use their non-blocking variants.
>
> Changes from V3 to V4:
> ----------------------
> 1. As suggested by Peter, use completions instead of flush_work() as the
> former is cheaper
> 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
> wasn't able to find a better alternative to keep this change local to
> arch/x86.
>
> Changes from V2 to V3:
> ----------------------
> 1. Rewrite the cover letter to clearly state the problem. What we are
> fixing and what we are not fixing.
> 2. Make efi_delete_dummy_variable() change local to x86.
> 3. Avoid using BUG(), instead, print error message and exit gracefully.
> 4. Move struct efi_runtime_work to runtime-wrappers.c file.
> 5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
> 6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
>
> Changes from V1 to V2:
> ----------------------
> 1. Remove unnecessary include of asm/efi.h file - Fixes build error on
> ia64, reported by 0-day
> 2. Use enum to identify efi_runtime_services()
> 3. Use alloc_ordered_workqueue() to create efi_rts_wq as
> create_workqueue() is scheduled for depreciation.
> 4. Make efi_call_rts() static, as it has no callers outside
> runtime-wrappers.c
> 5. Use BUG(), when we are unable to queue work or unable to identify
> requested efi_runtime_service() - Because these two situations should
> *never* happen.
>
> Sai Praneeth (3):
> x86/efi: Make efi_delete_dummy_variable() use
> set_variable_nonblocking() instead of set_variable()
> efi: Create efi_rts_wq and efi_queue_work() to invoke all
> efi_runtime_services()
> efi: Use efi_rts_wq to invoke EFI Runtime Services
>
> arch/x86/platform/efi/quirks.c | 11 +-
> drivers/firmware/efi/efi.c | 14 ++
> drivers/firmware/efi/runtime-wrappers.c | 218 +++++++++++++++++++++++++++++---
> include/linux/efi.h | 3 +
> 4 files changed, 224 insertions(+), 22 deletions(-)
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Naresh Bhat <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Miguel Ojeda <[email protected]>
>
> --
> 2.7.4
>
> > Testing:
> > --------
> > Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> > (qemu only). Will appreciate the effort if someone could test the
> > patches on real ARM/ARM64 machines.
> > LUV: https://01.org/linux-uefi-validation
>
> I have tested V4 series on ThunderX2 Val1 and Saber platform by integrating the
> patches with LUV project.
> I will test V5 version soon and let you know the outcome.
Thanks a lot! for taking time to test the patch set, Naresh. Will be waiting to see the result.
Regards,
Sai
On 29 May 2018 at 04:21, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Presently, when a user process requests the kernel to execute any
> efi_runtime_service(), kernel switches the page directory (%cr3) from
> swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
> this switch and they might think, user space is still valid (i.e. the
> user space mappings are still pointing to the process that requested to
> run efi_runtime_service()) but in reality it is not so.
>
> A solution for this issue is to use kthread to run
> efi_runtime_service(). When a user process requests the kernel to
> execute any efi_runtime_service(), kernel queues the work to efi_rts_wq,
> a kthread comes along, switches to efi_pgd and executes
> efi_runtime_service() in kthread context. Anything that tries to touch
> user space addresses while in kthread is terminally broken.
>
> Implementation summary:
> -----------------------
> 1. When user/kernel thread requests to execute efi_runtime_service(),
> enqueue work to efi_rts_wq.
> 2. Caller thread waits for completion until the work is finished because
> it's dependent on the return status of efi_runtime_service().
>
> Semantics to pack arguments in efi_runtime_work (has void pointers):
> 1. If argument is a pointer (of any type), pass it as is.
> 2. If argument is a value (of any type), address of the value is passed.
>
> Introduce a handler function (called efi_call_rts()) that
> 1. Understands efi_runtime_work and
> 2. Invokes the appropriate efi_runtime_service() with the appropriate
> arguments
>
> Semantics followed by efi_call_rts() to understand efi_runtime_work:
> 1. If argument was a pointer, recast it from void pointer to original
> pointer type.
> 2. If argument was a value, recast it from void pointer to original
> pointer type and dereference it.
>
> The non-blocking variants of set_variable() and query_variable_info()
> should be used while in atomic context. Use of blocking variants like
> set_variable() and query_variable_info() while in atomic will issue a
> warning ("scheduling wile in atomic") and prints stack trace. Presently,
> pstore uses non-blocking variants and hence works fine.
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Naresh Bhat <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Miguel Ojeda <[email protected]>
> ---
> drivers/firmware/efi/runtime-wrappers.c | 135 ++++++++++++++++++++++++++++----
> 1 file changed, 119 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
> index cf3bae42a752..127d4de00403 100644
> --- a/drivers/firmware/efi/runtime-wrappers.c
> +++ b/drivers/firmware/efi/runtime-wrappers.c
> @@ -173,13 +173,104 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
> */
> static DEFINE_SEMAPHORE(efi_runtime_lock);
>
> +/*
> + * Calls the appropriate efi_runtime_service() with the appropriate
> + * arguments.
> + *
> + * Semantics followed by efi_call_rts() to understand efi_runtime_work:
> + * 1. If argument was a pointer, recast it from void pointer to original
> + * pointer type.
> + * 2. If argument was a value, recast it from void pointer to original
> + * pointer type and dereference it.
> + */
> +static void efi_call_rts(struct work_struct *work)
> +{
> + struct efi_runtime_work *efi_rts_work;
> + void *arg1, *arg2, *arg3, *arg4, *arg5;
> + efi_status_t status = EFI_NOT_FOUND;
> +
> + efi_rts_work = container_of(work, struct efi_runtime_work, work);
> + arg1 = efi_rts_work->arg1;
> + arg2 = efi_rts_work->arg2;
> + arg3 = efi_rts_work->arg3;
> + arg4 = efi_rts_work->arg4;
> + arg5 = efi_rts_work->arg5;
> +
> + switch (efi_rts_work->efi_rts_id) {
> + case GET_TIME:
> + status = efi_call_virt(get_time, (efi_time_t *)arg1,
> + (efi_time_cap_t *)arg2);
> + break;
> + case SET_TIME:
> + status = efi_call_virt(set_time, (efi_time_t *)arg1);
> + break;
> + case GET_WAKEUP_TIME:
> + status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
> + (efi_bool_t *)arg2, (efi_time_t *)arg3);
> + break;
> + case SET_WAKEUP_TIME:
> + status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
> + (efi_time_t *)arg2);
> + break;
> + case GET_VARIABLE:
> + status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
> + (efi_guid_t *)arg2, (u32 *)arg3,
> + (unsigned long *)arg4, (void *)arg5);
> + break;
> + case GET_NEXT_VARIABLE:
> + status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
> + (efi_char16_t *)arg2,
> + (efi_guid_t *)arg3);
> + break;
> + case SET_VARIABLE:
> + status = efi_call_virt(set_variable, (efi_char16_t *)arg1,
> + (efi_guid_t *)arg2, *(u32 *)arg3,
> + *(unsigned long *)arg4, (void *)arg5);
> + break;
> + case QUERY_VARIABLE_INFO:
> + status = efi_call_virt(query_variable_info, *(u32 *)arg1,
> + (u64 *)arg2, (u64 *)arg3, (u64 *)arg4);
> + break;
> + case GET_NEXT_HIGH_MONO_COUNT:
> + status = efi_call_virt(get_next_high_mono_count, (u32 *)arg1);
> + break;
> + case RESET_SYSTEM:
> + __efi_call_virt(reset_system, *(int *)arg1,
> + *(efi_status_t *)arg2,
> + *(unsigned long *)arg3,
> + (efi_char16_t *)arg4);
> + break;
I noticed that -unsurprisingly- reboot no longer works with these changes.
I will fix up the patch, and revert the efi_reset_system() change,
both here and below.
> + case UPDATE_CAPSULE:
> + status = efi_call_virt(update_capsule,
> + (efi_capsule_header_t **)arg1,
> + *(unsigned long *)arg2,
> + *(unsigned long *)arg3);
> + break;
> + case QUERY_CAPSULE_CAPS:
> + status = efi_call_virt(query_capsule_caps,
> + (efi_capsule_header_t **)arg1,
> + *(unsigned long *)arg2, (u64 *)arg3,
> + (int *)arg4);
> + break;
> + default:
> + /*
> + * Ideally, we should never reach here because a caller of this
> + * function should have put the right efi_runtime_service()
> + * function identifier into efi_rts_work->efi_rts_id
> + */
> + pr_err("Requested executing invalid EFI Runtime Service.\n");
> + }
> + efi_rts_work->status = status;
> + complete(&efi_rts_work->efi_rts_comp);
> +}
> +
> static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
> {
> efi_status_t status;
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(get_time, tm, tc);
> + status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -190,7 +281,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(set_time, tm);
> + status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -203,7 +294,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
> + status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
> + NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -214,7 +306,8 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(set_wakeup_time, enabled, tm);
> + status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
> + NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -229,8 +322,8 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(get_variable, name, vendor, attr, data_size,
> - data);
> + status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size,
> + data);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -243,7 +336,8 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(get_next_variable, name_size, name, vendor);
> + status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor,
> + NULL, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -258,8 +352,10 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(set_variable, name, vendor, attr, data_size,
> - data);
> +
> + status = efi_queue_work(SET_VARIABLE, name, vendor, &attr, &data_size,
> + data);
> +
> up(&efi_runtime_lock);
> return status;
> }
> @@ -276,6 +372,7 @@ virt_efi_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor,
>
> status = efi_call_virt(set_variable, name, vendor, attr, data_size,
> data);
> +
> up(&efi_runtime_lock);
> return status;
> }
> @@ -293,8 +390,10 @@ static efi_status_t virt_efi_query_variable_info(u32 attr,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(query_variable_info, attr, storage_space,
> - remaining_space, max_variable_size);
> +
> + status = efi_queue_work(QUERY_VARIABLE_INFO, &attr, storage_space,
> + remaining_space, max_variable_size, NULL);
> +
> up(&efi_runtime_lock);
> return status;
> }
> @@ -315,6 +414,7 @@ virt_efi_query_variable_info_nonblocking(u32 attr,
>
> status = efi_call_virt(query_variable_info, attr, storage_space,
> remaining_space, max_variable_size);
> +
> up(&efi_runtime_lock);
> return status;
> }
> @@ -325,7 +425,8 @@ static efi_status_t virt_efi_get_next_high_mono_count(u32 *count)
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(get_next_high_mono_count, count);
> + status = efi_queue_work(GET_NEXT_HIGH_MONO_COUNT, count, NULL, NULL,
> + NULL, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -340,7 +441,8 @@ static void virt_efi_reset_system(int reset_type,
> "could not get exclusive access to the firmware\n");
> return;
> }
> - __efi_call_virt(reset_system, reset_type, status, data_size, data);
> + efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
> + NULL);
> up(&efi_runtime_lock);
> }
>
> @@ -355,7 +457,8 @@ static efi_status_t virt_efi_update_capsule(efi_capsule_header_t **capsules,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(update_capsule, capsules, count, sg_list);
> + status = efi_queue_work(UPDATE_CAPSULE, capsules, &count, &sg_list,
> + NULL, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> @@ -372,8 +475,8 @@ static efi_status_t virt_efi_query_capsule_caps(efi_capsule_header_t **capsules,
>
> if (down_interruptible(&efi_runtime_lock))
> return EFI_ABORTED;
> - status = efi_call_virt(query_capsule_caps, capsules, count, max_size,
> - reset_type);
> + status = efi_queue_work(QUERY_CAPSULE_CAPS, capsules, &count,
> + max_size, reset_type, NULL);
> up(&efi_runtime_lock);
> return status;
> }
> --
> 2.7.4
>
> > + case RESET_SYSTEM:
> > + __efi_call_virt(reset_system, *(int *)arg1,
> > + *(efi_status_t *)arg2,
> > + *(unsigned long *)arg3,
> > + (efi_char16_t *)arg4);
> > + break;
>
> I noticed that -unsurprisingly- reboot no longer works with these changes.
>
> I will fix up the patch, and revert the efi_reset_system() change, both here and
> below.
Could you please let me know what the bug is here? I am unable to see it right away :(
I have tested reboot on qemu x86_64 by passing "reboot=efi" as command line arg and
saw that reboot is working fine.
> > @@ -340,7 +441,8 @@ static void virt_efi_reset_system(int reset_type,
> > "could not get exclusive access to the firmware\n");
> > return;
> > }
> > - __efi_call_virt(reset_system, reset_type, status, data_size, data);
> > + efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
> > + NULL);
> > up(&efi_runtime_lock);
> > }
Regards,
Sai
On 5 June 2018 at 21:29, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > + case RESET_SYSTEM:
>> > + __efi_call_virt(reset_system, *(int *)arg1,
>> > + *(efi_status_t *)arg2,
>> > + *(unsigned long *)arg3,
>> > + (efi_char16_t *)arg4);
>> > + break;
>>
>> I noticed that -unsurprisingly- reboot no longer works with these changes.
>>
>> I will fix up the patch, and revert the efi_reset_system() change, both here and
>> below.
>
> Could you please let me know what the bug is here? I am unable to see it right away :(
> I have tested reboot on qemu x86_64 by passing "reboot=efi" as command line arg and
> saw that reboot is working fine.
>
My arm64 hangs at reboot or power off, unless i revert the ResetSystem() part.
But given that it is both risky (relying on a kthread running a
workqueue in the shutdown path) and unnecessary (ResetSystem() is not
supposed to return, and is only called by the kernel when the whole
system has already been torn down), I think the main reason is simply
that there is no reason to add it.
>> > @@ -340,7 +441,8 @@ static void virt_efi_reset_system(int reset_type,
>> > "could not get exclusive access to the firmware\n");
>> > return;
>> > }
>> > - __efi_call_virt(reset_system, reset_type, status, data_size, data);
>> > + efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
>> > + NULL);
>> > up(&efi_runtime_lock);
>> > }
>
> Regards,
> Sai
> >> I noticed that -unsurprisingly- reboot no longer works with these changes.
> >>
> >> I will fix up the patch, and revert the efi_reset_system() change,
> >> both here and below.
> >
> > Could you please let me know what the bug is here? I am unable to see
> > it right away :( I have tested reboot on qemu x86_64 by passing
> > "reboot=efi" as command line arg and saw that reboot is working fine.
> >
>
> My arm64 hangs at reboot or power off, unless i revert the ResetSystem() part.
>
That's interesting. Not sure how the behavior could be different between x86 and arm64.
Maybe worth debugging further?
As you said, reverting ResetSystem() part makes things work, we cannot suspect a
buggy efi_reset_system() (I came across buggy efi_reset_system() implementations
on some x86 systems).
> But given that it is both risky (relying on a kthread running a workqueue in the
> shutdown path) and unnecessary (ResetSystem() is not supposed to return, and is
> only called by the kernel when the whole system has already been torn down), I
> think the main reason is simply that there is no reason to add it.
Makes sense.
I think, before calling firmware ResetSystem() kernel has already shut down all other parts
and hence as the control will not transfer back to kernel (other than ResetSystem() failure),
we could safely assume that no kernel code would access user space while in
efi_reset_system() and hence we don't need to take the work queue path.
Regards,
Sai
On 4 June 2018 at 09:57, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > Testing:
>> > --------
>> > Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
>> > (qemu only). Will appreciate the effort if someone could test the
>> > patches on real ARM/ARM64 machines.
>> > LUV: https://01.org/linux-uefi-validation
>>
>> I have tested V4 series on ThunderX2 Val1 and Saber platform by integrating the
>> patches with LUV project.
>> I will test V5 version soon and let you know the outcome.
>
> Thanks a lot! for taking time to test the patch set, Naresh. Will be waiting to see the result.
Sai, I did upgrade LUV kernel to v4.17, apply and test your patches by
running LUV on QEMU and ThunderX2 hardware. All efi test results
looks ok. But as Ard pointed, I also noticed hang issue after
executing reboot command on console on real hardware.
>
> Regards,
> Sai
> >> I have tested V4 series on ThunderX2 Val1 and Saber platform by
> >> integrating the patches with LUV project.
> >> I will test V5 version soon and let you know the outcome.
> >
> > Thanks a lot! for taking time to test the patch set, Naresh. Will be waiting to
> see the result.
>
> Sai, I did upgrade LUV kernel to v4.17, apply and test your patches by running
> LUV on QEMU and ThunderX2 hardware. All efi test results looks ok.
Thanks a lot! for the update :)
> But as Ard pointed, I also noticed hang issue after executing reboot command on console
> on real hardware.
Ard has fixed the issue before applying and the latest patches can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/log/?h=next
Regards,
Sai