2018-05-25 22:06:36

by Prakhya, Sai Praneeth

Subject: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

From: Sai Praneeth <[email protected]>

Problem statement:
------------------
Presently, efi_runtime_services() silently switches %cr3 from swapper_pgd
to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
perf code via an NMI) will have incorrect user space mappings[1]. This
could lead to otherwise unexpected access errors and, worse, unauthorized
access to firmware code and data.

Detailed discussion of problem statement:
-----------------------------------------
As this switch is not propagated to other kernel subsystems, they will
wrongly assume that swapper_pgd is still in use, which can lead to the
following issues:

1. If kernel code tries to access user space addresses while in efi_pgd,
it could lead to unauthorized accesses to firmware code/data.
(e.g: <__>/copy_from_user_nmi()).
[This could also be disastrous if the frame pointer happens to point at
MMIO in the EFI runtime mappings] - Mark Rutland.

An example of a subsystem that could touch user space while in efi_pgd is
perf. Assume that we are in efi_pgd: a user could use perf to profile
some user data and, depending on the address the user is trying to
profile, two things could happen.
1. If the mappings are absent, perf fails to profile.
2. If efi_pgd does have mappings for the requested address (these
mappings are erroneous), perf profiles firmware code/data. If the
address is MMIO'ed, perf could potentially change some device state.

The culprit in both cases is the EFI subsystem swapping out the pgd, not
perf, because the EFI subsystem has broken the *general assumption* that
all other subsystems rely on - "user space might be valid and nobody has
switched %cr3".

Solutions:
----------
There are two ways to fix this issue:
1. Educate *all* the subsystems that could potentially access user space
while in efi_pgd about the pgd change.
On x86, AFAIK, it could happen only when someone touches user space
from the back of an NMI (a quick audit on <__>/copy_from_user_nmi
showed perf and oprofile). On arm, it could happen from multiple
places as arm runs efi_runtime_services() with interrupts enabled (ARM
folks, please comment on this as I might be wrong), whereas x86 runs
efi_runtime_services() with interrupts disabled.

I think this solution isn't holistic because
a. Any other subsystem might well do the same, if not now, in the future.
b. Also, this solution looks simple on x86, but I am not sure the same
is true for arm (ARM folks, please comment on this as I might be wrong).
c. This solution looks like a workaround rather than addressing the issue.

2. Running efi_runtime_services() in kthread context.
This makes sense because efi_pgd doesn't have user space and a kthread,
by definition, means that user space is not valid. Any kernel code that
tries to touch user space while in a kthread is buggy in itself. If so,
it should be an easy fix in the other subsystem. This also takes us one
step closer to Andy's long-awaited proposal - Running EFI at CPL 3.

What does this patch set do?
----------------------------
Introduce efi_rts_wq (EFI runtime services work queue).
When a user process requests the kernel to execute any efi_runtime_service(),
the kernel queues the work to efi_rts_wq; a kthread comes along, switches to
efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
patch set adds support to the EFI subsystem to handle all calls to
efi_runtime_services() using a work queue (which in turn uses a kthread).
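
For readers unfamiliar with the queue-and-wait pattern described above, here
is a minimal user-space sketch. It is not the kernel implementation: it uses
POSIX threads in place of the workqueue, the completion type is a hand-rolled
stand-in, and every demo_* name is hypothetical. The doubling of the input is
only a placeholder for the firmware call.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* One request: an input, a status filled in by the worker, a completion. */
struct demo_work {
	int input;
	int status;
	struct completion comp;
};

/* Runs in the worker thread, standing in for the efi_rts_wq kthread. */
static void *demo_worker(void *data)
{
	struct demo_work *w = data;

	w->status = w->input * 2;	/* placeholder for the firmware call */
	complete(&w->comp);
	return NULL;
}

/* Hand the request to a worker and block until it reports a status. */
static int demo_queue_and_wait(int input)
{
	struct demo_work w = { .input = input, .status = -1 };
	pthread_t worker;

	init_completion(&w.comp);
	if (pthread_create(&worker, NULL, demo_worker, &w))
		return -1;	/* could not hand off the work */
	wait_for_completion(&w.comp);
	pthread_join(worker, NULL);
	return w.status;
}
```

The request lives on the caller's stack, which is safe only because the
caller does not return before the worker has signalled completion. On older
toolchains this sketch may need to be built with -pthread.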

How does running efi_runtime_services() in a kthread fix the issues above?
--------------------------------------------------------------------------
If we run efi_runtime_services() in kthread context and if perf
checks for it, we could get both the above scenarios correct by perf
aborting the profiling. Not only perf, but any subsystem that tries to
touch user space should first check for kthread context and if so,
should abort.
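
The check described above could look roughly like the following user-space
sketch. Only the PF_KTHREAD value matches the kernel's definition; the task
structure and the demo_* helpers are hypothetical and exist purely to
illustrate "check for kthread context, then abort".

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's PF_KTHREAD; everything else here is made up. */
#define PF_KTHREAD 0x00200000

/* Hypothetical, minimal stand-in for struct task_struct. */
struct demo_task {
	unsigned int flags;
};

/*
 * Hypothetical helper: a sampler that wants to read user memory first
 * asks whether the interrupted task has a user address space at all.
 */
static bool demo_user_space_valid(const struct demo_task *task)
{
	return !(task->flags & PF_KTHREAD);
}

/* Returns the number of bytes "sampled", or 0 if the sample was skipped. */
static size_t demo_sample_user_stack(const struct demo_task *task, size_t len)
{
	if (!demo_user_space_valid(task))
		return 0;	/* kthread: user mappings are not meaningful */
	return len;		/* placeholder for copy_from_user_nmi() */
}
```

With efi_runtime_services() running in a kthread, a sampler written this way
would skip the sample instead of reading through efi_pgd.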

Q. If we still need a check for kthread context in other subsystems that
access user space, what does this patch set fix?
A. This patch set makes sure that the EFI subsystem is not at fault.
Without this patch set the blame is upon the EFI subsystem, because it's the
one that changed pgd and hasn't communicated this change to everyone and
hence broke the general assumption. Running efi_runtime_services() in a
kthread means explicitly communicating that user space is invalid; it is
then the responsibility of the other subsystem to make sure that it's
running in the right context.

Testing:
--------
Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
(qemu only). I would appreciate it if someone could test the
patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation

Credits:
--------
Thanks to Ricardo, Dan, Miguel, Mark and Peter for reviews and suggestions.
Thanks to Boris and Andy for making me think through/help on what I am
addressing with this patch set.

Please feel free to pour in your comments and concerns.

Note:
-----
Patches are based on Linus's kernel v4.17-rc6

[1] Backup: Detailing efi_pgd:
------------------------------
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V3 to V4:
----------------------
1. As suggested by Peter, use completions instead of flush_work() as the
former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry, Ard, I
wasn't able to find a better alternative that keeps this change local to
arch/x86.

Changes from V2 to V3:
----------------------
1. Rewrite the cover letter to clearly state the problem. What we are
fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
efi: Use efi_rts_wq to invoke EFI Runtime Services

arch/x86/include/asm/efi.h | 1 -
arch/x86/platform/efi/efi.c | 6 -
drivers/firmware/efi/efi.c | 20 +++
drivers/firmware/efi/runtime-wrappers.c | 256 +++++++++++++++++++++++++++++---
include/linux/efi.h | 6 +
5 files changed, 262 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>

--
2.7.4



2018-05-25 22:05:38

by Prakhya, Sai Praneeth

Subject: [PATCH V4 2/3] efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()

From: Sai Praneeth <[email protected]>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto a work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. Populates efi_runtime_work
b. Queues work onto efi_rts_wq and
c. Waits until worker thread completes

The caller thread has to wait until the worker thread completes, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance,
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
drivers/firmware/efi/efi.c | 14 ++++++
drivers/firmware/efi/runtime-wrappers.c | 85 +++++++++++++++++++++++++++++++++
include/linux/efi.h | 3 ++
3 files changed, 102 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 1176af664013..2632294eb33f 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
};

+struct workqueue_struct *efi_rts_wq;
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -338,6 +340,18 @@ static int __init efisubsys_init(void)
return 0;

/*
+ * Since we process only one efi_runtime_service() at a time, an
+ * ordered workqueue (which creates only one execution context)
+ * should suffice all our needs.
+ */
+ efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+ if (!efi_rts_wq) {
+ pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ return 0;
+ }
+
+ /*
* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
* it should be invoked only after efi_rts_wq is ready.
*/
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..534bd348feca 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
/*
* runtime-wrappers.c - Runtime Services function call wrappers
*
+ * Implementation summary:
+ * -----------------------
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
* Copyright (C) 2014 Linaro Ltd. <[email protected]>
*
* Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
#include <linux/mutex.h>
#include <linux/semaphore.h>
#include <linux/stringify.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+
#include <asm/efi.h>

/*
@@ -33,6 +45,79 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)

+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ SET_VARIABLE_NONBLOCKING,
+ QUERY_VARIABLE_INFO,
+ QUERY_VARIABLE_INFO_NONBLOCKING,
+ GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts: efi_runtime_service() function identifier
+ * @rts_arg<1-5>: efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the caller
+ * thread waits for completion.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
+({ \
+ struct efi_runtime_work efi_rts_work; \
+ efi_rts_work.status = EFI_ABORTED; \
+ \
+ init_completion(&efi_rts_work.efi_rts_comp); \
+ INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
+ efi_rts_work.arg1 = _arg1; \
+ efi_rts_work.arg2 = _arg2; \
+ efi_rts_work.arg3 = _arg3; \
+ efi_rts_work.arg4 = _arg4; \
+ efi_rts_work.arg5 = _arg5; \
+ efi_rts_work.efi_rts_id = _rts; \
+ \
+ /* \
+ * queue_work() returns 0 if work was already on queue, \
+ * _ideally_ this should never happen. \
+ */ \
+ if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
+ wait_for_completion(&efi_rts_work.efi_rts_comp); \
+ else \
+ pr_err("Failed to queue work to efi_rts_wq.\n"); \
+ \
+ efi_rts_work.status; \
+})
+
void efi_call_virt_check_flags(unsigned long flags, const char *call)
{
unsigned long cur_flags, mismatch;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 1b79939d0b1e..8fb1af15be67 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1654,4 +1654,7 @@ struct linux_efi_tpm_eventlog {

extern int efi_tpm_eventlog_init(void);

+/* Workqueue to queue EFI Runtime Services */
+extern struct workqueue_struct *efi_rts_wq;
+
#endif /* _LINUX_EFI_H */
--
2.7.4


2018-05-25 22:06:14

by Prakhya, Sai Praneeth

Subject: [PATCH V4 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

From: Sai Praneeth <[email protected]>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think that user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()), but in reality it is not so.

A solution for this issue is to use kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), kernel queues the work to efi_rts_wq, a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in kthread is terminally broken.

Implementation summary:
-----------------------
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
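
The packing and unpacking semantics above can be illustrated with a small
user-space sketch. It is only modelled on efi_runtime_work and
efi_call_rts(): the two "services", the two-argument work struct and all
demo_* names are hypothetical, and the handler is called directly instead of
through a work queue.

```c
#include <stddef.h>

/* Hypothetical identifiers, modelled on enum efi_rts_ids. */
enum demo_rts_ids { DEMO_SET_WAKEUP, DEMO_GET_COUNT };

/* Hypothetical work struct, modelled on efi_runtime_work. */
struct demo_rts_work {
	void *arg1;
	void *arg2;
	int status;
	enum demo_rts_ids id;
};

/* Stand-ins for firmware services: one takes a value, one fills a buffer. */
static int demo_set_wakeup(unsigned char enabled, int *stored)
{
	*stored = enabled;
	return 0;
}

static int demo_get_count(unsigned int *count)
{
	*count = 7;
	return 0;
}

/* Handler: pointers are cast back; values are cast back and dereferenced. */
static void demo_call_rts(struct demo_rts_work *w)
{
	switch (w->id) {
	case DEMO_SET_WAKEUP:
		w->status = demo_set_wakeup(*(unsigned char *)w->arg1,
					    (int *)w->arg2);
		break;
	case DEMO_GET_COUNT:
		w->status = demo_get_count((unsigned int *)w->arg1);
		break;
	default:
		w->status = -1;	/* unknown identifier */
	}
}

/* Caller side: a value is passed by address, a pointer is passed as is. */
static int demo_wrap_set_wakeup(unsigned char enabled, int *stored)
{
	struct demo_rts_work w = {
		.arg1 = &enabled, .arg2 = stored,
		.status = -1, .id = DEMO_SET_WAKEUP,
	};

	demo_call_rts(&w);
	return w.status;
}

static int demo_wrap_get_count(unsigned int *count)
{
	struct demo_rts_work w = {
		.arg1 = count, .arg2 = NULL,
		.status = -1, .id = DEMO_GET_COUNT,
	};

	demo_call_rts(&w);
	return w.status;
}
```

Passing the address of a by-value argument relies on the caller staying in
its stack frame until the handler has finished, which is the case when the
caller waits for completion. Passing the address of something that is
already a pointer would break the handler's cast, so each wrapper has to
match the handler's expectations argument by argument.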

pstore writes could potentially be invoked in atomic context, and pstore uses
set_variable<>() and query_variable_info<>() to store logs. If we invoke
efi_runtime_services() through efi_rts_wq while in atomic context, the kernel
issues a warning ("scheduling while atomic") and prints a stack trace.
One way to overcome this is to not make the caller process wait for the
worker thread to finish, but this approach breaks pstore, i.e. the log
messages aren't written to EFI variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq; in other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 171 ++++++++++++++++++++++++++++----
1 file changed, 151 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 534bd348feca..26bb6645ff59 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -175,13 +175,108 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
*/
static DEFINE_SEMAPHORE(efi_runtime_lock);

+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+ struct efi_runtime_work *efi_rts_work;
+ void *arg1, *arg2, *arg3, *arg4, *arg5;
+ efi_status_t status = EFI_NOT_FOUND;
+
+ efi_rts_work = container_of(work, struct efi_runtime_work, work);
+ arg1 = efi_rts_work->arg1;
+ arg2 = efi_rts_work->arg2;
+ arg3 = efi_rts_work->arg3;
+ arg4 = efi_rts_work->arg4;
+ arg5 = efi_rts_work->arg5;
+
+ switch (efi_rts_work->efi_rts_id) {
+ case GET_TIME:
+ status = efi_call_virt(get_time, (efi_time_t *)arg1,
+ (efi_time_cap_t *)arg2);
+ break;
+ case SET_TIME:
+ status = efi_call_virt(set_time, (efi_time_t *)arg1);
+ break;
+ case GET_WAKEUP_TIME:
+ status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+ (efi_bool_t *)arg2, (efi_time_t *)arg3);
+ break;
+ case SET_WAKEUP_TIME:
+ status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+ (efi_time_t *)arg2);
+ break;
+ case GET_VARIABLE:
+ status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, (u32 *)arg3,
+ (unsigned long *)arg4, (void *)arg5);
+ break;
+ case GET_NEXT_VARIABLE:
+ status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+ (efi_char16_t *)arg2,
+ (efi_guid_t *)arg3);
+ break;
+ case SET_VARIABLE:
+ /* fall through */
+ case SET_VARIABLE_NONBLOCKING:
+ status = efi_call_virt(set_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, *(u32 *)arg3,
+ *(unsigned long *)arg4, (void *)arg5);
+ break;
+ case QUERY_VARIABLE_INFO:
+ /* fall through */
+ case QUERY_VARIABLE_INFO_NONBLOCKING:
+ status = efi_call_virt(query_variable_info, *(u32 *)arg1,
+ (u64 *)arg2, (u64 *)arg3, (u64 *)arg4);
+ break;
+ case GET_NEXT_HIGH_MONO_COUNT:
+ status = efi_call_virt(get_next_high_mono_count, (u32 *)arg1);
+ break;
+ case RESET_SYSTEM:
+ __efi_call_virt(reset_system, *(int *)arg1,
+ *(efi_status_t *)arg2,
+ *(unsigned long *)arg3,
+ (efi_char16_t *)arg4);
+ break;
+ case UPDATE_CAPSULE:
+ status = efi_call_virt(update_capsule,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2,
+ *(unsigned long *)arg3);
+ break;
+ case QUERY_CAPSULE_CAPS:
+ status = efi_call_virt(query_capsule_caps,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2, (u64 *)arg3,
+ (int *)arg4);
+ break;
+ default:
+ /*
+ * Ideally, we should never reach here because a caller of this
+ * function should have put the right efi_runtime_service()
+ * function identifier into efi_rts_work->efi_rts_id
+ */
+ pr_err("Requested executing invalid EFI Runtime Service.\n");
+ }
+ efi_rts_work->status = status;
+ complete(&efi_rts_work->efi_rts_comp);
+}
+
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
efi_status_t status;

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_time, tm, tc);
+ status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -192,7 +287,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_time, tm);
+ status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -205,7 +300,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+ status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -216,7 +312,8 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_wakeup_time, enabled, tm);
+ status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -231,8 +328,8 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_variable, name, vendor, attr, data_size,
- data);
+ status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size,
+ data);
up(&efi_runtime_lock);
return status;
}
@@ -245,7 +342,8 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_variable, name_size, name, vendor);
+ status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -260,8 +358,15 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(SET_VARIABLE, name, vendor, &attr,
+ &data_size, data);
+ else
+ status = efi_call_virt(set_variable, name, vendor, attr,
+ data_size, data);
+
up(&efi_runtime_lock);
return status;
}
@@ -276,8 +381,14 @@ virt_efi_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor,
if (down_trylock(&efi_runtime_lock))
return EFI_NOT_READY;

- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(SET_VARIABLE_NONBLOCKING, name, vendor,
+ &attr, &data_size, data);
+ else
+ status = efi_call_virt(set_variable, name, vendor, attr,
+ data_size, data);
+
up(&efi_runtime_lock);
return status;
}
@@ -295,8 +406,17 @@ static efi_status_t virt_efi_query_variable_info(u32 attr,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(QUERY_VARIABLE_INFO, &attr,
+ storage_space, remaining_space,
+ max_variable_size, NULL);
+ else
+ status = efi_call_virt(query_variable_info, attr,
+ storage_space, remaining_space,
+ max_variable_size);
+
up(&efi_runtime_lock);
return status;
}
@@ -315,8 +435,16 @@ virt_efi_query_variable_info_nonblocking(u32 attr,
if (down_trylock(&efi_runtime_lock))
return EFI_NOT_READY;

- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(QUERY_VARIABLE_INFO_NONBLOCKING, &attr,
+ storage_space, remaining_space,
+ max_variable_size, NULL);
+ else
+ status = efi_call_virt(query_variable_info, attr,
+ storage_space, remaining_space,
+ max_variable_size);
+
up(&efi_runtime_lock);
return status;
}
@@ -327,7 +455,8 @@ static efi_status_t virt_efi_get_next_high_mono_count(u32 *count)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_high_mono_count, count);
+ status = efi_queue_work(GET_NEXT_HIGH_MONO_COUNT, count, NULL, NULL,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -342,7 +471,8 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
- __efi_call_virt(reset_system, reset_type, status, data_size, data);
+ efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
+ NULL);
up(&efi_runtime_lock);
}

@@ -357,7 +487,8 @@ static efi_status_t virt_efi_update_capsule(efi_capsule_header_t **capsules,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(update_capsule, capsules, count, sg_list);
+ status = efi_queue_work(UPDATE_CAPSULE, capsules, &count, &sg_list,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -374,8 +505,8 @@ static efi_status_t virt_efi_query_capsule_caps(efi_capsule_header_t **capsules,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_capsule_caps, capsules, count, max_size,
- reset_type);
+ status = efi_queue_work(QUERY_CAPSULE_CAPS, capsules, &count,
+ max_size, reset_type, NULL);
up(&efi_runtime_lock);
return status;
}
--
2.7.4


2018-05-25 22:07:48

by Prakhya, Sai Praneeth

Subject: [PATCH V4 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

From: Sai Praneeth <[email protected]>

Invoking efi_runtime_services() through efi_rts_wq (efi runtime
services workqueue) means all accesses to efi_runtime_services() should
be done after efi_rts_wq has been created. efi_delete_dummy_variable()
calls set_variable(), hence efi_delete_dummy_variable() should be called
after efi_rts_wq has been created.

Presently, efi_delete_dummy_variable() is called from
efi_enter_virtual_mode(), which is early in the boot phase (efi_rts_wq
isn't created yet), so call efi_delete_dummy_variable() later in the
boot phase. Another, and the most important, reason for calling
efi_delete_dummy_variable() late in the boot process is that, if called
before rest_init(), the kernel prints a stack trace with a warning "bad:
scheduling from the idle thread!". Hence, call it from efisubsys_init(),
which is called during rest_init().

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Naresh Bhat <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Miguel Ojeda <[email protected]>
---
arch/x86/include/asm/efi.h | 1 -
arch/x86/platform/efi/efi.c | 6 ------
drivers/firmware/efi/efi.c | 6 ++++++
include/linux/efi.h | 3 +++
4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..0e61b771b93d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -138,7 +138,6 @@ extern void __init efi_runtime_update_mappings(void);
extern void __init efi_dump_pagetable(void);
extern void __init efi_apply_memmap_quirks(void);
extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);

struct efi_setup_data {
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)

if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
- /* clean DUMMY object */
- efi_delete_dummy_variable();
#endif
}

@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
* necessary relocation fixups for the new virtual addresses.
*/
efi_runtime_update_mappings();
-
- /* clean DUMMY object */
- efi_delete_dummy_variable();
}

void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1176af664013 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -337,6 +337,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;

+ /*
+ * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+ * it should be invoked only after efi_rts_wq is ready.
+ */
+ efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3016d8c456bc..1b79939d0b1e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -994,6 +994,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
bool nonblocking);
extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
#else
static inline void efi_late_init(void) {}
static inline void efi_free_boot_services(void) {}
@@ -1004,6 +1005,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
{
return EFI_SUCCESS;
}
+
+static inline void efi_delete_dummy_variable(void) {}
#endif
extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);

--
2.7.4


2018-05-25 22:20:02

by Ard Biesheuvel

Subject: Re: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

On 26 May 2018 at 00:05, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Problem statement:
> ------------------
> Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
> to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
> perf code via an NMI) will have incorrect user space mappings[1]. This
> could lead to otherwise unexpected access errors and, worse, unauthorized
> access to firmware code and data.
>
> Detailed discussion of problem statement:
> -----------------------------------------
> As this switch is not propagated to other kernel subsystems; they will
> wrongly assume that swapper_pgd is still in use and it can lead to
> following issues:
>
> 1. If kernel code tries to access user space addresses while in efi_pgd,
> it could lead to unauthorized accesses to firmware code/data.
> (e.g: <__>/copy_from_user_nmi()).
> [This could also be disastrous if the frame pointer happens to point at
> MMIO in the EFI runtime mappings] - Mark Rutland.
>
> An example of a subsystem that could touch user space while in efi_pgd is
> perf. Assume that we are in efi_pgd, a user could use perf to profile
> some user data and depending on the address the user is trying to
> profile, two things could happen.
> 1. If the mappings are absent, perf fails to profile.
> 2. If efi_pgd does have mappings for the requested address (these
> mappings are erroneous), perf profiles firmware code/data. If the
> address is MMIO'ed, perf could have potentially changed some device state.
>
> The culprit in both the cases is, EFI subsystem swapping out pgd and not
> perf. Because, EFI subsystem has broken the *general assumption* that
> all other subsystems rely on - "user space might be valid and nobody has
> switched %cr3".
>
> Solutions:
> ----------
> There are two ways to fix this issue:
> 1. Educate about pgd change to *all* the subsystems that could
> potentially access user space while in efi_pgd.
> On x86, AFAIK, it could happen only when some one touches user space
> from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
> showed perf and oprofile). On arm, it could happen from multiple
> places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
> please comment on this as I might be wrong); whereas x86 runs
> efi_runtime_services() interrupts disabled.
>
> I think, this solution isn't holistic because
> a. Any other subsystem might well do the same, if not now, in future.
> b. Also, this solution looks simpler on x86 but not true if it's the
> same for arm (ARM folks, please comment on this as I might be wrong).
> c. This solution looks like a work around rather than addressing the issue.
>
> 2. Running efi_runtime_services() in kthread context.
> This makes sense because efi_pgd doesn't have user space and kthread
> by definition means that user space is not valid. Any kernel code that
> tries to touch user space while in kthread is buggy in itself. If so,
> it should be an easy fix in the other subsystem. This also take us one
> step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
>
> What does this patch set do?
> ----------------------------
> Introduce efi_rts_wq (EFI runtime services work queue).
> When a user process requests the kernel to execute any efi_runtime_service(),
> kernel queues the work to efi_rts_wq, a kthread comes along, switches to
> efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
> patch set adds support to the EFI subsystem to handle all calls to
> efi_runtime_services() using a work queue (which in turn uses kthread).
>
> How running efi_runtime_services() in kthread fixes above discussed issues?
> ---------------------------------------------------------------------------
> If we run efi_runtime_services() in kthread context and if perf
> checks for it, we could get both the above scenarios correct by perf
> aborting the profiling. Not only perf, but any subsystem that tries to
> touch user space should first check for kthread context and if so,
> should abort.
>
> Q. If we still need check for kthread context in other subsystems that
> access user space, what does this patch set fix?
> A. This patch set makes sure that EFI subsystem is not at fault.
> Without this patch set the blame is upon EFI subsystem, because it's the
> one that changed pgd and hasn't communicated this change to everyone and
> hence broke the general assumption. Running efi_runtime_services() in
> kthread means explicitly communicating that user space is invalid, now
> it's the responsibility of other subsystem to make sure that it's
> running in right context.
>
> Testing:
> --------
> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> (qemu only). Will appreciate the effort if someone could test the
> patches on real ARM/ARM64 machines.
> LUV: https://01.org/linux-uefi-validation
>
> Credits:
> --------
> Thanks to Ricardo, Dan, Miguel, Mark and Peter for reviews and suggestions.
> Thanks to Boris and Andy for making me think through/help on what I am
> addressing with this patch set.
>
> Please feel free to pour in your comments and concerns.
>
> Note:
> -----
> Patches are based on Linus's kernel v4.17-rc6
>
> [1] Backup: Detailing efi_pgd:
> ------------------------------
> efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
> Code/Data) regions. Due to the nature of these mappings, they fall
> in user space address ranges and they are not the same as swapper.
>
> [On arm64, the EFI mappings are in the VA range usually used for user
> space. The two halves of the address space are managed by separate
> tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
> user space or EFI runtime mappings in TTBR0.] - Mark Rutland
>
> Changes from V3 to V4:
> ----------------------
> 1. As suggested by Peter, use completions instead of flush_work() as the
> former is cheaper
> 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
> wasn't able to find a better alternative to keep this change local to
> arch/x86.
>

Two questions:
- Should the non-blocking variants of the query and set_variable_store
use the work queue? Doesn't that make them blocking?
- If the non-blocking set_variable() does not use the work queue, can
we just call it from efi_delete_dummy_variable(), and keep the calls
where they are?



> Changes from V2 to V3:
> ----------------------
> 1. Rewrite the cover letter to clearly state the problem. What we are
> fixing and what we are not fixing.
> 2. Make efi_delete_dummy_variable() change local to x86.
> 3. Avoid using BUG(), instead, print error message and exit gracefully.
> 4. Move struct efi_runtime_work to runtime-wrappers.c file.
> 5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
> 6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
>
> Changes from V1 to V2:
> ----------------------
> 1. Remove unnecessary include of asm/efi.h file - Fixes build error on
> ia64, reported by 0-day
> 2. Use enum to identify efi_runtime_services()
> 3. Use alloc_ordered_workqueue() to create efi_rts_wq as
> create_workqueue() is scheduled for depreciation.
> 4. Make efi_call_rts() static, as it has no callers outside
> runtime-wrappers.c
> 5. Use BUG(), when we are unable to queue work or unable to identify
> requested efi_runtime_service() - Because these two situations should
> *never* happen.
>
> Sai Praneeth (3):
> x86/efi: Call efi_delete_dummy_variable() during efi subsystem
> initialization
> efi: Create efi_rts_wq and efi_queue_work() to invoke all
> efi_runtime_services()
> efi: Use efi_rts_wq to invoke EFI Runtime Services
>
> arch/x86/include/asm/efi.h | 1 -
> arch/x86/platform/efi/efi.c | 6 -
> drivers/firmware/efi/efi.c | 20 +++
> drivers/firmware/efi/runtime-wrappers.c | 256 +++++++++++++++++++++++++++++---
> include/linux/efi.h | 6 +
> 5 files changed, 262 insertions(+), 27 deletions(-)
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Naresh Bhat <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Miguel Ojeda <[email protected]>
>
> --
> 2.7.4
>

2018-05-25 23:11:20

by Prakhya, Sai Praneeth

Subject: RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

> > Changes from V3 to V4:
> > ----------------------
> > 1. As suggested by Peter, use completions instead of flush_work() as the
> > former is cheaper
> > 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
> > wasn't able to find a better alternative to keep this change local to
> > arch/x86.
> >
>
> Two questions:
> - Should the non-blocking variants of the query and set_variable_store use the
> work queue? Doesn't that make them blocking?

That's a good question. I think you are right: calling the non-blocking variants of
efi_rts through the work queue makes them blocking. But I have a follow-up question.

Assume a user requested a non-blocking variant of efi_rts and the kernel was
scheduled out before it called efi_call_virt(). IOW, even though the user asked for
a non-blocking EFI call, we might still block. Am I right?

With efi_rts_wq, I think I have widened the window in which we can get blocked: the
kernel has to explicitly call schedule() to run the firmware, so the chances of
getting blocked are much higher.

Except for this wider window, I think the firmware is executed as before.

So, could you please explain the difference between the blocking and non-blocking
variants from the kernel's perspective?
(The way we take the lock differs: down_interruptible() vs. down_trylock().)

> - If the non-blocking set_variable() does not use the work queue, can we just call
> it from efi_delete_dummy_variable(), and keep the calls where they are?

Yes, I think we can do that (if we don't use efi_rts_wq for non-blocking variants).

Regards,
Sai

2018-05-26 06:33:18

by Ard Biesheuvel

Subject: Re: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

On 26 May 2018 at 01:08, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > Changes from V3 to V4:
>> > ----------------------
>> > 1. As suggested by Peter, use completions instead of flush_work() as the
>> > former is cheaper
>> > 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
>> > wasn't able to find a better alternative to keep this change local to
>> > arch/x86.
>> >
>>
>> Two questions:
>> - Should the non-blocking variants of the query and set_variable_store use the
>> work queue? Doesn't that make them blocking?
>
> That's a good question . I think you are right, calling non-blocking variants of efi_rts
> using work queues makes them blocking. But, I have a follow on question.
>
> Assume some user requested to execute some non-blocking variant of efi_rts and
> the kernel hasn't called efi_call_virt() yet, but was scheduled out. IOW, even though
> user requests for non-blocking efi call, we might still block. Am I right?
>

No, that is the whole point. These functions may be called from atomic
context, which is why they trylock() and give up rather than block on
the semaphore if a rt services call is already in progress. E.g.,

/*
* efivar_entry_set_nonblocking - call set_variable_nonblocking()
*
* This function is guaranteed to not block and is suitable for calling
* from crash/panic handlers.
*
* Crucially, this function will not block if it cannot acquire
* efivars_lock. Instead, it returns -EBUSY.
*/
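
To make the trylock behaviour described above concrete, here is a minimal
userspace sketch. It is not the kernel code: it assumes a pthread mutex as a
stand-in for the lock that serializes runtime service calls, and rts_lock,
demo_firmware_call(), demo_call_blocking() and demo_call_nonblocking() are
hypothetical names chosen for illustration only.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the lock that serializes runtime service calls. */
static pthread_mutex_t rts_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the pretend firmware was entered. */
static int demo_fw_calls;

/* Hypothetical stand-in for a firmware runtime service. */
static int demo_firmware_call(void)
{
	demo_fw_calls++;
	return 0;
}

/* Blocking flavour: waits for the lock, however long that takes. */
int demo_call_blocking(void)
{
	int status;

	pthread_mutex_lock(&rts_lock);
	status = demo_firmware_call();
	pthread_mutex_unlock(&rts_lock);
	return status;
}

/* Non-blocking flavour: gives up with -EBUSY if the lock is already taken. */
int demo_call_nonblocking(void)
{
	int status;

	if (pthread_mutex_trylock(&rts_lock) != 0)
		return -EBUSY;

	status = demo_firmware_call();
	pthread_mutex_unlock(&rts_lock);
	return status;
}
```

The only difference between the two flavours is how the lock is acquired;
once the lock is held, both run the same call. A caller that cannot sleep
would use the second flavour and treat -EBUSY as "try again later or give up".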

> With efi_rts_wq, I think, I have increased the window of getting blocked. With efi_rts_wq,
> kernel should explicitly call schedule() to run firmware and the chances of getting blocked
> are much more.
>
> Expect this increased window, I think firmware should be executed as before.
>
> So, can you please explain me the difference between blocking and non-blocking variants
> from kernel perspective?
> (the way we get locks are different down_interruptible() vs down_trylock())
>
>> - If the non-blocking set_variable() does not use the work queue, can we just call
>> it from efi_delete_dummy_variable(), and keep the calls where they are?
>
> Yes, I think we can do that (if we don't use efi_rts_wq for non-blocking variants).
>

OK, then please implement that change.

Thanks,
Ard.

2018-05-27 05:33:25

by Prakhya, Sai Praneeth

Subject: RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

> > Assume some user requested to execute some non-blocking variant of
> > efi_rts and the kernel hasn't called efi_call_virt() yet, but was
> > scheduled out. IOW, even though user requests for non-blocking efi call, we
> might still block. Am I right?
> >
>
> No, that is the whole point. These functions may be called from atomic context,
> which is why they trylock() and give up rather than block on the semaphore if a rt
> services call is already in progress. E.g.,
>
> /*
> * efivar_entry_set_nonblocking - call set_variable_nonblocking()
> *
> * This function is guaranteed to not block and is suitable for calling
> * from crash/panic handlers.
> *
> * Crucially, this function will not block if it cannot acquire
> * efivars_lock. Instead, it returns -EBUSY.
> */
>

One more question: if we are sure that the non-blocking variants will
_always_ be called in atomic context, then we already have this covered, because
set_variable() and query_variable_info() (both blocking and non-blocking) check
for in_atomic() and, if it is true, do not use efi_rts_wq (please refer to patch 3).

If you think the non-blocking efi_rts might ever be called outside atomic
context, then sure, let's make them never use efi_rts_wq.

Regards,
Sai

2018-05-27 08:20:56

by Ard Biesheuvel

Subject: Re: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

On 27 May 2018 at 07:32, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > Assume some user requested to execute some non-blocking variant of
>> > efi_rts and the kernel hasn't called efi_call_virt() yet, but was
>> > scheduled out. IOW, even though user requests for non-blocking efi call, we
>> might still block. Am I right?
>> >
>>
>> No, that is the whole point. These functions may be called from atomic context,
>> which is why they trylock() and give up rather than block on the semaphore if a rt
>> services call is already in progress. E.g.,
>>
>> /*
>> * efivar_entry_set_nonblocking - call set_variable_nonblocking()
>> *
>> * This function is guaranteed to not block and is suitable for calling
>> * from crash/panic handlers.
>> *
>> * Crucially, this function will not block if it cannot acquire
>> * efivars_lock. Instead, it returns -EBUSY.
>> */
>>
>
> One more question again, if we are sure that non-blocking variants will
> _always_ be called in atomic context, then, we got it covered. Because, in
> set_variable() and query_variable_info() (both blocking and non-blocking) we check
> for in_atomic() and if so, we don't use efi_rts_wq (please refer to patch 3).
>
> If you think, there might be a probability of calling non-blocking efi_rts out of atomic
> context, then, sure! Let's make them never use efi_rts_wq.
>

This is not about what happens to be the current situation. It is about the API.

The non-blocking functions should never block, period. They either
fail gracefully or perform their duties without sleeping.

In this particular case, I think it is useful to have a guaranteed
non-blocking version, not only to delete the dummy EFI variable, but
potentially in other future cases as well, given that they can be
called much earlier in the boot (when the perf/%cr3 issue is not a
concern to begin with).
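
The split being asked for here can be sketched in userspace as follows. This
is only an illustration under stated assumptions, not the patch itself: a
freshly created pthread plays the role of the efi_rts_wq worker, a mutex plus
condition variable plays the role of a completion, and struct sketch_work,
sketch_service(), sketch_call_blocking() and sketch_call_nonblocking() are
hypothetical names. Locking between concurrent callers is left out to keep
the sketch short.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical work item: the service to run plus a hand-rolled completion. */
struct sketch_work {
	int (*service)(int);
	int arg;
	int status;
	bool done;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Hypothetical stand-in for a runtime service. */
static int sketch_service(int arg)
{
	return arg + 1;
}

/* Worker side: run the service, then signal the waiting caller. */
static void *sketch_worker(void *data)
{
	struct sketch_work *work = data;
	int status = work->service(work->arg);

	pthread_mutex_lock(&work->lock);
	work->status = status;
	work->done = true;
	pthread_cond_signal(&work->cond);
	pthread_mutex_unlock(&work->lock);
	return NULL;
}

/* Blocking path: hand the call to the worker and sleep until it completes. */
int sketch_call_blocking(int (*service)(int), int arg)
{
	struct sketch_work work = { .service = service, .arg = arg };
	pthread_t worker;

	pthread_mutex_init(&work.lock, NULL);
	pthread_cond_init(&work.cond, NULL);

	if (pthread_create(&worker, NULL, sketch_worker, &work) != 0) {
		pthread_cond_destroy(&work.cond);
		pthread_mutex_destroy(&work.lock);
		return -1;
	}

	pthread_mutex_lock(&work.lock);
	while (!work.done)
		pthread_cond_wait(&work.cond, &work.lock);
	pthread_mutex_unlock(&work.lock);

	pthread_join(worker, NULL);
	pthread_cond_destroy(&work.cond);
	pthread_mutex_destroy(&work.lock);
	return work.status;
}

/* Non-blocking path: call the service directly, never wait on a worker. */
int sketch_call_nonblocking(int (*service)(int), int arg)
{
	return service(arg);
}
```

The point of the design is that the second function contains nothing that
can sleep, so it stays usable early in boot and from contexts that must not
schedule, while the first one is free to wait for the worker.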

2018-05-27 08:38:20

by Prakhya, Sai Praneeth

Subject: RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

> > One more question again, if we are sure that non-blocking variants
> > will _always_ be called in atomic context, then, we got it covered.
> > Because, in
> > set_variable() and query_variable_info() (both blocking and
> > non-blocking) we check for in_atomic() and if so, we don't use efi_rts_wq
> (please refer to patch 3).
> >
> > If you think, there might be a probability of calling non-blocking
> > efi_rts out of atomic context, then, sure! Let's make them never use
> efi_rts_wq.
> >
>
> This is not about what happens to be the current situation. It is about the API.
>
> The non-blocking functions should never block, period. They either fail gracefully
> or perform their duties without sleeping.

Yes, that makes sense.

>
> In this particular case, I think it is useful to have a guaranteed non-blocking
> version, not only to delete the dummy EFI variable, but potentially in other
> future cases as well, given that they can be called much earlier in the boot (when
> the perf/%cr3 issue is not a concern to begin with)

Thanks for making it clearer :)
I will change the non-blocking variants _not_ to use efi_rts_wq and, as you suggested,
make efi_delete_dummy_variable() use the non-blocking variants (that should also keep
the change local to arch/x86).

Another follow-up question: does every firmware support both the blocking and
non-blocking variants (especially legacy EFI firmware)? I am worried about
this because efi_delete_dummy_variable() presently uses set_variable() and
query_variable_info(); if I change it to use the non-blocking variants and they
aren’t supported, then I guess efi_delete_dummy_variable() might fail :(

So, could you please clarify that?

Regards,
Sai

2018-05-27 12:29:10

by Ard Biesheuvel

Subject: Re: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

On 27 May 2018 at 10:37, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > One more question again, if we are sure that non-blocking variants
>> > will _always_ be called in atomic context, then, we got it covered.
>> > Because, in
>> > set_variable() and query_variable_info() (both blocking and
>> > non-blocking) we check for in_atomic() and if so, we don't use efi_rts_wq
>> (please refer to patch 3).
>> >
>> > If you think, there might be a probability of calling non-blocking
>> > efi_rts out of atomic context, then, sure! Let's make them never use
>> efi_rts_wq.
>> >
>>
>> This is not about what happens to be the current situation. It is about the API.
>>
>> The non-blocking functions should never block, period. They either fail gracefully
>> or perform their duties without sleeping.
>
> Yes, that makes sense.
>
>>
>> In this particular case, I think it is useful to have a guaranteed non-blocking
>> version, not only to delete the dummy EFI variable, but potentially in other
>> future cases as well, given that they can be called much earlier in the boot (when
>> the perf/%cr3 issue is not a concern to begin with)
>
> Thanks for making it more clear :)
> I will change the non-blocking variants _not_ to use efi_rts_wq and as you suggested
> make efi_delete_dummy_variable() use non-blocking variants (that should also make it
> local to arch/x86).
>

Yes, please.

> Another follow on question is, does every firmware support both blocking and
> non-blocking variants (specially legacy EFI firmware)? I am worried about
> this because, presently efi_delete_dummy_variable() uses set_variable() and
> query_variable_info() but if I change efi_delete_dummy_variable() to use non-blocking
> variants and if they aren’t supported, then, I guess, efi_delete_dummy_variable() might
> fail :(
>
> So, could you please clarify on that?
>

I don't follow. Why should it make any difference to the firmware
whether the OS routine blocks or gives up? We always honor the mutual
exclusion between different invocations of runtime services, and the
firmware itself has no awareness of the kind of scheduling the OS
needs to do to ensure this.

2018-05-27 15:50:10

by Prakhya, Sai Praneeth

Subject: RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

> > Another follow on question is, does every firmware support both
> > blocking and non-blocking variants (specially legacy EFI firmware)? I
> > am worried about this because, presently efi_delete_dummy_variable()
> > uses set_variable() and
> > query_variable_info() but if I change efi_delete_dummy_variable() to
> > use non-blocking variants and if they aren’t supported, then, I guess,
> > efi_delete_dummy_variable() might fail :(
> >
> > So, could you please clarify on that?
> >
>
> I don't follow. Why should it make any difference to the firmware whether the
> OS routines blocks or gives up? We always honor the mutual exclusion between
> different invocations of runtime services, and the firmware itself has no
> awareness of the kind of scheduling the OS needs to do to ensure this.

Sorry, my bad. I thought firmware (with EFI System Table revision > 2.0) offered two
types of EFI runtime services, a blocking variant and a non-blocking variant. But now I
see in the spec that there is only set_variable() and _no_ set_variable_nonblocking().
The same goes for query_variable_info(), and runtime-wrappers.c reflects this:
both the blocking and non-blocking variants call the same EFI runtime service. So the
non-blocking variants are just an additional API offered by the OS.

Regards,
Sai
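
The conclusion of this last message, that the firmware exposes a single
service while the OS layers two entry points on top of it, can be pictured
with a small sketch. Everything here is a hypothetical miniature for
illustration (struct fw_runtime_services, struct os_efi_ops and the fw_*/os_*
functions are assumed names, and the signatures are simplified), not the
kernel's actual definitions.

```c
/* Hypothetical miniature of the firmware's table: a single entry. */
struct fw_runtime_services {
	int (*set_variable)(const char *name, int value);
};

/* Hypothetical miniature of the OS-side table: two entries. */
struct os_efi_ops {
	int (*set_variable)(const char *name, int value);
	int (*set_variable_nonblocking)(const char *name, int value);
};

/* Counts how often the pretend firmware service was entered. */
static int fw_set_variable_calls;

/* The one and only firmware service in this sketch. */
static int fw_set_variable(const char *name, int value)
{
	(void)name;
	(void)value;
	fw_set_variable_calls++;
	return 0;
}

static const struct fw_runtime_services fw_rts = {
	.set_variable = fw_set_variable,
};

/* OS wrapper, blocking flavour: a real one would take the lock and may sleep. */
static int os_set_variable(const char *name, int value)
{
	return fw_rts.set_variable(name, value);
}

/* OS wrapper, non-blocking flavour: a real one would trylock and bail if busy. */
static int os_set_variable_nonblocking(const char *name, int value)
{
	return fw_rts.set_variable(name, value);
}

static const struct os_efi_ops os_efi = {
	.set_variable = os_set_variable,
	.set_variable_nonblocking = os_set_variable_nonblocking,
};
```

Both OS entries end up in fw_set_variable(), so the firmware cannot tell
which flavour was used; the blocking behaviour is purely an OS-side policy.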