2018-03-05 23:31:57

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V2 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services

From: Sai Praneeth <[email protected]>

This patch set is an outcome of the discussion at
https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch set adds support to
the efi subsystem to handle all calls to efi_runtime_services() using a
work queue (which in turn uses kthread).

Implementation summary:
-----------------------
1. When a user/kernel thread requests to execute efi_runtime_service(),
enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's
dependent on the return status of efi_runtime_service() and, in specific
cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() takes a pointer to buffer as an argument and
fills up the buffer with requested data. For instance, efi_get_variable()
and efi_get_next_variable(). Hence, the caller process cannot just post
the work and get going, it has to wait for results from firmware.

Caveat: efi_rts_workqueue to run efi_runtime_services() shouldn't be used
while in atomic, because caller thread might sleep. Presently, pstore
code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds
fine for arm and arm64. Will appreciate the effort if someone could test
the patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please
feel free to pour in your comments and concerns.
Note: Patches are based on Linus's kernel v4.16-rc4

Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for depreciation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
efi: Introduce efi_rts_workqueue and some infrastructure to invoke all
efi_runtime_services()
efi: Use efi_rts_workqueue to invoke EFI Runtime Services

arch/x86/include/asm/efi.h | 1 -
arch/x86/platform/efi/efi.c | 6 -
drivers/firmware/efi/efi.c | 21 +++
drivers/firmware/efi/runtime-wrappers.c | 229 +++++++++++++++++++++++++++++---
include/linux/efi.h | 23 ++++
5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Dan Williams <[email protected]>

--
2.7.4



2018-03-05 23:30:57

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

From: Sai Praneeth <[email protected]>

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch adds support to efi
subsystem to handle all calls to efi_runtime_services() using a work
queue (which in turn uses kthread).

Implementation summary:
-----------------------
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is
passed.

Introduce a handler function (called efi_call_rts()) that
a. understands efi_runtime_work and
b. invokes the appropriate efi_runtime_service() with the
appropriate arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.

pstore writes could potentially be invoked in interrupt context and it
uses set_variable<>() and query_variable_info<>() to store logs. If we
invoke efi_runtime_services() through efi_rts_wq while in atomic()
kernel issues a warning ("scheduling wile in atomic") and prints stack
trace. One way to overcome this is to not make the caller process wait
for the worker thread to finish. This approach breaks pstore i.e. the
log messages aren't written to efi variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq or in other words
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>()

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Dan Williams <[email protected]>
---
drivers/firmware/efi/runtime-wrappers.c | 168 ++++++++++++++++++++++++++++----
1 file changed, 148 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 649763171439..eff443bf942c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -151,13 +151,105 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
*/
static DEFINE_SEMAPHORE(efi_runtime_lock);

+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+ struct efi_runtime_work *efi_rts_work;
+ void *arg1, *arg2, *arg3, *arg4, *arg5;
+ efi_status_t status = EFI_NOT_FOUND;
+
+ efi_rts_work = container_of(work, struct efi_runtime_work, work);
+ arg1 = efi_rts_work->arg1;
+ arg2 = efi_rts_work->arg2;
+ arg3 = efi_rts_work->arg3;
+ arg4 = efi_rts_work->arg4;
+ arg5 = efi_rts_work->arg5;
+
+ switch (efi_rts_work->func) {
+ case GET_TIME:
+ status = efi_call_virt(get_time, (efi_time_t *)arg1,
+ (efi_time_cap_t *)arg2);
+ break;
+ case SET_TIME:
+ status = efi_call_virt(set_time, (efi_time_t *)arg1);
+ break;
+ case GET_WAKEUP_TIME:
+ status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+ (efi_bool_t *)arg2, (efi_time_t *)arg3);
+ break;
+ case SET_WAKEUP_TIME:
+ status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+ (efi_time_t *)arg2);
+ break;
+ case GET_VARIABLE:
+ status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, (u32 *)arg3,
+ (unsigned long *)arg4, (void *)arg5);
+ break;
+ case GET_NEXT_VARIABLE:
+ status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+ (efi_char16_t *)arg2,
+ (efi_guid_t *)arg3);
+ break;
+ case SET_VARIABLE:
+ case SET_VARIABLE_NONBLOCKING:
+ status = efi_call_virt(set_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, *(u32 *)arg3,
+ *(unsigned long *)arg4, (void *)arg5);
+ break;
+ case QUERY_VARIABLE_INFO:
+ case QUERY_VARIABLE_INFO_NONBLOCKING:
+ status = efi_call_virt(query_variable_info, *(u32 *)arg1,
+ (u64 *)arg2, (u64 *)arg3, (u64 *)arg4);
+ break;
+ case GET_NEXT_HIGH_MONO_COUNT:
+ status = efi_call_virt(get_next_high_mono_count, (u32 *)arg1);
+ break;
+ case RESET_SYSTEM:
+ __efi_call_virt(reset_system, *(int *)arg1,
+ *(efi_status_t *)arg2,
+ *(unsigned long *)arg3,
+ (efi_char16_t *)arg4);
+ break;
+ case UPDATE_CAPSULE:
+ status = efi_call_virt(update_capsule,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2,
+ *(unsigned long *)arg3);
+ break;
+ case QUERY_CAPSULE_CAPS:
+ status = efi_call_virt(query_capsule_caps,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2, (u64 *)arg3,
+ (int *)arg4);
+ break;
+ default:
+ /*
+ * Ideally, we should never reach here because a caller of this
+ * function should have put the right efi_runtime_service()
+ * function identifier into efi_rts_work->func
+ */
+ BUG();
+ }
+ efi_rts_work->status = status;
+}
+
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
efi_status_t status;

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_time, tm, tc);
+ status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -168,7 +260,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_time, tm);
+ status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -181,7 +273,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+ status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -192,7 +285,8 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_wakeup_time, enabled, tm);
+ status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -207,8 +301,8 @@ static efi_status_t virt_efi_get_variable(efi_char16_t *name,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_variable, name, vendor, attr, data_size,
- data);
+ status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size,
+ data);
up(&efi_runtime_lock);
return status;
}
@@ -221,7 +315,8 @@ static efi_status_t virt_efi_get_next_variable(unsigned long *name_size,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_variable, name_size, name, vendor);
+ status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -236,8 +331,15 @@ static efi_status_t virt_efi_set_variable(efi_char16_t *name,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(SET_VARIABLE, name, vendor, &attr,
+ &data_size, data);
+ else
+ status = efi_call_virt(set_variable, name, vendor, attr,
+ data_size, data);
+
up(&efi_runtime_lock);
return status;
}
@@ -252,8 +354,14 @@ virt_efi_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor,
if (down_trylock(&efi_runtime_lock))
return EFI_NOT_READY;

- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(SET_VARIABLE_NONBLOCKING, &name, vendor,
+ &attr, &data_size, data);
+ else
+ status = efi_call_virt(set_variable, name, vendor, attr,
+ data_size, data);
+
up(&efi_runtime_lock);
return status;
}
@@ -271,8 +379,17 @@ static efi_status_t virt_efi_query_variable_info(u32 attr,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(QUERY_VARIABLE_INFO, &attr,
+ storage_space, remaining_space,
+ max_variable_size, NULL);
+ else
+ status = efi_call_virt(query_variable_info, attr,
+ storage_space, remaining_space,
+ max_variable_size);
+
up(&efi_runtime_lock);
return status;
}
@@ -291,8 +408,16 @@ virt_efi_query_variable_info_nonblocking(u32 attr,
if (down_trylock(&efi_runtime_lock))
return EFI_NOT_READY;

- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+ /* pstore shouldn't use efi_rts_wq while in atomic */
+ if (!in_atomic())
+ status = efi_queue_work(QUERY_VARIABLE_INFO_NONBLOCKING, &attr,
+ storage_space, remaining_space,
+ max_variable_size, NULL);
+ else
+ status = efi_call_virt(query_variable_info, attr,
+ storage_space, remaining_space,
+ max_variable_size);
+
up(&efi_runtime_lock);
return status;
}
@@ -303,7 +428,8 @@ static efi_status_t virt_efi_get_next_high_mono_count(u32 *count)

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_high_mono_count, count);
+ status = efi_queue_work(GET_NEXT_HIGH_MONO_COUNT, count, NULL, NULL,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -318,7 +444,8 @@ static void virt_efi_reset_system(int reset_type,
"could not get exclusive access to the firmware\n");
return;
}
- __efi_call_virt(reset_system, reset_type, status, data_size, data);
+ efi_queue_work(RESET_SYSTEM, &reset_type, &status, &data_size, data,
+ NULL);
up(&efi_runtime_lock);
}

@@ -333,7 +460,8 @@ static efi_status_t virt_efi_update_capsule(efi_capsule_header_t **capsules,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(update_capsule, capsules, count, sg_list);
+ status = efi_queue_work(UPDATE_CAPSULE, capsules, &count, &sg_list,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -350,8 +478,8 @@ static efi_status_t virt_efi_query_capsule_caps(efi_capsule_header_t **capsules,

if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_capsule_caps, capsules, count, max_size,
- reset_type);
+ status = efi_queue_work(QUERY_CAPSULE_CAPS, capsules, &count,
+ max_size, reset_type, NULL);
up(&efi_runtime_lock);
return status;
}
--
2.7.4


2018-03-05 23:31:35

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

From: Sai Praneeth <[email protected]>

Invoking efi_runtime_services() through efi_workqueue means all accesses
to efi_runtime_services() should be done after efi_rts_wq has been
created. efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.

efi_delete_dummy_variable() is called from efi_enter_virtual_mode()
which is early in the boot phase (efi_rts_wq isn't created yet), so call
efi_delete_dummy_variable() later in the boot phase i.e. while
initializing efi subsystem. In the next patch, this is the place where
we create efi_rts_wq and all the efi_runtime_services() will be called
using efi_rts_wq.

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Dan Williams <[email protected]>
---
arch/x86/include/asm/efi.h | 1 -
arch/x86/platform/efi/efi.c | 6 ------
drivers/firmware/efi/efi.c | 6 ++++++
include/linux/efi.h | 3 +++
4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index a399c1ebf6f0..43009e3f821b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,7 +143,6 @@ extern void __init efi_runtime_update_mappings(void);
extern void __init efi_dump_pagetable(void);
extern void __init efi_apply_memmap_quirks(void);
extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);

struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)

if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
- /* clean DUMMY object */
- efi_delete_dummy_variable();
#endif
}

@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
* necessary relocation fixups for the new virtual addresses.
*/
efi_runtime_update_mappings();
-
- /* clean DUMMY object */
- efi_delete_dummy_variable();
}

void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..838b8efe639c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;

+ /*
+ * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+ * it should be invoked only after efi_rts_workqueue is ready.
+ */
+ efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f5083aa72eae..c4efb3ef0dfa 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
bool nonblocking);
extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
#else
static inline void efi_late_init(void) {}
static inline void efi_free_boot_services(void) {}
@@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
{
return EFI_SUCCESS;
}
+
+static inline void efi_delete_dummy_variable(void) {}
#endif
extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);

--
2.7.4


2018-03-05 23:32:34

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

From: Sai Praneeth <[email protected]>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. populates efi_runtime_work
b. queues work onto efi_rts_wq and
c. waits until worker thread returns

The caller thread has to wait until the worker thread returns, because
it's dependent on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() takes a pointer to buffer as an argument and
fills up the buffer with requested data. For instance,
efi_get_variable() and efi_get_next_variable(). Hence, caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or less arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Dan Williams <[email protected]>
---
drivers/firmware/efi/efi.c | 15 ++++++++
drivers/firmware/efi/runtime-wrappers.c | 61 +++++++++++++++++++++++++++++++++
include/linux/efi.h | 20 +++++++++++
3 files changed, 96 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 838b8efe639c..04b46c62f3ce 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -75,6 +75,8 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
};

+struct workqueue_struct *efi_rts_wq;
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
return 0;

/*
+ * Since we process only one efi_runtime_service() at a time, an
+ * ordered workqueue (which creates only one execution context)
+ * should suffice all our needs.
+ */
+ efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
+ if (!efi_rts_wq) {
+ pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+ "disabled.\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ return 0;
+ }
+
+ /*
* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
* it should be invoked only after efi_rts_workqueue is ready.
*/
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..649763171439 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
/*
* runtime-wrappers.c - Runtime Services function call wrappers
*
+ * Implementation summary:
+ * -----------------------
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_workqueue.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
* Copyright (C) 2014 Linaro Ltd. <[email protected]>
*
* Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
#include <linux/mutex.h>
#include <linux/semaphore.h>
#include <linux/stringify.h>
+#include <linux/workqueue.h>
+
#include <asm/efi.h>

/*
@@ -33,6 +43,57 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)

+/* efi_runtime_service() function identifiers */
+enum {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ SET_VARIABLE_NONBLOCKING,
+ QUERY_VARIABLE_INFO,
+ QUERY_VARIABLE_INFO_NONBLOCKING,
+ GET_NEXT_HIGH_MONO_COUNT,
+ RESET_SYSTEM,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts: efi_runtime_service() function identifier
+ * @rts_arg<1-5>: efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
+({ \
+ struct efi_runtime_work efi_rts_work; \
+ \
+ INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
+ efi_rts_work.func = _rts; \
+ efi_rts_work.arg1 = _arg1; \
+ efi_rts_work.arg2 = _arg2; \
+ efi_rts_work.arg3 = _arg3; \
+ efi_rts_work.arg4 = _arg4; \
+ efi_rts_work.arg5 = _arg5; \
+ /* \
+ * queue_work() returns 0 if work was already on queue, \
+ * _ideally_ this should never happen. \
+ */ \
+ if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
+ flush_work(&efi_rts_work.work); \
+ else \
+ BUG(); \
+ \
+ efi_rts_work.status; \
+})
+
void efi_call_virt_check_flags(unsigned long flags, const char *call)
{
unsigned long cur_flags, mismatch;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index c4efb3ef0dfa..bb06b71af55c 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1652,4 +1652,24 @@ struct linux_efi_tpm_eventlog {

extern int efi_tpm_eventlog_init(void);

+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @func: EFI Runtime Service function identifier
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ */
+struct efi_runtime_work {
+ u8 func;
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+};
+
+/* Workqueue to queue EFI Runtime Services */
+extern struct workqueue_struct *efi_rts_wq;
+
#endif /* _LINUX_EFI_H */
--
2.7.4


2018-03-06 00:07:04

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

On Mon, Mar 5, 2018 at 3:23 PM, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Presently, efi_runtime_services() are executed by firmware in process
> context. To execute efi_runtime_service(), kernel switches the page
> directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
> user space mappings. A potential issue could be, for instance, an NMI
> interrupt (like perf) trying to profile some user data while in efi_pgd.
>
> A solution for this issue could be to use kthread to run
> efi_runtime_service(). When a user/kernel thread requests to execute
> efi_runtime_service(), kernel off-loads this work to kthread which in
> turn uses efi_pgd. Anything that tries to touch user space addresses
> while in kthread is terminally broken. This patch adds support to efi
> subsystem to handle all calls to efi_runtime_services() using a work
> queue (which in turn uses kthread).
>
> Implementation summary:
> -----------------------
> 1. When user/kernel thread requests to execute efi_runtime_service(),
> enqueue work to efi_rts_workqueue.
> 2. Caller thread waits until the work is finished because it's dependent
> on the return status of efi_runtime_service().
>
> Semantics to pack arguments in efi_runtime_work (has void pointers):
> 1. If argument is a pointer (of any type), pass it as is.
> 2. If argument is a value (of any type), address of the value is
> passed.
>
> Introduce a handler function (called efi_call_rts()) that
> a. understands efi_runtime_work and
> b. invokes the appropriate efi_runtime_service() with the
> appropriate arguments
>
> Semantics followed by efi_call_rts() to understand efi_runtime_work:
> 1. If argument was a pointer, recast it from void pointer to original
> pointer type.
> 2. If argument was a value, recast it from void pointer to original
> pointer type and dereference it.
>
> pstore writes could potentially be invoked in interrupt context and it
> uses set_variable<>() and query_variable_info<>() to store logs. If we
> invoke efi_runtime_services() through efi_rts_wq while in atomic()
> kernel issues a warning ("scheduling wile in atomic") and prints stack
> trace. One way to overcome this is to not make the caller process wait
> for the worker thread to finish. This approach breaks pstore i.e. the
> log messages aren't written to efi variables. Hence, pstore calls
> efi_runtime_services() without using efi_rts_wq or in other words
> efi_rts_wq will be used unconditionally for all the
> efi_runtime_services() except set_variable<>() and
> query_variable_info<>()


Is there a place in the system reboot path where we can try to flush
these asynchronous pstore writes from interrupt context? It seems
unfortunate that we need to have this wide exception for all
set_variable() calls. Either that or switch to an explicit "emergency
mode" where we stop caring about protecting the system from EFI
runtime code because we're already crashing.

2018-03-06 00:58:23

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> >
> > A solution for this issue could be to use kthread to run
> > efi_runtime_service(). When a user/kernel thread requests to execute
> > efi_runtime_service(), kernel off-loads this work to kthread which in
> > turn uses efi_pgd. Anything that tries to touch user space addresses
> > while in kthread is terminally broken. This patch adds support to efi
> > subsystem to handle all calls to efi_runtime_services() using a work
> > queue (which in turn uses kthread).
> >
> > Implementation summary:
> > -----------------------
> > 1. When user/kernel thread requests to execute efi_runtime_service(),
> > enqueue work to efi_rts_workqueue.
> > 2. Caller thread waits until the work is finished because it's
> > dependent on the return status of efi_runtime_service().
> >
> > Semantics to pack arguments in efi_runtime_work (has void pointers):
> > 1. If argument is a pointer (of any type), pass it as is.
> > 2. If argument is a value (of any type), address of the value is
> > passed.
> >
> > Introduce a handler function (called efi_call_rts()) that
> > a. understands efi_runtime_work and
> > b. invokes the appropriate efi_runtime_service() with the
> > appropriate arguments
> >
> > Semantics followed by efi_call_rts() to understand efi_runtime_work:
> > 1. If argument was a pointer, recast it from void pointer to original
> > pointer type.
> > 2. If argument was a value, recast it from void pointer to original
> > pointer type and dereference it.
> >
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
>
>
> Is there a place in the system reboot path where we can try to flush these
> asynchronous pstore writes from interrupt context?

I don't think so because, the issue is not with the pstore writes but with pstore
using efi as backing store. Anything could register as pstore backend, eg: RAM,
ACPI-ERST etc.. and AFAIK, they don’t use work queues to store logs. Now that
efi_runtime_services() uses work queues, we unfortunately have to have this hack.

> It seems unfortunate that
> we need to have this wide exception for all
> set_variable() calls.

True, basically any efi_runtime_service() that might get called in interrupt context.
I am not very happy to have the hack too, but didn’t find other way.

Either that or switch to an explicit "emergency mode" where
> we stop caring about protecting the system from EFI runtime code because
> we're already crashing.

Should we care about extra warning (scheduling while in atomic) when we are already
crashing? This sounds kind of debatable. I will wait for feedback from community it they
think it's OK or maybe a better solution.

Regards,
Sai

2018-03-06 11:16:17

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Mon, Mar 05, 2018 at 03:23:09PM -0800, Sai Praneeth Prakhya wrote:
> @@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
> return 0;
>
> /*
> + * Since we process only one efi_runtime_service() at a time, an
> + * ordered workqueue (which creates only one execution context)
> + * should suffice all our needs.
> + */
> + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
> + if (!efi_rts_wq) {
> + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
> + "disabled.\n");
> + clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> + return 0;
> + }

I'm a little worried that something might sample this flag between it
being set in an early_initcall (arm_enable_runtime_services), and
cleared in a subsys_initcall here.

However, nothing seems to do that so far, so maybe that's ok...

[...]

> +/* efi_runtime_service() function identifiers */
> +enum {
> + GET_TIME,
> + SET_TIME,
> + GET_WAKEUP_TIME,
> + SET_WAKEUP_TIME,
> + GET_VARIABLE,
> + GET_NEXT_VARIABLE,
> + SET_VARIABLE,
> + SET_VARIABLE_NONBLOCKING,
> + QUERY_VARIABLE_INFO,
> + QUERY_VARIABLE_INFO_NONBLOCKING,
> + GET_NEXT_HIGH_MONO_COUNT,
> + RESET_SYSTEM,
> + UPDATE_CAPSULE,
> + QUERY_CAPSULE_CAPS,
> +};

Can we please give this enum a name....

[...]

> +/*
> + * efi_runtime_work: Details of EFI Runtime Service work
> + * @func: EFI Runtime Service function identifier
> + * @arg<1-5>: EFI Runtime Service function arguments
> + * @status: Status of executing EFI Runtime Service
> + */
> +struct efi_runtime_work {
> + u8 func;

... and use it here rather than an opaque u8? I realise that means
placing the enum in <linux/efi.h>.

> + void *arg1;
> + void *arg2;
> + void *arg3;
> + void *arg4;
> + void *arg5;
> + efi_status_t status;
> + struct work_struct work;
> +};

Thanks,
Mark.

2018-03-06 11:27:28

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

On Mon, Mar 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth <[email protected]>
>
> Presently, efi_runtime_services() are executed by firmware in process
> context. To execute efi_runtime_service(), kernel switches the page
> directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
> user space mappings. A potential issue could be, for instance, an NMI
> interrupt (like perf) trying to profile some user data while in efi_pgd.

It might be worth pointing out that this could result in disaster (e.g.
if the frame pointer happens to point at MMIO in the EFI runtime
services mappings).

[...]

> pstore writes could potentially be invoked in interrupt context and it
> uses set_variable<>() and query_variable_info<>() to store logs. If we
> invoke efi_runtime_services() through efi_rts_wq while in atomic()
> kernel issues a warning ("scheduling wile in atomic") and prints stack
> trace. One way to overcome this is to not make the caller process wait
> for the worker thread to finish. This approach breaks pstore i.e. the
> log messages aren't written to efi variables. Hence, pstore calls
> efi_runtime_services() without using efi_rts_wq or in other words
> efi_rts_wq will be used unconditionally for all the
> efi_runtime_services() except set_variable<>() and
> query_variable_info<>()

Can't NMIs still come in during this?

... or do we assume that since something has already gone wrong, this
doesn't matter?

Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I
can boot up, read the EFI RTC, and reboot. I ahd lockdep and KASAN
enabled, I received no splats from either.

FWIW:

Tested-by: Mark Rutland <[email protected] [arm64]

Thanks,
Mark.

2018-03-07 11:58:38

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Tue, Mar 6, 2018 at 12:23 AM, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> When a process requests the kernel to execute any efi_runtime_service(),
> the requested efi_runtime_service (represented as an identifier) and its
> arguments are packed into a struct named efi_runtime_work and queued
> onto work queue named efi_rts_wq. The caller then waits until the work
> is completed.
>
> Introduce some infrastructure:
> 1. Creating workqueue named efi_rts_wq
> 2. A macro (efi_queue_work()) that
> a. populates efi_runtime_work
> b. queues work onto efi_rts_wq and
> c. waits until worker thread returns
>
> The caller thread has to wait until the worker thread returns, because
> it's dependent on the return status of efi_runtime_service() and, in
> specific cases, the arguments populated by efi_runtime_service(). Some
> efi_runtime_services() takes a pointer to buffer as an argument and
> fills up the buffer with requested data. For instance,
> efi_get_variable() and efi_get_next_variable(). Hence, caller process
> cannot just post the work and get going.
>
> Some facts about efi_runtime_services():
> 1. A quick look at all the efi_runtime_services() shows that any
> efi_runtime_service() has five or less arguments.
> 2. An argument of efi_runtime_service() can be a value (of any type)
> or a pointer (of any type).
> Hence, efi_runtime_work has five void pointers to store these arguments.
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee, Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Dan Williams <[email protected]>
> ---
> drivers/firmware/efi/efi.c | 15 ++++++++
> drivers/firmware/efi/runtime-wrappers.c | 61 +++++++++++++++++++++++++++++++++
> include/linux/efi.h | 20 +++++++++++
> 3 files changed, 96 insertions(+)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 838b8efe639c..04b46c62f3ce 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -75,6 +75,8 @@ static unsigned long *efi_tables[] = {
> &efi.mem_attr_table,
> };
>
> +struct workqueue_struct *efi_rts_wq;
> +
> static bool disable_runtime;
> static int __init setup_noefi(char *arg)
> {
> @@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
> return 0;
>
> /*
> + * Since we process only one efi_runtime_service() at a time, an
> + * ordered workqueue (which creates only one execution context)
> + * should suffice all our needs.
> + */
> + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);

efi_rts_wq or efi_rts_workqueue?

> + if (!efi_rts_wq) {
> + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "

Same here.

> + "disabled.\n");
> + clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> + return 0;
> + }
> +
> + /*
> * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> * it should be invoked only after efi_rts_workqueue is ready.
> */
> diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
> index ae54870b2788..649763171439 100644
> --- a/drivers/firmware/efi/runtime-wrappers.c
> +++ b/drivers/firmware/efi/runtime-wrappers.c
> @@ -1,6 +1,14 @@
> /*
> * runtime-wrappers.c - Runtime Services function call wrappers
> *
> + * Implementation summary:
> + * -----------------------
> + * 1. When user/kernel thread requests to execute efi_runtime_service(),
> + * enqueue work to efi_rts_workqueue.
> + * 2. Caller thread waits until the work is finished because it's
> + * dependent on the return status and execution of efi_runtime_service().
> + * For instance, get_variable() and get_next_variable().
> + *
> * Copyright (C) 2014 Linaro Ltd. <[email protected]>
> *
> * Split off from arch/x86/platform/efi/efi.c
> @@ -22,6 +30,8 @@
> #include <linux/mutex.h>
> #include <linux/semaphore.h>
> #include <linux/stringify.h>
> +#include <linux/workqueue.h>
> +
> #include <asm/efi.h>
>
> /*
> @@ -33,6 +43,57 @@
> #define __efi_call_virt(f, args...) \
> __efi_call_virt_pointer(efi.systab->runtime, f, args)
>
> +/* efi_runtime_service() function identifiers */
> +enum {
> + GET_TIME,
> + SET_TIME,
> + GET_WAKEUP_TIME,
> + SET_WAKEUP_TIME,
> + GET_VARIABLE,
> + GET_NEXT_VARIABLE,
> + SET_VARIABLE,
> + SET_VARIABLE_NONBLOCKING,
> + QUERY_VARIABLE_INFO,
> + QUERY_VARIABLE_INFO_NONBLOCKING,
> + GET_NEXT_HIGH_MONO_COUNT,
> + RESET_SYSTEM,
> + UPDATE_CAPSULE,
> + QUERY_CAPSULE_CAPS,
> +};

Much cleaner now :-)

> +
> +/*
> + * efi_queue_work: Queue efi_runtime_service() and wait until it's done
> + * @rts: efi_runtime_service() function identifier
> + * @rts_arg<1-5>: efi_runtime_service() function arguments
> + *
> + * Accesses to efi_runtime_services() are serialized by a binary
> + * semaphore (efi_runtime_lock) and caller waits until the work is
> + * finished, hence _only_ one work is queued at a time and the queued
> + * work gets flushed.
> + */
> +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
> +({ \
> + struct efi_runtime_work efi_rts_work; \
> + \
> + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
> + efi_rts_work.func = _rts; \
> + efi_rts_work.arg1 = _arg1; \
> + efi_rts_work.arg2 = _arg2; \
> + efi_rts_work.arg3 = _arg3; \
> + efi_rts_work.arg4 = _arg4; \
> + efi_rts_work.arg5 = _arg5; \
> + /* \
> + * queue_work() returns 0 if work was already on queue, \
> + * _ideally_ this should never happen. \
> + */ \
> + if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
> + flush_work(&efi_rts_work.work); \
> + else \
> + BUG(); \

Thanks for the change! One remark, I would just do:

> + if (!queue_work(efi_rts_wq, &efi_rts_work.work)) \
> + BUG(); \

And then you can unindent:

> + flush_work(&efi_rts_work.work); \

> + \
> + efi_rts_work.status; \
> +})
> +
> void efi_call_virt_check_flags(unsigned long flags, const char *call)
> {
> unsigned long cur_flags, mismatch;
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index c4efb3ef0dfa..bb06b71af55c 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1652,4 +1652,24 @@ struct linux_efi_tpm_eventlog {
>
> extern int efi_tpm_eventlog_init(void);
>
> +/*
> + * efi_runtime_work: Details of EFI Runtime Service work
> + * @func: EFI Runtime Service function identifier
> + * @arg<1-5>: EFI Runtime Service function arguments
> + * @status: Status of executing EFI Runtime Service
> + */
> +struct efi_runtime_work {
> + u8 func;
> + void *arg1;
> + void *arg2;
> + void *arg3;
> + void *arg4;
> + void *arg5;
> + efi_status_t status;
> + struct work_struct work;
> +};

Why is efi_runtime_work in the .h at all?

Please CC me for the next version! :-)

Cheers,
Miguel

> +
> +/* Workqueue to queue EFI Runtime Services */
> +extern struct workqueue_struct *efi_rts_wq;
> +
> #endif /* _LINUX_EFI_H */
> --
> 2.7.4
>

2018-03-07 12:12:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Mon, Mar 05, 2018 at 03:23:09PM -0800, Sai Praneeth Prakhya wrote:
> +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
> +({ \
> + struct efi_runtime_work efi_rts_work; \
> + \
> + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
> + efi_rts_work.func = _rts; \
> + efi_rts_work.arg1 = _arg1; \
> + efi_rts_work.arg2 = _arg2; \
> + efi_rts_work.arg3 = _arg3; \
> + efi_rts_work.arg4 = _arg4; \
> + efi_rts_work.arg5 = _arg5; \
> + /* \
> + * queue_work() returns 0 if work was already on queue, \
> + * _ideally_ this should never happen. \
> + */ \
> + if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
> + flush_work(&efi_rts_work.work); \
> + else \
> + BUG(); \

So failure to queue that work is such a critical problem that we need
to BUG() and can't possibly continue and shoult not attempt recovery at
all?

IOW, we should always strive to fail gracefully and not shit in pants at
the first sign of trouble.

Even checkpatch warns here:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
+ BUG(); \


and by looking at the other output, you should run your patches through
checkpatch. Some of the things make sense like:

WARNING: quoted string split across lines
#97: FILE: drivers/firmware/efi/efi.c:341:
+ pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+ "disabled.\n");

for example.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-03-08 04:01:34

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

-0800, Sai Praneeth Prakhya wrote:
> > @@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
> > return 0;
> >
> > /*
> > + * Since we process only one efi_runtime_service() at a time, an
> > + * ordered workqueue (which creates only one execution context)
> > + * should suffice all our needs.
> > + */
> > + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
> > + if (!efi_rts_wq) {
> > + pr_err("Failed to create efi_rts_workqueue, EFI runtime services
> "
> > + "disabled.\n");
> > + clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> > + return 0;
> > + }
>
> I'm a little worried that something might sample this flag between it being set in
> an early_initcall (arm_enable_runtime_services), and cleared in a subsys_initcall
> here.
>
> However, nothing seems to do that so far, so maybe that's ok...
>

Thanks for raising this. I will take a look at initcalls.

> [...]
>
> > +/* efi_runtime_service() function identifiers */ enum {
> > + GET_TIME,
> > + SET_TIME,
> > + GET_WAKEUP_TIME,
> > + SET_WAKEUP_TIME,
> > + GET_VARIABLE,
> > + GET_NEXT_VARIABLE,
> > + SET_VARIABLE,
> > + SET_VARIABLE_NONBLOCKING,
> > + QUERY_VARIABLE_INFO,
> > + QUERY_VARIABLE_INFO_NONBLOCKING,
> > + GET_NEXT_HIGH_MONO_COUNT,
> > + RESET_SYSTEM,
> > + UPDATE_CAPSULE,
> > + QUERY_CAPSULE_CAPS,
> > +};
>
> Can we please give this enum a name....

Sure! Added in V3.

>
> [...]
>
> > +/*
> > + * efi_runtime_work: Details of EFI Runtime Service work
> > + * @func: EFI Runtime Service function identifier
> > + * @arg<1-5>: EFI Runtime Service function arguments
> > + * @status: Status of executing EFI Runtime Service
> > + */
> > +struct efi_runtime_work {
> > + u8 func;
>
> ... and use it here rather than an opaque u8? I realise that means placing the
> enum in <linux/efi.h>.
>

Actually, with Miguel comments, I am considering making this struct static and moving
it to runtime-wrappers.c, since "struct efi_runtime_work" isn't really being used anywhere
except runtime-wrappers.c. Please see in V3.

> > + void *arg1;
> > + void *arg2;
> > + void *arg3;
> > + void *arg4;
> > + void *arg5;
> > + efi_status_t status;
> > + struct work_struct work;
> > +};
>
> Thanks,
> Mark.

2018-03-08 04:13:11

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> > From: Sai Praneeth <[email protected]>
> >
> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
>
> It might be worth pointing out that this could result in disaster (e.g.
> if the frame pointer happens to point at MMIO in the EFI runtime services
> mappings).
>

Sorry! I didn't get it. I would like to add this point, so could you please
explain it more?

> [...]
>
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
>
> Can't NMIs still come in during this?
>
> ... or do we assume that since something has already gone wrong, this doesn't
> matter?
>

I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
Since, we are still executing stuff to log messages and NMI's can't be masked,
there is still a possibility for NMI's to occur (please correct me if I am wrong).
But, as you said earlier, I guess it doesn't matter because anyways we are going down.

> Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I can boot
> up, read the EFI RTC, and reboot. I ahd lockdep and KASAN enabled, I received
> no splats from either.
>
> FWIW:
>
> Tested-by: Mark Rutland <[email protected] [arm64]
>
> Thanks,
> Mark.

2018-03-08 04:24:01

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

> > +struct workqueue_struct *efi_rts_wq;
> > +
> > static bool disable_runtime;
> > static int __init setup_noefi(char *arg) { @@ -329,6 +331,19 @@
> > static int __init efisubsys_init(void)
> > return 0;
> >
> > /*
> > + * Since we process only one efi_runtime_service() at a time, an
> > + * ordered workqueue (which creates only one execution context)
> > + * should suffice all our needs.
> > + */
> > + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
>
> efi_rts_wq or efi_rts_workqueue?
>
> > + if (!efi_rts_wq) {
> > + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
>
> Same here.

Sure! I will make it consistent with "efi_rts_wq". Just tried to be more verbose
with names :)

[...]

> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
> > +({ \
> > + struct efi_runtime_work efi_rts_work; \
> > + \
> > + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
> > + efi_rts_work.func = _rts; \
> > + efi_rts_work.arg1 = _arg1; \
> > + efi_rts_work.arg2 = _arg2; \
> > + efi_rts_work.arg3 = _arg3; \
> > + efi_rts_work.arg4 = _arg4; \
> > + efi_rts_work.arg5 = _arg5; \
> > + /* \
> > + * queue_work() returns 0 if work was already on queue, \
> > + * _ideally_ this should never happen. \
> > + */ \
> > + if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
> > + flush_work(&efi_rts_work.work); \
> > + else \
> > + BUG(); \
>
> Thanks for the change! One remark, I would just do:

Sorry! but I am planning to remove BUG(). Looks like it could defeat the purpose
of patch. Please see Boris comments on the other thread.

[...]

> > +/*
> > + * efi_runtime_work: Details of EFI Runtime Service work
> > + * @func: EFI Runtime Service function identifier
> > + * @arg<1-5>: EFI Runtime Service function arguments
> > + * @status: Status of executing EFI Runtime Service
> > + */
> > +struct efi_runtime_work {
> > + u8 func;
> > + void *arg1;
> > + void *arg2;
> > + void *arg3;
> > + void *arg4;
> > + void *arg5;
> > + efi_status_t status;
> > + struct work_struct work;
> > +};
>
> Why is efi_runtime_work in the .h at all?
>

Thanks for the catch. I will move it to runtime-wrappers.c file and will make it
static too. It isn't being used in any other place.

> Please CC me for the next version! :-)

Sure! Sorry for that. I should have done in V2.

Regards,
Sai

2018-03-08 04:34:14

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

On Wed, Mar 7, 2018 at 8:11 PM, Prakhya, Sai Praneeth
<[email protected]> wrote:
> 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
>> > From: Sai Praneeth <[email protected]>
>> >
>> > Presently, efi_runtime_services() are executed by firmware in process
>> > context. To execute efi_runtime_service(), kernel switches the page
>> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
>> > any user space mappings. A potential issue could be, for instance, an
>> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
>>
>> It might be worth pointing out that this could result in disaster (e.g.
>> if the frame pointer happens to point at MMIO in the EFI runtime services
>> mappings).
>>
>
> Sorry! I didn't get it. I would like to add this point, so could you please
> explain it more?
>
>> [...]
>>
>> > pstore writes could potentially be invoked in interrupt context and it
>> > uses set_variable<>() and query_variable_info<>() to store logs. If we
>> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
>> > kernel issues a warning ("scheduling wile in atomic") and prints stack
>> > trace. One way to overcome this is to not make the caller process wait
>> > for the worker thread to finish. This approach breaks pstore i.e. the
>> > log messages aren't written to efi variables. Hence, pstore calls
>> > efi_runtime_services() without using efi_rts_wq or in other words
>> > efi_rts_wq will be used unconditionally for all the
>> > efi_runtime_services() except set_variable<>() and
>> > query_variable_info<>()
>>
>> Can't NMIs still come in during this?
>>
>> ... or do we assume that since something has already gone wrong, this doesn't
>> matter?
>>
>
> I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
> Since, we are still executing stuff to log messages and NMI's can't be masked,
> there is still a possibility for NMI's to occur (please correct me if I am wrong).
> But, as you said earlier, I guess it doesn't matter because anyways we are going down.

The problem is that we are not always in a "already going down"
condition for typical set_variable and query_variable_info calls. So
are we actually fixing anything with this patchset? In other words if
the NMI vs EFI mapping problem requires the workqueue context then we
can't have any EFI calls outside of that context. Am I missing how
this scheme addresses that problem?

2018-03-08 05:30:54

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

> >> > pstore writes could potentially be invoked in interrupt context and
> >> > it uses set_variable<>() and query_variable_info<>() to store logs.
> >> > If we invoke efi_runtime_services() through efi_rts_wq while in
> >> > atomic() kernel issues a warning ("scheduling wile in atomic") and
> >> > prints stack trace. One way to overcome this is to not make the
> >> > caller process wait for the worker thread to finish. This approach
> >> > breaks pstore i.e. the log messages aren't written to efi
> >> > variables. Hence, pstore calls
> >> > efi_runtime_services() without using efi_rts_wq or in other words
> >> > efi_rts_wq will be used unconditionally for all the
> >> > efi_runtime_services() except set_variable<>() and
> >> > query_variable_info<>()
> >>
> >> Can't NMIs still come in during this?
> >>
> >> ... or do we assume that since something has already gone wrong, this
> >> doesn't matter?
> >>
> >
> > I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
> > Since, we are still executing stuff to log messages and NMI's can't be
> > masked, there is still a possibility for NMI's to occur (please correct me if I am
> wrong).
> > But, as you said earlier, I guess it doesn't matter because anyways we are
> going down.
>
> The problem is that we are not always in a "already going down"
> condition for typical set_variable and query_variable_info calls.

That's correct.

> So are we actually fixing anything with this patchset?

When we are _not_ in interrupt context (eg: process context)
we still use efi_rts_wq to invoke *all* efi_runtime_services().
This solves *someone trying to access user space* while in EFI runtime services
mapping problem. A instance could be, some user space process requests us to
execute efi_runtime_service(), so, kernel switches to efi_pgd (which doesn’t have
user space part of process address space) and while executing efi_runtime_service()
perf NMI comes along to profile user data.

> In other words if the NMI vs EFI
> mapping problem requires the workqueue context then we can't have any EFI
> calls outside of that context. Am I missing how this scheme addresses that
> problem?

I think so, because, we are not trying to solve NMI vs EFI Runtime Service mappings.
AFAIK, they both work together (for x86_64). The problem is, as stated above,
"someone trying to access user space while executing EFI runtime services".
The problem exists because EFI runtime services mappings don’t have user space
part of process address space.

I think the problem still persists when we are already in interrupt context and then
we were requested to execute some efi_runtime_service() and then NMI happens
and that NMI handler touches user space.

Please let me know if my explanation didn’t make sense or if I misunderstood
your question.

Ard and Matt, please correct me if I stated something that is incorrect.

2018-03-08 05:33:23

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

> > +({ \
> > + struct efi_runtime_work efi_rts_work; \
> > + \
> > + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
> > + efi_rts_work.func = _rts; \
> > + efi_rts_work.arg1 = _arg1; \
> > + efi_rts_work.arg2 = _arg2; \
> > + efi_rts_work.arg3 = _arg3; \
> > + efi_rts_work.arg4 = _arg4; \
> > + efi_rts_work.arg5 = _arg5; \
> > + /* \
> > + * queue_work() returns 0 if work was already on queue, \
> > + * _ideally_ this should never happen. \
> > + */ \
> > + if (queue_work(efi_rts_wq, &efi_rts_work.work))
> \
> > + flush_work(&efi_rts_work.work);
> \
> > + else \
> > + BUG(); \
>
> So failure to queue that work is such a critical problem that we need to BUG()
> and can't possibly continue and shoult not attempt recovery at all?
>

I think it's not critical, we can just return error status.
I think the problem in itself is not at all critical but when I initially thought about
why the problem could have occurred, it sounded like one i.e. ideally (if the system
is running fine) we should always be able to queue work. Failure to queue means
that the previous work is already on queue and that shouldn't be the case.
So, thought, maybe something bad had happened already (just doubtful).

But, I see your point. BUG() sounds more like an over kill. Instead of fixing an existing
problem, this patch could completely take down the system.

> IOW, we should always strive to fail gracefully and not shit in pants at the first
> sign of trouble.
>

Yes, that makes sense. I will remove BUG() in V3 (in the two places that I introduced).

> Even checkpatch warns here:
>
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> rather than BUG() or BUG_ON()
> #184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
> + BUG(); \
>

Sure! I will fix this

>
> and by looking at the other output, you should run your patches through
> checkpatch. Some of the things make sense like:
>
> WARNING: quoted string split across lines
> #97: FILE: drivers/firmware/efi/efi.c:341:
> + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
> + "disabled.\n");
>
> for example.
>

I will fix this one too.

Another warning by checkpatch is "use of in_atomic() in drivers code"
Do you think it's OK to check if were are "in_atomic()" in drivers code.
I wasn't able to decide on other alternative, if possible, could you please suggest one?

Regards,
Sai

2018-03-08 05:41:21

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

+Cc Miguel Ojeda

> > > +({ \
> > > + struct efi_runtime_work efi_rts_work; \
> > > + \
> > > + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
> > > + efi_rts_work.func = _rts; \
> > > + efi_rts_work.arg1 = _arg1; \
> > > + efi_rts_work.arg2 = _arg2; \
> > > + efi_rts_work.arg3 = _arg3; \
> > > + efi_rts_work.arg4 = _arg4; \
> > > + efi_rts_work.arg5 = _arg5; \
> > > + /* \
> > > + * queue_work() returns 0 if work was already on queue, \
> > > + * _ideally_ this should never happen. \
> > > + */ \
> > > + if (queue_work(efi_rts_wq, &efi_rts_work.work))
> > \
> > > + flush_work(&efi_rts_work.work);
> > \
> > > + else \
> > > + BUG(); \
> >
> > So failure to queue that work is such a critical problem that we need
> > to BUG() and can't possibly continue and shoult not attempt recovery at all?
> >
>
> I think it's not critical, we can just return error status.
> I think the problem in itself is not at all critical but when I initially thought about
> why the problem could have occurred, it sounded like one i.e. ideally (if the
> system is running fine) we should always be able to queue work. Failure to queue
> means that the previous work is already on queue and that shouldn't be the
> case.
> So, thought, maybe something bad had happened already (just doubtful).
>
> But, I see your point. BUG() sounds more like an over kill. Instead of fixing an
> existing problem, this patch could completely take down the system.
>
> > IOW, we should always strive to fail gracefully and not shit in pants
> > at the first sign of trouble.
> >
>
> Yes, that makes sense. I will remove BUG() in V3 (in the two places that I
> introduced).
>
> > Even checkpatch warns here:
> >
> > WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> > rather than BUG() or BUG_ON()
> > #184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
> > + BUG(); \
> >
>
> Sure! I will fix this
>
> >
> > and by looking at the other output, you should run your patches
> > through checkpatch. Some of the things make sense like:
> >
> > WARNING: quoted string split across lines
> > #97: FILE: drivers/firmware/efi/efi.c:341:
> > + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
> > + "disabled.\n");
> >
> > for example.
> >
>
> I will fix this one too.
>
> Another warning by checkpatch is "use of in_atomic() in drivers code"
> Do you think it's OK to check if were are "in_atomic()" in drivers code.
> I wasn't able to decide on other alternative, if possible, could you please suggest
> one?
>
> Regards,
> Sai

2018-03-08 07:45:03

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

Hello Sai,

On 5 March 2018 at 23:23, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Invoking efi_runtime_services() through efi_workqueue means all accesses
> to efi_runtime_services() should be done after efi_rts_wq has been
> created. efi_delete_dummy_variable() calls set_variable(), hence
> efi_delete_dummy_variable() should be called after efi_rts_wq has been
> created.
>
> efi_delete_dummy_variable() is called from efi_enter_virtual_mode()
> which is early in the boot phase (efi_rts_wq isn't created yet), so call
> efi_delete_dummy_variable() later in the boot phase i.e. while
> initializing efi subsystem. In the next patch, this is the place where
> we create efi_rts_wq and all the efi_runtime_services() will be called
> using efi_rts_wq.
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Lee, Chun-Yi <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Cc: Ravi Shankar <[email protected]>
> Cc: Matt Fleming <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Dan Williams <[email protected]>
> ---
> arch/x86/include/asm/efi.h | 1 -
> arch/x86/platform/efi/efi.c | 6 ------
> drivers/firmware/efi/efi.c | 6 ++++++
> include/linux/efi.h | 3 +++
> 4 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index a399c1ebf6f0..43009e3f821b 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -143,7 +143,6 @@ extern void __init efi_runtime_update_mappings(void);
> extern void __init efi_dump_pagetable(void);
> extern void __init efi_apply_memmap_quirks(void);
> extern int __init efi_reuse_config(u64 tables, int nr_tables);
> -extern void efi_delete_dummy_variable(void);
>
> struct efi_setup_data {
> u64 fw_vendor;
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 9061babfbc83..a3169d14583f 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
>
> if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
> runtime_code_page_mkexec();
> -
> - /* clean DUMMY object */
> - efi_delete_dummy_variable();
> #endif
> }
>
> @@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
> * necessary relocation fixups for the new virtual addresses.
> */
> efi_runtime_update_mappings();
> -
> - /* clean DUMMY object */
> - efi_delete_dummy_variable();
> }
>
> void __init efi_enter_virtual_mode(void)
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index cd42f66a7c85..838b8efe639c 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
> if (!efi_enabled(EFI_BOOT))
> return 0;
>
> + /*
> + * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> + * it should be invoked only after efi_rts_workqueue is ready.
> + */
> + efi_delete_dummy_variable();
> +

Is there any way to keep this local to arch/x86?

> /* We register the efi directory at /sys/firmware/efi */
> efi_kobj = kobject_create_and_add("efi", firmware_kobj);
> if (!efi_kobj) {
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index f5083aa72eae..c4efb3ef0dfa 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
> unsigned long size,
> bool nonblocking);
> extern void efi_find_mirror(void);
> +extern void efi_delete_dummy_variable(void);
> #else
> static inline void efi_late_init(void) {}
> static inline void efi_free_boot_services(void) {}
> @@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
> {
> return EFI_SUCCESS;
> }
> +
> +static inline void efi_delete_dummy_variable(void) {}
> #endif
> extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
>
> --
> 2.7.4
>

2018-03-08 09:15:13

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Thu, Mar 8, 2018 at 5:22 AM, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> > +struct workqueue_struct *efi_rts_wq;
>> > +
>> > static bool disable_runtime;
>> > static int __init setup_noefi(char *arg) { @@ -329,6 +331,19 @@
>> > static int __init efisubsys_init(void)
>> > return 0;
>> >
>> > /*
>> > + * Since we process only one efi_runtime_service() at a time, an
>> > + * ordered workqueue (which creates only one execution context)
>> > + * should suffice all our needs.
>> > + */
>> > + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
>>
>> efi_rts_wq or efi_rts_workqueue?
>>
>> > + if (!efi_rts_wq) {
>> > + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
>>
>> Same here.
>
> Sure! I will make it consistent with "efi_rts_wq". Just tried to be more verbose
> with names :)
>

It is not a big deal, but using the exact same name is better for the
purposes of grepping and things like that :-) By the way, check the
commit title/message, there are some others there too.

> [...]
>
>> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
>> > +({ \
>> > + struct efi_runtime_work efi_rts_work; \
>> > + \
>> > + INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
>> > + efi_rts_work.func = _rts; \
>> > + efi_rts_work.arg1 = _arg1; \
>> > + efi_rts_work.arg2 = _arg2; \
>> > + efi_rts_work.arg3 = _arg3; \
>> > + efi_rts_work.arg4 = _arg4; \
>> > + efi_rts_work.arg5 = _arg5; \
>> > + /* \
>> > + * queue_work() returns 0 if work was already on queue, \
>> > + * _ideally_ this should never happen. \
>> > + */ \
>> > + if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
>> > + flush_work(&efi_rts_work.work); \
>> > + else \
>> > + BUG(); \
>>
>> Thanks for the change! One remark, I would just do:
>
> Sorry! but I am planning to remove BUG(). Looks like it could defeat the purpose
> of patch. Please see Boris comments on the other thread.

No problem. Let's see how it looks in v3 :-)

>
> [...]
>
>> > +/*
>> > + * efi_runtime_work: Details of EFI Runtime Service work
>> > + * @func: EFI Runtime Service function identifier
>> > + * @arg<1-5>: EFI Runtime Service function arguments
>> > + * @status: Status of executing EFI Runtime Service
>> > + */
>> > +struct efi_runtime_work {
>> > + u8 func;
>> > + void *arg1;
>> > + void *arg2;
>> > + void *arg3;
>> > + void *arg4;
>> > + void *arg5;
>> > + efi_status_t status;
>> > + struct work_struct work;
>> > +};
>>
>> Why is efi_runtime_work in the .h at all?
>>
>
> Thanks for the catch. I will move it to runtime-wrappers.c file and will make it
> static too. It isn't being used in any other place.
>
>> Please CC me for the next version! :-)
>
> Sure! Sorry for that. I should have done in V2.

Thanks!

Cheers,
Miguel

>
> Regards,
> Sai

2018-03-08 14:10:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Thu, Mar 08, 2018 at 05:31:03AM +0000, Prakhya, Sai Praneeth wrote:
> Another warning by checkpatch is "use of in_atomic() in drivers code"

I'm assuming it warns because you're touching files in drivers/ but the
efi fun is not really a driver...

But looking at patch 3, that thing looks like a real mess. Some of the
things - pstore, it seems - do stuff in atomic context and yet you want
to do efi stuff in a workqueue which doesn't stomach atomic context to
begin with.

So if you wanna do workqueue, you should make sure all efi stuff gets
delayed to process context and queued properly. For example, we log
MCEs from atomic context by putting them on a lockless buffer and then
kicking irq_work to queue the work when we return to process context.
Can you do something like that?

"Hence, pstore calls efi_runtime_services() without using efi_rts_wq" -
that doesn't sound like optimal design to me. I would try to shove them
all through the workqueue - not have exceptions.

Then this:

> A potential issue could be, for instance, an NMI interrupt (like perf)
> trying to profile some user data while in efi_pgd.

I can't understand.

How did we handle this until now and why is it a problem all of a
sudden?

Because I don't recall being unable to run perf while efi runtime
services are happening.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-03-08 17:07:31

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

> "Hence, pstore calls efi_runtime_services() without using efi_rts_wq" -
> that doesn't sound like optimal design to me. I would try to shove them
> all through the workqueue - not have exceptions.

But pstore is trying to save the last gasp dying words from a kernel that
has paniced. There isn't any guarantee that work queue will run. I think
it is reasonable to have some special case to make sure that we do save
the information. But perhaps that special case should be to have pstore
call directly into the guts of the EFI code and not worry about all these
fancy switches of "mm" etc.

This is true for the machine check logging case too, but the mitigation is
that the details of the error persist in the machine check banks across the
reset ... so all is not lost if the work queue isn't run here.

-Tony

2018-03-08 18:09:07

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

> > void __init efi_enter_virtual_mode(void) diff --git
> > a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index
> > cd42f66a7c85..838b8efe639c 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
> > if (!efi_enabled(EFI_BOOT))
> > return 0;
> >
> > + /*
> > + * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> > + * it should be invoked only after efi_rts_workqueue is ready.
> > + */
> > + efi_delete_dummy_variable();
> > +
>
> Is there any way to keep this local to arch/x86?
>

I think, we can definitely do that. I will retake a look at the possibilities
and will update this thread.

> > /* We register the efi directory at /sys/firmware/efi */
> > efi_kobj = kobject_create_and_add("efi", firmware_kobj);
> > if (!efi_kobj) {
> > diff --git a/include/linux/efi.h b/include/linux/efi.h index
> > f5083aa72eae..c4efb3ef0dfa 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32

2018-03-08 18:11:16

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

/*
> >> > + * Since we process only one efi_runtime_service() at a time, an
> >> > + * ordered workqueue (which creates only one execution context)
> >> > + * should suffice all our needs.
> >> > + */
> >> > + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue",
> >> > + 0);
> >>
> >> efi_rts_wq or efi_rts_workqueue?
> >>
> >> > + if (!efi_rts_wq) {
> >> > + pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
> >>
> >> Same here.
> >
> > Sure! I will make it consistent with "efi_rts_wq". Just tried to be
> > more verbose with names :)
> >
>
> It is not a big deal, but using the exact same name is better for the purposes of
> grepping and things like that :-)

Yes, that makes sense.

> By the way, check the commit title/message, there are some others there too.

Sure! I will make changes to commit/title messages too.

Regards,
Sai

2018-03-09 02:39:18

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

> > Another warning by checkpatch is "use of in_atomic() in drivers code"
>
> I'm assuming it warns because you're touching files in drivers/ but the efi fun is
> not really a driver...

True! That makes sense :)

>
> But looking at patch 3, that thing looks like a real mess. Some of the things -
> pstore, it seems - do stuff in atomic context and yet you want to do efi stuff in a
> workqueue which doesn't stomach atomic context to begin with.
>
> So if you wanna do workqueue, you should make sure all efi stuff gets delayed
> to process context and queued properly. For example, we log MCEs from atomic
> context by putting them on a lockless buffer and then kicking irq_work to queue
> the work when we return to process context.
> Can you do something like that?
>

I think we can do this, it's is a good idea. I looked at this approach and saw that
in oops_end() function, part of arch/x86/kernel/dumpstack, between oops_exit()
and panic() (here we are not in atomic context, so, I think we can use work queues)
we could have something like efi_flush_buffer() which will flush the buffer and
queue the work to efi_rts_wq.

But, I guess, we have some downsides with this design:
1. We are doing this to have "no exceptions to use efi_rts_wq", but we will be making
the common case complicated. i.e. When a user requests to write some efi variable,
we will first write it to a buffer and then flush the buffer using efi_rts_wq. Instead, we
could have written the variable directly.
Maybe, you meant, we should use this buffer only while pstore and not during normal
case (which sounds reasonably OK).
2. It doesn't look rational that, when we are already going down, we schedule (because
we will be invoking efi_runtime_services() through work queue) to log some stuff.
Not, sure if that's happening in other parts of kernel or if it's OK to do that.

I will try the suggested approach and will keep this thread posted.

> "Hence, pstore calls efi_runtime_services() without using efi_rts_wq" - that
> doesn't sound like optimal design to me. I would try to shove them all through
> the workqueue - not have exceptions.
>

Alternatively, instead of playing around with in_atomic(), we could have wrapper
functions like efi_write_var_non_wq() which will only be used by pstore. This function
will not use efi_rts_wq and directly invoke efi_runtime_service. Just an attempt to
make the code not look messy.

> Then this:
>
> > A potential issue could be, for instance, an NMI interrupt (like perf)
> > trying to profile some user data while in efi_pgd.
>
> I can't understand.
>
> How did we handle this until now and why is it a problem all of a sudden?
>
> Because I don't recall being unable to run perf while efi runtime services are
> happening.
>

That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
We might have issues only when, we are already in efi_pgd, NMI comes along
and NMI handler tries to touch the regions that are not mapped in efi_pgd
(Eg: User space part of process address space) and using kthread inherently
means that we will not have any user space.

Regards,
Sai

2018-03-09 10:59:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Thu, Mar 08, 2018 at 05:05:44PM +0000, Luck, Tony wrote:
> > "Hence, pstore calls efi_runtime_services() without using efi_rts_wq" -
> > that doesn't sound like optimal design to me. I would try to shove them
> > all through the workqueue - not have exceptions.
>
> But pstore is trying to save the last gasp dying words from a kernel that
> has paniced. There isn't any guarantee that work queue will run. I think
> it is reasonable to have some special case to make sure that we do save
> the information. But perhaps that special case should be to have pstore
> call directly into the guts of the EFI code and not worry about all these
> fancy switches of "mm" etc.

I guess...

> This is true for the machine check logging case too, but the mitigation is
> that the details of the error persist in the machine check banks across the
> reset ... so all is not lost if the work queue isn't run here.

Well, I'm still hoping to have this wonderful and reliable
logging facility one day where we can dump bytes into a
persistent-across-reboots buffer and analyze them later. And yap, in
that case, I don't mind if the code simply bypasses all the dancing in
the OS and writes those bytes. Even switching pagetables would be too
much for that case - just fire'n'forget writing from ring 0.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-03-09 11:13:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Fri, Mar 09, 2018 at 02:37:59AM +0000, Prakhya, Sai Praneeth wrote:
> But, I guess, we have some downsides with this design:
> 1. We are doing this to have "no exceptions to use efi_rts_wq", but we will be making
> the common case complicated. i.e. When a user requests to write some efi variable,

So if the pstore use case is so important and special, I think we should
make the EFI path as fast as possible as getting that data to the pstore
is a priority.

> Alternatively, instead of playing around with in_atomic(), we could have wrapper
> functions like efi_write_var_non_wq() which will only be used by pstore. This function
> will not use efi_rts_wq and directly invoke efi_runtime_service. Just an attempt to
> make the code not look messy.

I guess.

If the write-to-pstore case is such a critical one, I guess the
exception is justified.

> That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
> We might have issues only when, we are already in efi_pgd, NMI comes along

Can you trigger this? Or is it something hypothetical?

> and NMI handler tries to touch the regions that are not mapped in efi_pgd

If it is not hypothetical, the NMI handler should learn to look at CR3
first and return if CR3 has the efi pgd.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-03-10 00:34:44

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

> > That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
> > We might have issues only when, we are already in efi_pgd, NMI comes
> > along
>
> Can you trigger this? Or is it something hypothetical?
>

AFAIK, it's hypothetical. I did try to trigger the issue, but failed [1].
Maybe, I need to have some more constraints [2].

[1] https://lkml.org/lkml/2017/8/23/715
[2] https://lkml.org/lkml/2017/8/25/469

> > and NMI handler tries to touch the regions that are not mapped in
> > efi_pgd
>
> If it is not hypothetical, the NMI handler should learn to look at CR3 first and
> return if CR3 has the efi pgd.

This solution and it's variants were discussed here [1], [2] and for varied reasons
the community had decided to go with "Everything EFI as kthread" approach [3] [4].

Although the discussions were off my understanding, the present issue I see is,
(and also the motivation for me to do the patch is)
when a thread tries to execute any efi_runtime_service() we switch to efi_pgd
(which doesn't have user space mappings) and all other subsystems in kernel
aren't aware of this switch. This looks like a perfect case for kthread.

Kthread by definition doesn’t have user space mappings and if we run efi_runtime_services()
in a kthread context and if any other subsystem tries to access user space mappings
while in efi_kthread, it's terminally broken [5].

There were several issues Andy, Peter and Mark raised.
One such (hypothetical) case is accessing user space from the back of an interrupt (NMI).
Others include
1. Issue specific to ARM because it runs efi_runtime_services() with interrupts enabled [6]
2. Interrupt taken while mmap_sem() is held for write that tries to access user memory [7]
3. If EFI were to have IO memory mapped at a "user" address, perf could end up reading it [8]

[1] https://lkml.org/lkml/2017/8/15/757
[2] https://lkml.org/lkml/2017/8/16/487
[3] https://lkml.org/lkml/2017/8/21/573
[4] https://lkml.org/lkml/2017/8/16/540

[5] https://lkml.org/lkml/2017/8/17/667
[6] https://lkml.org/lkml/2017/8/16/176
[7] https://lkml.org/lkml/2017/8/17/667
[8] https://lkml.org/lkml/2017/8/21/427

Regards,
Sai

2018-03-14 17:42:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

On Sat, Mar 10, 2018 at 12:33:35AM +0000, Prakhya, Sai Praneeth wrote:
> Although the discussions were off my understanding, the present issue
> I see is, (and also the motivation for me to do the patch is) when a
> thread tries to execute any efi_runtime_service() we switch to efi_pgd
> (which doesn't have user space mappings) and all other subsystems in
> kernel aren't aware of this switch. This looks like a perfect case for
> kthread.

That's all fine and good but you need to be prepared and handle properly
an NMI while EFI is running.

I have no clue whether the platform delays delivery of NMIs during EFI
runtime services or whatever happens but you need to have all those
cases covered so that no monkey business happens.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.