2018-12-06 22:20:24

by Sean Christopherson

[permalink] [raw]
Subject: [RFC PATCH v2 0/4] x86: Add vDSO exception fixup for SGX

This version is almost entirely about the vDSO function itself,
i.e. patch 4/4. Feel free to ignore patches 2/4 and 3/4, I need
to do (a lot) more legwork to address feedback and improve their
changelogs. I'm expecting that to take a fair amount of time and
wanted to get the alternative exit handler idea out there ASAP.

The new vDSO function builds but is otherwise completely untested.

v2:
- For all intents and purposes, rewrite the SGX vDSO function.
This version is quite a bit different than anything discussed in
the past. Rather than provide separate a separate function or an
explicit parameter to request ERESUME, e.g. to recover after a
fault, take an optional "exit handler" that provides the caller
the opportunity to specify if and how the enclave should be
resumed. More details in the changelog.

- Rename it to __vdso_sgx_enter_enclave() to abstract the details
of EENTER and ERESUME to some degree.

- Give the enclave RDI, RSI and RDX to pass data out of the enclave.

- Call fixup_vdso_exception() in do_int3().


v1: https://lkml.kernel.org/r/[email protected]


Sean Christopherson (4):
x86/vdso: Add support for exception fixup in vDSO functions
x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling
x86/traps: Attempt to fixup exceptions in vDSO before signaling
x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave
transitions

arch/x86/entry/vdso/Makefile | 5 +-
arch/x86/entry/vdso/extable.c | 37 +++++++
arch/x86/entry/vdso/extable.h | 17 ++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 +-
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vdso2c.h | 58 +++++++++--
arch/x86/entry/vdso/vsgx_enter_enclave.c | 119 +++++++++++++++++++++++
arch/x86/include/asm/vdso.h | 5 +
arch/x86/kernel/traps.c | 15 +++
arch/x86/mm/fault.c | 7 ++
10 files changed, 262 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.c

--
2.19.2



2018-12-06 22:20:32

by Sean Christopherson

[permalink] [raw]
Subject: [RFC PATCH v2 1/4] x86/vdso: Add support for exception fixup in vDSO functions

The basic concept and implementation is very similar to the kernel's
exception fixup mechanism. The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function. Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions. Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file. The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array. The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPl3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process. Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions. This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave. Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves. This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions. The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 4 +-
arch/x86/entry/vdso/extable.c | 37 +++++++++++++++++
arch/x86/entry/vdso/extable.h | 17 ++++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 ++++-
arch/x86/entry/vdso/vdso2c.h | 58 +++++++++++++++++++++++----
arch/x86/include/asm/vdso.h | 5 +++
6 files changed, 119 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..eb543ee1bcec 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -20,7 +20,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o

# files to link into kernel
-obj-y += vma.o
+obj-y += vma.o extable.o
OBJECT_FILES_NON_STANDARD_vma.o := n

# vDSO images to build
@@ -115,7 +115,7 @@ $(obj)/%-x32.o: $(obj)/%.o FORCE

targets += vdsox32.lds $(vobjx32s-y)

-$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: OBJCOPYFLAGS := -S --remove-section __ex_table
$(obj)/%.so: $(obj)/%.so.dbg
$(call if_changed,objcopy)

diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
new file mode 100644
index 000000000000..49284d560d36
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <asm/current.h>
+#include <asm/vdso.h>
+
+struct vdso_exception_table_entry {
+ int insn, fixup;
+};
+
+bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code, unsigned long fault_addr)
+{
+ const struct vdso_image *image = current->mm->context.vdso_image;
+ const struct vdso_exception_table_entry *extable;
+ unsigned int nr_entries, i;
+ unsigned long base;
+
+ if (!current->mm->context.vdso)
+ return false;
+
+ base = (unsigned long)current->mm->context.vdso + image->extable_base;
+ nr_entries = image->extable_len / (sizeof(*extable));
+ extable = image->extable;
+
+ for (i = 0; i < nr_entries; i++) {
+ if (regs->ip == base + extable[i].insn) {
+ regs->ip = base + extable[i].fixup;
+ regs->di = trapnr;
+ regs->si = error_code;
+ regs->dx = fault_addr;
+ return true;
+ }
+ }
+
+ return false;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
new file mode 100644
index 000000000000..0f4ef8f34f2b
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_EXTABLE_H
+#define __VDSO_EXTABLE_H
+
+/*
+ * Inject exception fixup for vDSO code. Unlike normal exception fixup,
+ * vDSO uses a dedicated handler the addresses are relative to the overall
+ * exception table, not each individual entry.
+ */
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to) \
+ ".pushsection __ex_table,\"a\"\n" \
+ ".balign 4\n" \
+ ".long (" #from ") - __ex_table\n" \
+ ".long (" #to ") - __ex_table\n" \
+ ".popsection\n"
+
+#endif /* __VDSO_EXTABLE_H */
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index acfd5ba7d943..f1b228e8e599 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -85,7 +85,12 @@ SECTIONS
* stuff that isn't used at runtime in between.
*/

- .text : { *(.text*) } :text =0x90909090,
+ .text : {
+ *(.text*)
+ *(.fixup)
+ } :text =0x90909090,
+
+

/*
* At the end so that eu-elflint stays happy when vdso2c strips
@@ -95,6 +100,8 @@ SECTIONS
.altinstructions : { *(.altinstructions) } :text
.altinstr_replacement : { *(.altinstr_replacement) } :text

+ __ex_table : { *(__ex_table) } :text
+
/DISCARD/ : {
*(.discard)
*(.discard.*)
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index fa847a620f40..eca2f808bec3 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -5,6 +5,41 @@
* are built for 32-bit userspace.
*/

+static void BITSFUNC(copy)(FILE *outfile, const unsigned char *data, size_t len)
+{
+ size_t i;
+
+ for (i = 0; i < len; i++) {
+ if (i % 10 == 0)
+ fprintf(outfile, "\n\t");
+ fprintf(outfile, "0x%02X, ", (int)(data)[i]);
+ }
+}
+
+
+/*
+ * Extract a section from the input data into a standalone blob. Used to
+ * capture kernel-only data that needs to persist indefinitely, e.g. the
+ * exception fixup tables, but only in the kernel, i.e. the section can
+ * be stripped from the final vDSO image.
+ */
+static void BITSFUNC(extract)(const unsigned char *data, size_t data_len,
+ FILE *outfile, ELF(Shdr) *sec, const char *name)
+{
+ unsigned long offset;
+ size_t len;
+
+ offset = (unsigned long)GET_LE(&sec->sh_offset);
+ len = (size_t)GET_LE(&sec->sh_size);
+
+ if (offset + len > data_len)
+ fail("section to extract overruns input data");
+
+ fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+ BITSFUNC(copy)(outfile, data + offset, len);
+ fprintf(outfile, "\n};\n\n");
+}
+
static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
void *stripped_addr, size_t stripped_len,
FILE *outfile, const char *name)
@@ -14,9 +49,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
unsigned long mapping_size;
ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
int i;
- unsigned long j;
ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
- *alt_sec = NULL;
+ *alt_sec = NULL, *extable_sec = NULL;
ELF(Dyn) *dyn = 0, *dyn_end = 0;
const char *secstrings;
INT_BITS syms[NSYMS] = {};
@@ -78,6 +112,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
if (!strcmp(secstrings + GET_LE(&sh->sh_name),
".altinstructions"))
alt_sec = sh;
+ if (!strcmp(secstrings + GET_LE(&sh->sh_name), "__ex_table"))
+ extable_sec = sh;
}

if (!symtab_hdr)
@@ -149,13 +185,11 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile,
"static unsigned char raw_data[%lu] __ro_after_init __aligned(PAGE_SIZE) = {",
mapping_size);
- for (j = 0; j < stripped_len; j++) {
- if (j % 10 == 0)
- fprintf(outfile, "\n\t");
- fprintf(outfile, "0x%02X, ",
- (int)((unsigned char *)stripped_addr)[j]);
- }
+ BITSFUNC(copy)(outfile, stripped_addr, stripped_len);
fprintf(outfile, "\n};\n\n");
+ if (extable_sec)
+ BITSFUNC(extract)(raw_addr, raw_len, outfile,
+ extable_sec, "extable");

fprintf(outfile, "const struct vdso_image %s = {\n", name);
fprintf(outfile, "\t.data = raw_data,\n");
@@ -166,6 +200,14 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile, "\t.alt_len = %lu,\n",
(unsigned long)GET_LE(&alt_sec->sh_size));
}
+ if (extable_sec) {
+ fprintf(outfile, "\t.extable_base = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_offset));
+ fprintf(outfile, "\t.extable_len = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_size));
+ fprintf(outfile, "\t.extable = extable,\n");
+ }
+
for (i = 0; i < NSYMS; i++) {
if (required_syms[i].export && syms[i])
fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 27566e57e87d..1c8a6a8f7b59 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -15,6 +15,8 @@ struct vdso_image {
unsigned long size; /* Always a multiple of PAGE_SIZE */

unsigned long alt, alt_len;
+ unsigned long extable_base, extable_len;
+ const void *extable;

long sym_vvar_start; /* Negative offset to the vvar area */

@@ -45,6 +47,9 @@ extern void __init init_vdso_image(const struct vdso_image *image);

extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);

+extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code,
+ unsigned long fault_addr);
#endif /* __ASSEMBLER__ */

#endif /* _ASM_X86_VDSO_H */
--
2.19.2


2018-12-06 22:20:37

by Sean Christopherson

[permalink] [raw]
Subject: [RFC PATCH v2 3/4] x86/traps: Attempt to fixup exceptions in vDSO before signaling

Call fixup_vdso_exception() in all trap flows that generate signals to
userspace immediately prior to generating any such signal. If the
exception is fixed, return cleanly and do not generate a signal.

The goal of vDSO fixup is not to fixup all faults, nor is it to avoid
all signals, but rather to report faults directly to userspace when the
fault would otherwise directly result in a signal being sent to the
process.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/traps.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..f813481a85ff 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/vdso.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -223,6 +224,10 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;

+ if (user_mode(regs) &&
+ fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return 0;
+
return -1;
}

@@ -563,6 +568,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;

+ if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
+ return;
+
show_signal(tsk, SIGSEGV, "", desc, regs, error_code);

force_sig(SIGSEGV, tsk);
@@ -791,6 +799,10 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
goto exit;
}

+ if (user_mode(regs) &&
+ fixup_vdso_exception(regs, X86_TRAP_DB, error_code, 0))
+ goto exit;
+
if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
/*
* Historical junk that used to handle SYSENTER single-stepping.
@@ -854,6 +866,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
if (!si_code)
return;

+ if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return;
+
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs), task);
}
--
2.19.2


2018-12-06 22:21:28

by Sean Christopherson

[permalink] [raw]
Subject: [RFC PATCH v2 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling

Call fixup_vdso_exception() in the SIGSEGV and SIGBUS paths of the
page fault handler immediately prior to signaling. If the fault is
fixed, return cleanly and do not generate a signal.

The goal of vDSO fixup is not to fixup all faults, nor is it to avoid
all signals, but rather to report faults directly to userspace when the
fault would otherwise directly result in a signal being sent to the
process. For example, a page fault that points to valid user memory
that happened to be swapped out should not trigger fixup, i.e. should
not be reported to userspace.

In the SIGSEGV flow, make sure to call fixup_vdso_exception() after
the error code has been sanitized.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..ff1cb10858d5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
#include <asm/mmu_context.h> /* vma_pkey() */
#include <asm/efi.h> /* efi_recover_from_page_fault()*/
#include <asm/desc.h> /* store_idt(), ... */
+#include <asm/vdso.h> /* fixup_vdso_exception() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
if (address >= TASK_SIZE_MAX)
error_code |= X86_PF_PROT;

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);

@@ -1045,6 +1049,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
if (is_prefetch(regs, error_code, address))
return;

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
set_signal_archinfo(address, error_code);

#ifdef CONFIG_MEMORY_FAILURE
--
2.19.2


2018-12-06 22:21:58

by Sean Christopherson

[permalink] [raw]
Subject: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

Intel Software Guard Extensions (SGX) SGX introduces a new CPL3-only
enclave mode that runs as a sort of black box shared object that is
hosted by an untrusted normal CPL3 process.

Enclave transitions have semantics that are a lovely blend of SYCALL,
SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
an enclave can only be done through SGX-specific instructions, EENTER
and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
RIP from R{B,C}X.

But in a faulting/interrupting scenario, enclave transitions act more
like VM-Exit and VMRESUME. Maintaining the black box nature of the
enclave means that hardware must automatically switch CPU context when
an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
or exception (exceptions are AEEs because asynchronous in this context
is relative to the enclave and not CPU execution, e.g. the enclave
doesn't get an opportunity to save/fuzz CPU state).

Like VM-Exits, all AEEs jump to a common location, referred to as the
Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
via register passed to EENTER/ERESUME, similar to how the hypervisor
specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
Resuming the enclave/VM after the exiting event is handled is done via
ERESUME/VMRESUME respectively. In SGX, AEEs that are handled by the
kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
the AEP which then ERESUMEs th enclave.

Enclaves also behave a bit like VMs in the sense that they can generate
exceptions as part of their normal operation that for all intents and
purposes need to handled in the enclave/VM. However, unlike VMX, SGX
doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
as doing so would circumvent the enclave's security. So to handle an
exception, the enclave must first be re-entered through the normal
EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
(VMRESUME behavior) after the source of the exception is resolved.

All of the above is just the tip of the iceberg when it comes to running
an enclave. But, SGX was designed in such a way that the host process
can utilize a library to build, launch and run an enclave. This is
roughly analogous to how e.g. libc implementations are used by most
applications so that the application can focus on its business logic.

The big gotcha is that because enclaves can generate *and* handle
exceptions, any SGX library must be prepared to handle nearly any
exception at any time (well, any time a thread is executing in an
enclave). In Linux, this means the SGX library must register a
signal handler in order to intercept relevant exceptions and forward
them to the enclave (or in some cases, take action on behalf of the
enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
with libraries, e.g. signal handlers are process wide, are difficult
to chain, etc... This becomes particularly nasty when using multiple
levels of libraries that register signal handlers, e.g. running an
enclave via cgo inside of the Go runtime.

In comes vDSO to save the day. Now that vDSO can fixup exceptions,
add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
and intercept any exceptions that occur when running the enclave.

__vdso_sgx_enter_enclave() accepts four parameters:

- A pointer to a Thread Control Structure (TCS). A TCS is a page
within the enclave that defines/tracks the context of an enclave
thread.

- An opaque pointer passed to the enclave via RDI, e.g. to marshal
data into the enclave.

- A pointer to a struct sgx_enclave_exit_info that is used to relay
exit/fault information back to the caller. The primary variable
in the exit info is the ENCLU leaf at the time of exit, which can
be queried to determine whether the enclave exited cleanly (EEXIT),
or took an exception (EENTER or ERESUME).

The exact leaf is captured, instead of e.g. a fault flag, so that
the caller can identity whether a fault occurred in the enclave or
on EENTER. A fault on EENTER generally means the enclave has died
and needs to be restarted.

On a clean EEXIT, registers RDI, RSI and RDX are captured as-is,
e.g. to pass data out of the enclave. On a fault that is reported
to the caller, the exit info is stuffed (by way of the vDSO fixup
handler) with the trapnr, error_code and address. Not all enclave
exits are reported to the caller, e.g. interrupts and faults that
are handled by the kernel do not trigger fixup and IRET back to
ENCLU[ERESUME], i.e. unconditionally resume the enclave.

- An optional exit handler that, when defined, is invoked prior to
returning. The callback provides the enclave's untrusted runtime an
opportunity to resolve a fault or service a remote procedure call
without losing its stack context.

In addition to allowing the runtime to do silly shenanigans with its
stack, e.g. pushing data onto the stack from within the enclave, the
handler approach preserves the full call stack for debugging purposes.
For example, to accept a new EPC page into the enclave, an enclave
with a single TCS must re-EENTER the enclave using the same TCS to
EACCEPT the new page prior to executing ERESUME to restart at the
fault context. The handler approach allows reentry without undoing
the original call stack, i.e. preserves the view of the original
interrupted call.

Note that this effectively requires userspace to implement an exit
handler if they want to support correctable enclave faults, as there
is no other way to request ERESUME.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vsgx_enter_enclave.c | 119 +++++++++++++++++++++++
3 files changed, 121 insertions(+)
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.c

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index eb543ee1bcec..8b530e20e8be 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y

# files to link into the vdso
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-$(VDSO64-y) += vsgx_enter_enclave.o

# files to link into kernel
obj-y += vma.o extable.o
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..50952a995a6c 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,7 @@ VERSION {
__vdso_getcpu;
time;
__vdso_time;
+ __vdso_sgx_enter_enclave;
local: *;
};
}
diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.c b/arch/x86/entry/vdso/vsgx_enter_enclave.c
new file mode 100644
index 000000000000..896c2eb079bb
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2018 Intel Corporation.
+
+#include <uapi/linux/errno.h>
+#include <uapi/linux/types.h>
+
+#include "extable.h"
+
+/*
+ * The exit info/handler definitions will live elsewhere in the actual
+ * implementation, e.g. arch/x86/include/uapi/asm/sgx.h.
+ */
+struct sgx_enclave_exit_info {
+ __u32 leaf;
+ union {
+ struct {
+ __u16 trapnr;
+ __u16 error_code;
+ __u64 address;
+ } fault;
+ struct {
+ __u64 rdi;
+ __u64 rsi;
+ __u64 rdx;
+ } eexit;
+ };
+};
+
+typedef long (sgx_enclave_exit_handler)(struct sgx_enclave_exit_info *exit_info,
+ void *tcs, void *priv);
+
+/*
+ * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
+ * SGX functions, The ENCLU function that is executed is specified in EAX,
+ * with each function potentially having more leaf-specific operands beyond
+ * EAX. In the vDSO we're only concerned with the leafs that are used to
+ * transition to/from the enclave.
+ */
+enum sgx_enclu_leaf {
+ SGX_EENTER = 2,
+ SGX_ERESUME = 3,
+ SGX_EEXIT = 4,
+};
+
+notrace long __vdso_sgx_enter_enclave(void *tcs, void *priv,
+ struct sgx_enclave_exit_info *exit_info,
+ sgx_enclave_exit_handler *exit_handler)
+{
+ u64 rdi, rsi, rdx;
+ u32 leaf;
+
+ if (!tcs || !exit_info)
+ return -EINVAL;
+
+ leaf = SGX_EENTER;
+
+enter_enclave:
+ asm volatile(
+ /*
+ * When an event occurs in an enclave, hardware first exits the
+ * enclave to the AEP, switching CPU context along the way, and
+ * *then* delivers the event as usual. As part of the context
+ * switching, registers are loaded with synthetic state (except
+ * BP and SP, which are saved/restored). The defined synthetic
+ * state loads registers so that simply executing ENCLU will do
+ * ERESUME, e.g. RAX=4, RBX=TCS and RCX=AEP after an AEE. So,
+ * we only need to load RAX, RBX and RCX for the initial entry.
+ * The AEP can point at that same ENCLU, fixup will jump us out
+ * if an exception was unhandled.
+ */
+ " lea 1f(%%rip), %%rcx\n"
+ "1: enclu\n"
+ "2:\n"
+
+ ".pushsection .fixup, \"ax\" \n"
+ "3: jmp 2b\n"
+ ".popsection\n"
+ _ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
+
+ : "=a"(leaf), "=D" (rdi), "=S" (rsi), "=d" (rdx)
+ : "a" (leaf), "b" (tcs), "D" (priv)
+ : "cc", "memory",
+ "rcx", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
+ );
+
+ /*
+ * EEXIT means we left the assembly blob via EEXIT, anything else is
+ * an unhandled exception (handled exceptions and interrupts simply
+ * ERESUME from the AEP).
+ */
+ exit_info->leaf = leaf;
+ if (leaf == SGX_EEXIT) {
+ exit_info->eexit.rdi = rdi;
+ exit_info->eexit.rsi = rsi;
+ exit_info->eexit.rdx = rdx;
+ } else {
+ exit_info->fault.trapnr = rdi;
+ exit_info->fault.error_code = rsi;
+ exit_info->fault.address = rdx;
+ }
+
+ /*
+ * Invoke the caller's exit handler if one was provided. The return
+ * value tells us whether to re-enter the enclave (EENTER or ERESUME)
+ * or to return (EEXIT).
+ */
+ if (exit_handler) {
+ leaf = exit_handler(exit_info, tcs, priv);
+ if (leaf == SGX_EENTER || leaf == SGX_ERESUME)
+ goto enter_enclave;
+ if (leaf == SGX_EEXIT)
+ return 0;
+ return -EINVAL;
+ } else if (leaf != SGX_EEXIT) {
+ return -EFAULT;
+ }
+
+ return 0;
+}
--
2.19.2


2018-12-06 22:52:45

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Thu, Dec 6, 2018 at 2:19 PM Sean Christopherson
<[email protected]> wrote:
>
+
> + /*
> + * Invoke the caller's exit handler if one was provided. The return
> + * value tells us whether to re-enter the enclave (EENTER or ERESUME)
> + * or to return (EEXIT).
> + */
> + if (exit_handler) {
> + leaf = exit_handler(exit_info, tcs, priv);
> + if (leaf == SGX_EENTER || leaf == SGX_ERESUME)
> + goto enter_enclave;
> + if (leaf == SGX_EEXIT)
> + return 0;
> + return -EINVAL;
> + } else if (leaf != SGX_EEXIT) {
> + return -EFAULT;
> + }

This still seems overcomplicated to me. How about letting the
requested leaf (EENTER or ERESUME) be a parameter to the function and
then just returning here? As it stands, you're requiring any ERESUME
that gets issued (other than the implicit ones) to be issued in the
same call stack, which is very awkward if you're doing something like
forwarding the fault to a different task over a socket and then
waiting in epoll_wait() or similar before resuming the enclave.

--Andy

2018-12-07 16:34:58

by Dr. Greg

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Thu, Dec 06, 2018 at 02:19:22PM -0800, Sean Christopherson wrote:

Good morning, I hope the week is ending well for everyone.

The audience for the issues that Sean is addressing are the groups
that have developed and are delivering an Untrusted RunTime System
(URTS) as a component of SGX Platform SoftWare (PSW). At the current
time I believe that is Intel and us, although there may be stealth
initiatives as well.

Sean is obviously coordinating with and supporting the Intel SDK/PSW
team. SGX has now been in the wild for 2-3 years so there is work
other then the reference implementation in the field. The purpose of
this mail is to make sure that everyone understands the issues and
ramifications of changes that may end up in 'Flag Day' events, if
nothing else so we can get the best possible implementation put
forward.

Baidu and Fortanix are working on Trusted RunTime Systems (TRTS) based
on RUST, I believe, so this will affect them to the extent that they
are implementing their own low level enclave runtime support or they
may be simply building on top of the low level Intel TRTS. Perhaps
Jethro would comment on these issues if he could.

> Intel Software Guard Extensions (SGX) SGX introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYCALL,
> SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME. Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
> via register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively. In SGX, AEEs that are handled by the
> kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> the AEP which then ERESUMEs th enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to handled in the enclave/VM. However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> as doing so would circumvent the enclave's security. So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave. But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave. This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business
> logic.

Just to make sure we are on the same page.

When you refer to 'build' an enclave I assume you mean to construct
the enclave image from a compiled shared object file. Or are you
suggesting an environment where the library loads dynamically
generated object code into an enclave using Enclave Dynamic Memory
Management (EDMM)? Building, i.e. compiling and linking an enclave,
doesn't seem to be the province of library support.

Perhaps a more accurate phrase would be; 'to load, initialize and
execute an enclave image'.

To step back further and frame the issue most precisely. What the
VDSO work is proposing to support is shifting from a model where
applications 'own' enclaves to a model where dynamically linked shared
libraries 'own' enclaves, correct?

Currently, both your PSW and the one we deliver use SGX libraries to
support the loading, initialization and execution of an enclave. In
the Intel world, applications link against libsgx_urts.so, in our
world, applications link against the SGXfusion.a library in order to
get the enclave entry and exception handling code. In this model the
library code also provides a signal handler that analyzes the ucontext
structure in order to determine how it got called and to take
appropriate enclave exception handling actions.

With the VDSO model you are proposing an environment where library
developers can implement SGX/enclave based protections of code and
data which the actual application linking against the library would be
totally unaware of, correct?

> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave). In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc... This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day. Now that vDSO can fixup exceptions,
> add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
> and intercept any exceptions that occur when running the enclave.
>
> __vdso_sgx_enter_enclave() accepts four parameters:
>
> - A pointer to a Thread Control Structure (TCS). A TCS is a page
> within the enclave that defines/tracks the context of an enclave
> thread.
>
> - An opaque pointer passed to the enclave via RDI, e.g. to marshal
> data into the enclave.
>
> - A pointer to a struct sgx_enclave_exit_info that is used to relay
> exit/fault information back to the caller. The primary variable
> in the exit info is the ENCLU leaf at the time of exit, which can
> be queried to determine whether the enclave exited cleanly (EEXIT),
> or took an exception (EENTER or ERESUME).
>
> The exact leaf is captured, instead of e.g. a fault flag, so that
> the caller can identity whether a fault occurred in the enclave or
> on EENTER. A fault on EENTER generally means the enclave has died
> and needs to be restarted.
>
> On a clean EEXIT, registers RDI, RSI and RDX are captured as-is,
> e.g. to pass data out of the enclave. On a fault that is reported
> to the caller, the exit info is stuffed (by way of the vDSO fixup
> handler) with the trapnr, error_code and address. Not all enclave
> exits are reported to the caller, e.g. interrupts and faults that
> are handled by the kernel do not trigger fixup and IRET back to
> ENCLU[ERESUME], i.e. unconditionally resume the enclave.
>
> - An optional exit handler that, when defined, is invoked prior to
> returning. The callback provides the enclave's untrusted runtime an
> opportunity to resolve a fault or service a remote procedure call
> without losing its stack context.
>
> In addition to allowing the runtime to do silly shenanigans with its
> stack, e.g. pushing data onto the stack from within the enclave, the
> handler approach preserves the full call stack for debugging purposes.
> For example, to accept a new EPC page into the enclave, an enclave
> with a single TCS must re-EENTER the enclave using the same TCS to
> EACCEPT the new page prior to executing ERESUME to restart at the
> fault context. The handler approach allows reentry without undoing
> the original call stack, i.e. preserves the view of the original
> interrupted call.

To summarize succinctly, would it be correct to assert that there are
three possible advantages to the VDSO approach:

1.) Shared libraries can own enclaves without the knowledge of
applications.

2.) EDMM responses can be implemented more efficiently.

3.) Reduction in enclave attack surface.

With respect to point three, perhaps the most important attack on SGX
security guarantees to date has been documented in Lee et.al.'s
dark-ROP attack. A significant aspect of that attack was AEE based
probing of enclave execution. Do you have reflections with respect to
how the proposed archictecture would lessen or facilitate such
attacks?

The economics of software development seem to be motivating the use of
libOS approaches to porting applications to enclaves which has
significant implications with respect to AEE based probing attacks.

> Note that this effectively requires userspace to implement an exit
> handler if they want to support correctable enclave faults, as there
> is no other way to request ERESUME.

Once again to be clear for those of us that have investments in the
existing ABI.

I'm assuming that in the proposed model the URTS would interrogate the
VDSO to determine the availability of entry and exception handling
support and then setup the appropriate infrastructure and exit
handler? VDSO's are typically the domain of the system library.
Given the nature of SGX I couldn't even conceive of Glibc offering
support and, if it was acceptable to provide support, the potential
timeframe that would be involved in seeing deployment in the field.

As a result, do you anticipate the need for a 'flag day' with respect
to URTS/PSW/SDK support for all of this?

In addition, would you anticipate anything in the design that would be
problematic for environments where the application would choose to
implement an enclave in addition to linking against a library that
implements enclaves?

We will be interested in your reflections on the above issues.

Have a good weekend.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker
IDfusion, LLC
4206 N. 19th Ave. Implementing measured information privacy
Fargo, ND 58102 and integrity architectures.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: [email protected]
------------------------------------------------------------------------------
"My thoughts on the composition and effectiveness of the advisory
committee?

I think they are destined to accomplish about the same thing as what
you would get from locking 9 chimpanzees in a room with an armed
thermonuclear weapon and a can opener with orders to disarm it."
-- Dr. Greg Wettstein
Resurrection

2018-12-07 16:52:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

+Cc: linux-sgx, Haitao, Greg and Jethro

My apologies for neglecting to cc the SGX folks, original thread is here:

https://lkml.kernel.org/r/[email protected]

On Thu, Dec 06, 2018 at 02:50:01PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 6, 2018 at 2:19 PM Sean Christopherson
> <[email protected]> wrote:
> >
> +
> > + /*
> > + * Invoke the caller's exit handler if one was provided. The return
> > + * value tells us whether to re-enter the enclave (EENTER or ERESUME)
> > + * or to return (EEXIT).
> > + */
> > + if (exit_handler) {
> > + leaf = exit_handler(exit_info, tcs, priv);
> > + if (leaf == SGX_EENTER || leaf == SGX_ERESUME)
> > + goto enter_enclave;
> > + if (leaf == SGX_EEXIT)
> > + return 0;
> > + return -EINVAL;
> > + } else if (leaf != SGX_EEXIT) {
> > + return -EFAULT;
> > + }
>
> This still seems overcomplicated to me. How about letting the
> requested leaf (EENTER or ERESUME) be a parameter to the function and
> then just returning here? As it stands, you're requiring any ERESUME
> that gets issued (other than the implicit ones) to be issued in the
> same call stack, which is very awkward if you're doing something like
> forwarding the fault to a different task over a socket and then
> waiting in epoll_wait() or similar before resuming the enclave.

Ah, yeah, wasn't thinking about usage models where the enclave could
get passed off to a different thread.

What about supporting both, i.e. keep the exit handler but make it 100%
optional? And simplify the exit_handler to effectively return a boolean,
i.e. "exit or continue".

Something like this:

notrace long __vdso_sgx_enter_enclave(u32 op, void *tcs, void *priv,
struct sgx_enclave_exit_info *exit_info,
sgx_enclave_exit_handler *exit_handler)
{
u64 rdi, rsi, rdx;
u32 leaf;
long ret;

if (!tcs || !exit_info)
return -EINVAL;

enter_enclave:
if (op != SGX_EENTER && op != SGX_ERESUME)
return -EINVAL;

<same core code>

/*
* Invoke the caller's exit handler if one was provided. The return
* value tells us whether to re-enter the enclave (EENTER or ERESUME)
* or to return (EEXIT).
*/
if (exit_handler) {
if (exit_handler(exit_info, tcs, priv)) {
op = exit_info->leaf;
goto enter_enclave;
}
}

if (exit_info->leaf == SGX_EEXIT)
return -EFAULT;

return 0;
}


I like that the exit handler allows userspace to trap/panic with the full
call stack in place, and in a dedicated path, i.e. outside of the basic
enter/exit code. An exit handler probably doesn't fundamentally change
what userspace can do with respect to debugging/reporting, but I think
it would actually simplify some userspace implementations, e.g. I'd use
it in my tests like so:

long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
{
if (exit_info->leaf == SGX_EEXIT)
return 0;

<report exception and die/hang>
}


2018-12-07 17:57:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 7, 2018 at 8:51 AM Sean Christopherson
<[email protected]> wrote:
>
> +Cc: linux-sgx, Haitao, Greg and Jethro
>
> My apologies for neglecting to cc the SGX folks, original thread is here:
>
> https://lkml.kernel.org/r/[email protected]
>
> On Thu, Dec 06, 2018 at 02:50:01PM -0800, Andy Lutomirski wrote:
> > On Thu, Dec 6, 2018 at 2:19 PM Sean Christopherson
> > <[email protected]> wrote:
> > >
> > +
> > > + /*
> > > + * Invoke the caller's exit handler if one was provided. The return
> > > + * value tells us whether to re-enter the enclave (EENTER or ERESUME)
> > > + * or to return (EEXIT).
> > > + */
> > > + if (exit_handler) {
> > > + leaf = exit_handler(exit_info, tcs, priv);
> > > + if (leaf == SGX_EENTER || leaf == SGX_ERESUME)
> > > + goto enter_enclave;
> > > + if (leaf == SGX_EEXIT)
> > > + return 0;
> > > + return -EINVAL;
> > > + } else if (leaf != SGX_EEXIT) {
> > > + return -EFAULT;
> > > + }
> >
> > This still seems overcomplicated to me. How about letting the
> > requested leaf (EENTER or ERESUME) be a parameter to the function and
> > then just returning here? As it stands, you're requiring any ERESUME
> > that gets issued (other than the implicit ones) to be issued in the
> > same call stack, which is very awkward if you're doing something like
> > forwarding the fault to a different task over a socket and then
> > waiting in epoll_wait() or similar before resuming the enclave.
>
> Ah, yeah, wasn't thinking about usage models where the enclave could
> get passed off to a different thread.
>
> What about supporting both, i.e. keep the exit handler but make it 100%
> optional? And simplify the exit_handler to effectively return a boolean,
> i.e. "exit or continue".
>
> Something like this:
>
> notrace long __vdso_sgx_enter_enclave(u32 op, void *tcs, void *priv,
> struct sgx_enclave_exit_info *exit_info,
> sgx_enclave_exit_handler *exit_handler)
> {
> u64 rdi, rsi, rdx;
> u32 leaf;
> long ret;
>
> if (!tcs || !exit_info)
> return -EINVAL;
>
> enter_enclave:
> if (op != SGX_EENTER && op != SGX_ERESUME)
> return -EINVAL;
>
> <same core code>
>
> /*
> * Invoke the caller's exit handler if one was provided. The return
> * value tells us whether to re-enter the enclave (EENTER or ERESUME)
> * or to return (EEXIT).
> */
> if (exit_handler) {
> if (exit_handler(exit_info, tcs, priv)) {
> op = exit_info->leaf;
> goto enter_enclave;
> }
> }
>
> if (exit_info->leaf == SGX_EEXIT)
> return -EFAULT;
>
> return 0;
> }
>
>
> I like that the exit handler allows userspace to trap/panic with the full
> call stack in place, and in a dedicated path, i.e. outside of the basic
> enter/exit code. An exit handler probably doesn't fundamentally change
> what userspace can do with respect to debugging/reporting, but I think
> it would actually simplify some userspace implementations, e.g. I'd use
> it in my tests like so:
>
> long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
> {
> if (exit_info->leaf == SGX_EEXIT)
> return 0;
>
> <report exception and die/hang>
> }
>

Hmm. That't not totally silly, although you could accomplish almost
the same thing by wrapping the vDSO helper in another function.

2018-12-07 18:06:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 7, 2018 at 8:31 AM Dr. Greg <[email protected]> wrote:
>
> On Thu, Dec 06, 2018 at 02:19:22PM -0800, Sean Christopherson wrote:
>
> Good morning, I hope the week is ending well for everyone.
>
> The audience for the issues that Sean is addressing are the groups
> that have developed and are delivering an Untrusted RunTime System
> (URTS) as a component of SGX Platform SoftWare (PSW). At the current
> time I believe that is Intel and us, although there may be stealth
> initiatives as well.
>
> Sean is obviously coordinating with and supporting the Intel SDK/PSW
> team. SGX has now been in the wild for 2-3 years so there is work
> other then the reference implementation in the field. The purpose of
> this mail is to make sure that everyone understands the issues and
> ramifications of changes that may end up in 'Flag Day' events, if
> nothing else so we can get the best possible implementation put
> forward.
>
> Baidu and Fortanix are working on Trusted RunTime Systems (TRTS) based
> on RUST, I believe, so this will affect them to the extent that they
> are implementing their own low level enclave runtime support or they
> may be simply building on top of the low level Intel TRTS. Perhaps
> Jethro would comment on these issues if he could.
>
> > Intel Software Guard Extensions (SGX) SGX introduces a new CPL3-only
> > enclave mode that runs as a sort of black box shared object that is
> > hosted by an untrusted normal CPL3 process.
> >
> > Enclave transitions have semantics that are a lovely blend of SYCALL,
> > SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
> > an enclave can only be done through SGX-specific instructions, EENTER
> > and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> > e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> > RIP from R{B,C}X.
> >
> > But in a faulting/interrupting scenario, enclave transitions act more
> > like VM-Exit and VMRESUME. Maintaining the black box nature of the
> > enclave means that hardware must automatically switch CPU context when
> > an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> > or exception (exceptions are AEEs because asynchronous in this context
> > is relative to the enclave and not CPU execution, e.g. the enclave
> > doesn't get an opportunity to save/fuzz CPU state).
> >
> > Like VM-Exits, all AEEs jump to a common location, referred to as the
> > Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
> > via register passed to EENTER/ERESUME, similar to how the hypervisor
> > specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> > Resuming the enclave/VM after the exiting event is handled is done via
> > ERESUME/VMRESUME respectively. In SGX, AEEs that are handled by the
> > kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> > the AEP which then ERESUMEs th enclave.
> >
> > Enclaves also behave a bit like VMs in the sense that they can generate
> > exceptions as part of their normal operation that for all intents and
> > purposes need to handled in the enclave/VM. However, unlike VMX, SGX
> > doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> > as doing so would circumvent the enclave's security. So to handle an
> > exception, the enclave must first be re-entered through the normal
> > EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> > (VMRESUME behavior) after the source of the exception is resolved.
> >
> > All of the above is just the tip of the iceberg when it comes to running
> > an enclave. But, SGX was designed in such a way that the host process
> > can utilize a library to build, launch and run an enclave. This is
> > roughly analogous to how e.g. libc implementations are used by most
> > applications so that the application can focus on its business
> > logic.
>
> Just to make sure we are on the same page.
>
> When you refer to 'build' an enclave I assume you mean to construct
> the enclave image from a compiled shared object file. Or are you
> suggesting an environment where the library loads dynamically
> generated object code into an enclave using Enclave Dynamic Memory
> Management (EDMM)? Building, i.e. compiling and linking an enclave,
> doesn't seem to be the province of library support.
>
> Perhaps a more accurate phrase would be; 'to load, initialize and
> execute an enclave image'.
>
> To step back further and frame the issue most precisely. What the
> VDSO work is proposing to support is shifting from a model where
> applications 'own' enclaves to a model where dynamically linked shared
> libraries 'own' enclaves, correct?

Or just a model where an application can own an enclave but not need
to register a process-wide SIGSEGV handler. The current model where
try_init_enclave registers a signal handler is extremely impolite.

>
> With the VDSO model you are proposing an environment where library
> developers can implement SGX/enclave based protections of code and
> data which the actual application linking against the library would be
> totally unaware of, correct?

That too.

> To summarize succinctly, would it be correct to assert that there are
> three possible advantages to the VDSO approach:
>
> 1.) Shared libraries can own enclaves without the knowledge of
> applications.

Yes.

>
> 2.) EDMM responses can be implemented more efficiently.

I don't know what that means. If an enclave is managing its own
memory by asking the untrusted runtime to forward exceptions to it,
this seems like a lovely attack surface.

>
> 3.) Reduction in enclave attack surface.
>
> With respect to point three, perhaps the most important attack on SGX
> security guarantees to date has been documented in Lee et.al.'s
> dark-ROP attack. A significant aspect of that attack was AEE based
> probing of enclave execution. Do you have reflections with respect to
> how the proposed archictecture would lessen or facilitate such
> attacks?

No effect whatsoever. This is an ISA design issue and the untrusted
code has nothing to do with it.

Personally, my opinion is that, if the hardware permits an attack
channel against an enclave, it's in the best interest of everyone for
Linux to make that attack channel available to the greatest extent
possible. This way no one says "well, my enclave is secure under
Linux, so no big deal.)

>
> The economics of software development seem to be motivating the use of
> libOS approaches to porting applications to enclaves which has
> significant implications with respect to AEE based probing attacks.

That sounds like a generally poor idea. Maybe it's economically
reasonable, but enclaves really ought not to be that big or
complicated.


>
> > Note that this effectively requires userspace to implement an exit
> > handler if they want to support correctable enclave faults, as there
> > is no other way to request ERESUME.
>
> Once again to be clear for those of us that have investments in the
> existing ABI.
>
> I'm assuming that in the proposed model the URTS would interrogate the
> VDSO to determine the availability of entry and exception handling
> support and then setup the appropriate infrastructure and exit
> handler?

...

> As a result, do you anticipate the need for a 'flag day' with respect
> to URTS/PSW/SDK support for all of this?
>

There will be a flag day when the upstream driver lands. It would be
great if the vDSO code lands in the same kernel so it's always
available.

> In addition, would you anticipate anything in the design that would be
> problematic for environments where the application would choose to
> implement an enclave in addition to linking against a library that
> implements enclaves?


Nope, should be fine.

2018-12-07 18:17:36

by Jethro Beekman

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On 2018-12-07 03:49, Sean Christopherson wrote:

...

> diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.c b/arch/x86/entry/vdso/vsgx_enter_enclave.c
> new file mode 100644
> index 000000000000..896c2eb079bb
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_enter_enclave.c

...

> +enter_enclave:
> + asm volatile(
> + /*
> + * When an event occurs in an enclave, hardware first exits the
> + * enclave to the AEP, switching CPU context along the way, and
> + * *then* delivers the event as usual. As part of the context
> + * switching, registers are loaded with synthetic state (except
> + * BP and SP, which are saved/restored). The defined synthetic
> + * state loads registers so that simply executing ENCLU will do
> + * ERESUME, e.g. RAX=4, RBX=TCS and RCX=AEP after an AEE. So,
> + * we only need to load RAX, RBX and RCX for the initial entry.
> + * The AEP can point at that same ENCLU, fixup will jump us out
> + * if an exception was unhandled.
> + */
> + " lea 1f(%%rip), %%rcx\n"
> + "1: enclu\n"
> + "2:\n"
> +
> + ".pushsection .fixup, \"ax\" \n"
> + "3: jmp 2b\n"
> + ".popsection\n"
> + _ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
> +
> + : "=a"(leaf), "=D" (rdi), "=S" (rsi), "=d" (rdx)
> + : "a" (leaf), "b" (tcs), "D" (priv)
> + : "cc", "memory",
> + "rcx", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
> + );

This is not sufficient to support the Fortanix SGX ABI calling
convention, which was designed to be mostly compatible with the SysV
64-bit calling convention. The following registers need to be passed in
to an enclave from userspace: RDI, RSI, RDX, R8, R9, R10. The following
registers need to be passed out from an enclave to userspace: RDI, RSI,
RDX, R8, R9.

You can find the ABI specification at
https://github.com/fortanix/rust-sgx/blob/master/doc/FORTANIX-SGX-ABI.md#enclave-calling-convention

--
Jethro Beekman | Fortanix


Attachments:
smime.p7s (3.90 kB)
S/MIME Cryptographic Signature

2018-12-07 18:21:58

by Jethro Beekman

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On 2018-12-07 22:01, Dr. Greg wrote:
> Baidu and Fortanix are working on Trusted RunTime Systems (TRTS) based
> on RUST, I believe, so this will affect them to the extent that they
> are implementing their own low level enclave runtime support or they
> may be simply building on top of the low level Intel TRTS. Perhaps
> Jethro would comment on these issues if he could.

As far as I know, Baidu merely provides Rust bindings to the Intel SDK.
As far as our requirements, I've sent those in my previous email.

> I'm assuming that in the proposed model the URTS would interrogate the
> VDSO to determine the availability of entry and exception handling
> support and then setup the appropriate infrastructure and exit
> handler? VDSO's are typically the domain of the system library.
> Given the nature of SGX I couldn't even conceive of Glibc offering
> support and, if it was acceptable to provide support, the potential
> timeframe that would be involved in seeing deployment in the field.
>
> As a result, do you anticipate the need for a 'flag day' with respect
> to URTS/PSW/SDK support for all of this?

It is my understanding that the use of the vDSO enclave entry will be
optional. i.e., if your application/library/enclave combination installs
a signal handler and calls ENCLU directly, that would still work. Of
course, using the vDSO will be very strongly recommended.

--
Jethro Beekman | Fortanix


Attachments:
smime.p7s (3.90 kB)
S/MIME Cryptographic Signature

2018-12-07 18:46:21

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On 12/7/18 10:15 AM, Jethro Beekman wrote:
> This is not sufficient to support the Fortanix SGX ABI calling
> convention, which was designed to be mostly compatible with the SysV
> 64-bit calling convention. The following registers need to be passed in
> to an enclave from userspace: RDI, RSI, RDX, R8, R9, R10. The following
> registers need to be passed out from an enclave to userspace: RDI, RSI,
> RDX, R8, R9.

Are you asking nicely to change the new Linux ABI to be consistent with
your existing ABI? Or, are you saying that the new ABI *must* be
compatible with this previous out-of-tree implementation?

2018-12-07 18:51:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions



> On Dec 7, 2018, at 10:44 AM, Dave Hansen <[email protected]> wrote:
>
>> On 12/7/18 10:15 AM, Jethro Beekman wrote:
>> This is not sufficient to support the Fortanix SGX ABI calling
>> convention, which was designed to be mostly compatible with the SysV
>> 64-bit calling convention. The following registers need to be passed in
>> to an enclave from userspace: RDI, RSI, RDX, R8, R9, R10. The following
>> registers need to be passed out from an enclave to userspace: RDI, RSI,
>> RDX, R8, R9.
>
> Are you asking nicely to change the new Linux ABI to be consistent with
> your existing ABI? Or, are you saying that the new ABI *must* be
> compatible with this previous out-of-tree implementation?

I think that allowing the enclave to return at least a few registers is quite reasonable, but I don’t have a strong opinion.

2018-12-07 19:04:04

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 07, 2018 at 09:56:09AM -0800, Andy Lutomirski wrote:
> On Fri, Dec 7, 2018 at 8:51 AM Sean Christopherson
> <[email protected]> wrote:
> > I like that the exit handler allows userspace to trap/panic with the full
> > call stack in place, and in a dedicated path, i.e. outside of the basic
> > enter/exit code. An exit handler probably doesn't fundamentally change
> > what userspace can do with respect to debugging/reporting, but I think
> > it would actually simplify some userspace implementations, e.g. I'd use
> > it in my tests like so:
> >
> > long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
> > {
> > if (exit_info->leaf == SGX_EEXIT)
> > return 0;
> >
> > <report exception and die/hang>
> > }
> >
>
> Hmm. That't not totally silly, although you could accomplish almost
> the same thing by wrapping the vDSO helper in another function.

True, but I think there's value in having the option to intercept an
exception at the exact(ish) point of failure, without having to return
up the stack.

The enclave has full access to the process' memory space, including the
untrsuted stack. It's not too far fetched to envision a scenario where
the enclave corrupts the stack even if the enclave isn't intentionally
using the stack, e.g. the host passes in variable that a points at the
stack instead of whatever memory is supposed to be shared between the
enclave and host. It'd be nice to have the ability to triage something
like that without having to e.g. configure breakpoints on the stack.

2018-12-07 19:26:01

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions


> On Dec 7, 2018, at 11:02 AM, Sean Christopherson <[email protected]> wrote:
>
>> On Fri, Dec 07, 2018 at 09:56:09AM -0800, Andy Lutomirski wrote:
>> On Fri, Dec 7, 2018 at 8:51 AM Sean Christopherson
>> <[email protected]> wrote:
>>> I like that the exit handler allows userspace to trap/panic with the full
>>> call stack in place, and in a dedicated path, i.e. outside of the basic
>>> enter/exit code. An exit handler probably doesn't fundamentally change
>>> what userspace can do with respect to debugging/reporting, but I think
>>> it would actually simplify some userspace implementations, e.g. I'd use
>>> it in my tests like so:
>>>
>>> long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
>>> {
>>> if (exit_info->leaf == SGX_EEXIT)
>>> return 0;
>>>
>>> <report exception and die/hang>
>>> }
>>>
>>
>> Hmm. That't not totally silly, although you could accomplish almost
>> the same thing by wrapping the vDSO helper in another function.
>
> True, but I think there's value in having the option to intercept an
> exception at the exact(ish) point of failure, without having to return
> up the stack.
>
> The enclave has full access to the process' memory space, including the
> untrsuted stack. It's not too far fetched to envision a scenario where
> the enclave corrupts the stack even if the enclave isn't intentionally
> using the stack, e.g. the host passes in variable that a points at the
> stack instead of whatever memory is supposed to be shared between the
> enclave and host. It'd be nice to have the ability to triage something
> like that without having to e.g. configure breakpoints on the stack.

Ah, I see. You’re saying that, if the non-enclave stare is corrupted such that RIP is okay and RSP still points somewhere reasonable but the return address is garbage, then we can at least get to the fault handler and print something? This only works if the fault handler pointer itself is okay, though, which somewhat limits the usefulness, given that its pointer is quite likely to be on the stack very close to the return address.

I really wish the ENCLU instruction had seen fit to preserve some registers.

2018-12-07 20:10:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 07, 2018 at 11:23:10AM -0800, Andy Lutomirski wrote:
>
> > On Dec 7, 2018, at 11:02 AM, Sean Christopherson <[email protected]> wrote:
> >
> >> On Fri, Dec 07, 2018 at 09:56:09AM -0800, Andy Lutomirski wrote:
> >> On Fri, Dec 7, 2018 at 8:51 AM Sean Christopherson
> >> <[email protected]> wrote:
> >>> I like that the exit handler allows userspace to trap/panic with the full
> >>> call stack in place, and in a dedicated path, i.e. outside of the basic
> >>> enter/exit code. An exit handler probably doesn't fundamentally change
> >>> what userspace can do with respect to debugging/reporting, but I think
> >>> it would actually simplify some userspace implementations, e.g. I'd use
> >>> it in my tests like so:
> >>>
> >>> long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
> >>> {
> >>> if (exit_info->leaf == SGX_EEXIT)
> >>> return 0;
> >>>
> >>> <report exception and die/hang>
> >>> }
> >>>
> >>
> >> Hmm. That't not totally silly, although you could accomplish almost
> >> the same thing by wrapping the vDSO helper in another function.
> >
> > True, but I think there's value in having the option to intercept an
> > exception at the exact(ish) point of failure, without having to return
> > up the stack.
> >
> > The enclave has full access to the process' memory space, including the
> > untrsuted stack. It's not too far fetched to envision a scenario where
> > the enclave corrupts the stack even if the enclave isn't intentionally
> > using the stack, e.g. the host passes in variable that a points at the
> > stack instead of whatever memory is supposed to be shared between the
> > enclave and host. It'd be nice to have the ability to triage something
> > like that without having to e.g. configure breakpoints on the stack.
>
> Ah, I see. You’re saying that, if the non-enclave stare is corrupted such
> that RIP is okay and RSP still points somewhere reasonable but the return
> address is garbage, then we can at least get to the fault handler and print
> something?

Yep. Even for something more subtle like GPR corruption it could dump the
entire call stack before attempting to return back up.

> This only works if the fault handler pointer itself is okay, though, which
> somewhat limits the usefulness, given that its pointer is quite likely to
> be on the stack very close to the return address.

Yeah, it's not a silver bullet by any means, but it does seem useful for at
least some scenarios. Even exploding when invoking the handler instead of
at a random point might prove useful, e.g. "calling my exit handler exploded,
maybe my enclave corrupted the stack!".

> I really wish the ENCLU instruction had seen fit to preserve some registers.

Speaking of preserving registers, the asm blob needs to mark RBX as
clobbered since it's modified for EEXIT.

2018-12-07 20:19:19

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions



> On Dec 7, 2018, at 12:09 PM, Sean Christopherson <[email protected]> wrote:
>
>> On Fri, Dec 07, 2018 at 11:23:10AM -0800, Andy Lutomirski wrote:
>>
>>>> On Dec 7, 2018, at 11:02 AM, Sean Christopherson <[email protected]> wrote:
>>>>
>>>> On Fri, Dec 07, 2018 at 09:56:09AM -0800, Andy Lutomirski wrote:
>>>> On Fri, Dec 7, 2018 at 8:51 AM Sean Christopherson
>>>> <[email protected]> wrote:
>>>>> I like that the exit handler allows userspace to trap/panic with the full
>>>>> call stack in place, and in a dedicated path, i.e. outside of the basic
>>>>> enter/exit code. An exit handler probably doesn't fundamentally change
>>>>> what userspace can do with respect to debugging/reporting, but I think
>>>>> it would actually simplify some userspace implementations, e.g. I'd use
>>>>> it in my tests like so:
>>>>>
>>>>> long fault_handler(struct sgx_enclave_exit_info *exit_info, void *tcs, void *priv)
>>>>> {
>>>>> if (exit_info->leaf == SGX_EEXIT)
>>>>> return 0;
>>>>>
>>>>> <report exception and die/hang>
>>>>> }
>>>>>
>>>>
>>>> Hmm. That't not totally silly, although you could accomplish almost
>>>> the same thing by wrapping the vDSO helper in another function.
>>>
>>> True, but I think there's value in having the option to intercept an
>>> exception at the exact(ish) point of failure, without having to return
>>> up the stack.
>>>
>>> The enclave has full access to the process' memory space, including the
>>> untrsuted stack. It's not too far fetched to envision a scenario where
>>> the enclave corrupts the stack even if the enclave isn't intentionally
>>> using the stack, e.g. the host passes in variable that a points at the
>>> stack instead of whatever memory is supposed to be shared between the
>>> enclave and host. It'd be nice to have the ability to triage something
>>> like that without having to e.g. configure breakpoints on the stack.
>>
>> Ah, I see. You’re saying that, if the non-enclave stare is corrupted such
>> that RIP is okay and RSP still points somewhere reasonable but the return
>> address is garbage, then we can at least get to the fault handler and print
>> something?
>
> Yep. Even for something more subtle like GPR corruption it could dump the
> entire call stack before attempting to return back up.
>
>> This only works if the fault handler pointer itself is okay, though, which
>> somewhat limits the usefulness, given that its pointer is quite likely to
>> be on the stack very close to the return address.
>
> Yeah, it's not a silver bullet by any means, but it does seem useful for at
> least some scenarios. Even exploding when invoking the handler instead of
> at a random point might prove useful, e.g. "calling my exit handler exploded,
> maybe my enclave corrupted the stack!".

Here’s another idea: calculate some little hash or other checksum of RSP, RBP, and perhaps a couple words on the stack, and do:

call __vdso_enclave_corrupted_state

If you get a mismatch after return. That function could be:

call __vdso_enclave_corrupted_state:
ud2

And now the debug trace makes it very clear what happened.

This may or may not be worth the effort. But ISTM the enclave is almost as likely to corrupt the host state and the. EEXIT as it is to corrupt the host state and then fault.

BTW, I read through Fortanix’s documentation of the Windows SGX calling conventions, and it seems to want RSI and RDI as out params. Letting the vDSO be used to invoke Windows-targeted enclaves seems useful.

>
>> I really wish the ENCLU instruction had seen fit to preserve some registers.
>
> Speaking of preserving registers, the asm blob needs to mark RBX as
> clobbered since it's modified for EEXIT.

Have fun with that. The x86_32 compiler seems to really like having its PIC register preserved, and you may get some lovely compiler errors.

2018-12-07 20:36:50

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 07, 2018 at 12:16:59PM -0800, Andy Lutomirski wrote:
>
> > On Dec 7, 2018, at 12:09 PM, Sean Christopherson <[email protected]> wrote:
> >
> > Speaking of preserving registers, the asm blob needs to mark RBX as
> > clobbered since it's modified for EEXIT.
>
> Have fun with that. The x86_32 compiler seems to really like having its
> PIC register preserved, and you may get some lovely compiler errors.

Tagentinally related, as-is the SGX vDSO is only compiled for x86_64
since CONFIG_SGX depends on CONFIG_X86_64. Mapping the EPC in 32-bit
mode complicates things and no one is asking for SGX support on 32-bit
builds, so...

2018-12-07 21:27:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 07, 2018 at 12:16:59PM -0800, Andy Lutomirski wrote:
>
>
> > On Dec 7, 2018, at 12:09 PM, Sean Christopherson <[email protected]> wrote:
> >
> >> On Fri, Dec 07, 2018 at 11:23:10AM -0800, Andy Lutomirski wrote:
> >>
> >> Ah, I see. You’re saying that, if the non-enclave stare is corrupted such
> >> that RIP is okay and RSP still points somewhere reasonable but the return
> >> address is garbage, then we can at least get to the fault handler and print
> >> something?
> >
> > Yep. Even for something more subtle like GPR corruption it could dump the
> > entire call stack before attempting to return back up.
> >
> >> This only works if the fault handler pointer itself is okay, though, which
> >> somewhat limits the usefulness, given that its pointer is quite likely to
> >> be on the stack very close to the return address.
> >
> > Yeah, it's not a silver bullet by any means, but it does seem useful for at
> > least some scenarios. Even exploding when invoking the handler instead of
> > at a random point might prove useful, e.g. "calling my exit handler exploded,
> > maybe my enclave corrupted the stack!".
>
> Here’s another idea: calculate some little hash or other checksum of
> RSP, RBP, and perhaps a couple words on the stack, and do:

Corrupting RSP and RBP as opposed to the stack memory seems much less
likely since the enclave would have to poke into the save state area.
And as much as I dislike the practice of intentionally manipulating
SSA.RSP, preventing the user from doing something because we're "helping"
doesn't seem right.

> call __vdso_enclave_corrupted_state
>
> If you get a mismatch after return. That function could be:
>
> call __vdso_enclave_corrupted_state:
> ud2
>
> And now the debug trace makes it very clear what happened.
>
> This may or may not be worth the effort.

Running a checksum on the stack for every exit doesn't seem like it'd
be worth the effort, especially since this type of bug should be quite
rare, at least in production environments.

If we want to pursue the checksum idea I think the easiest approach
would be to combine it with an exit_handler and do a simple check on
the handler. It'd be minimal overhead in the fast path and would flag
cases where invoking exit_handle() would explode, while deferring all
other checks to the user.

E.g. something like this:

diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.c b/arch/x86/entry/vdso/vsgx_enter_enclave.c
index d5145e5c5a54..c89dd3cd8da9 100644
--- a/arch/x86/entry/vdso/vsgx_enter_enclave.c
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.c
@@ -42,10 +42,13 @@ enum sgx_enclu_leaf {
SGX_EEXIT = 4,
};

+#define VDSO_MAGIC 0xa5a5a5a5a5a5a5a5UL
+
notrace long __vdso_sgx_enter_enclave(u32 op, void *tcs, void *priv,
struct sgx_enclave_exit_info *exit_info,
sgx_enclave_exit_handler *exit_handler)
{
+ volatile unsigned long hash;
u64 rdi, rsi, rdx;
u32 leaf;
long ret;
@@ -53,6 +56,9 @@ notrace long __vdso_sgx_enter_enclave(u32 op, void *tcs, void *priv,
if (!tcs || !exit_info)
return -EINVAL;

+ /* Always hash the handler. XOR is much cheaper than Jcc. */
+ hash = (unsigned long)exit_handler ^ VDSO_MAGIC;
+
enter_enclave:
if (op != SGX_EENTER && op != SGX_ERESUME)
return -EINVAL;
@@ -107,6 +113,8 @@ notrace long __vdso_sgx_enter_enclave(u32 op, void *tcs, void *priv,
* or to return (EEXIT).
*/
if (exit_handler) {
+ if (hash != ((unsigned long)exit_handler ^ VDSO_MAGIC))
+ asm volatile("ud2\n");
if (exit_handler(exit_info, tcs, priv)) {
op = exit_info->leaf;
goto enter_enclave;

> But ISTM the enclave is almost as likely to corrupt the host state and
> the. EEXIT as it is to corrupt the host state and then fault.

Agreed, I would say even more likely. But the idea is that the
exit_handler is called on any exit, not just exceptions.

2018-12-07 23:35:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson
<[email protected]> wrote:
>
> On Fri, Dec 07, 2018 at 12:16:59PM -0800, Andy Lutomirski wrote:
> >
> >
> > > On Dec 7, 2018, at 12:09 PM, Sean Christopherson <[email protected]> wrote:
> > >
> > >> On Fri, Dec 07, 2018 at 11:23:10AM -0800, Andy Lutomirski wrote:
> > >>
> > >> Ah, I see. You’re saying that, if the non-enclave stare is corrupted such
> > >> that RIP is okay and RSP still points somewhere reasonable but the return
> > >> address is garbage, then we can at least get to the fault handler and print
> > >> something?
> > >
> > > Yep. Even for something more subtle like GPR corruption it could dump the
> > > entire call stack before attempting to return back up.
> > >
> > >> This only works if the fault handler pointer itself is okay, though, which
> > >> somewhat limits the usefulness, given that its pointer is quite likely to
> > >> be on the stack very close to the return address.
> > >
> > > Yeah, it's not a silver bullet by any means, but it does seem useful for at
> > > least some scenarios. Even exploding when invoking the handler instead of
> > > at a random point might prove useful, e.g. "calling my exit handler exploded,
> > > maybe my enclave corrupted the stack!".
> >
> > Here’s another idea: calculate some little hash or other checksum of
> > RSP, RBP, and perhaps a couple words on the stack, and do:
>
> Corrupting RSP and RBP as opposed to the stack memory seems much less
> likely since the enclave would have to poke into the save state area.
> And as much as I dislike the practice of intentionally manipulating
> SSA.RSP, preventing the user from doing something because we're "helping"
> doesn't seem right.
>
> > call __vdso_enclave_corrupted_state
> >
> > If you get a mismatch after return. That function could be:
> >
> > call __vdso_enclave_corrupted_state:
> > ud2
> >
> > And now the debug trace makes it very clear what happened.
> >
> > This may or may not be worth the effort.
>
> Running a checksum on the stack for every exit doesn't seem like it'd
> be worth the effort, especially since this type of bug should be quite
> rare, at least in production environments.
>
> If we want to pursue the checksum idea I think the easiest approach
> would be to combine it with an exit_handler and do a simple check on
> the handler. It'd be minimal overhead in the fast path and would flag
> cases where invoking exit_handle() would explode, while deferring all
> other checks to the user.

How about this variant?

#define MAGIC 0xaaaabbbbccccddddul
#define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC)

void foo(void)
{
volatile unsigned long hash = RETADDR_HASH;

/* placeholder for your actual code */
asm volatile ("nop");

if (hash != RETADDR_HASH)
asm volatile ("ud2");
}

But I have a real argument for dropping exit_handler: in this new age
of Spectre, the indirect call is a retpoline, and it's therefore quite
slow. So I'm not saying NAK, but I do think it's unnecessary.

I don't suppose you've spent a bunch of time programming in the
continuation-passing style? :)

2018-12-08 08:17:01

by Jethro Beekman

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On 2018-12-08 00:14, Dave Hansen wrote:
> On 12/7/18 10:15 AM, Jethro Beekman wrote:
>> This is not sufficient to support the Fortanix SGX ABI calling
>> convention, which was designed to be mostly compatible with the SysV
>> 64-bit calling convention. The following registers need to be passed in
>> to an enclave from userspace: RDI, RSI, RDX, R8, R9, R10. The following
>> registers need to be passed out from an enclave to userspace: RDI, RSI,
>> RDX, R8, R9.
>
> Are you asking nicely to change the new Linux ABI to be consistent with
> your existing ABI? Or, are you saying that the new ABI *must* be
> compatible with this previous out-of-tree implementation?

What's being discussed here is one of the alternatives for SGX fault
handling, meant to improve the current status quo of having to use a
signal handler.

I'm merely providing a data point that the currently proposed solution
is not sufficient to support current use of the (ring 3) ENCLU
instruction. You might find this useful in determining whether proposed
kernel features will actually be used by users, and in further
developing this solution or other solutions to the fault handling issue.

If going with the vDSO solution, I think something with semantics closer
to the actual instruction would be preferred, like the following:

notrace __attribute__((naked)) long __vdso_sgx_enclu_with_aep()
{
asm volatile(
" lea 2f(%%rip), %%rcx\n"
"1: enclu\n"
"2: ret\n"
".pushsection .fixup, \"ax\" \n"
"3: jmp 2b\n"
".popsection\n"
_ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
:::
);
}

--
Jethro Beekman | Fortanix


Attachments:
smime.p7s (3.90 kB)
S/MIME Cryptographic Signature

2018-12-11 19:33:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 07, 2018 at 03:33:57PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson
> <[email protected]> wrote:
> >
> > Running a checksum on the stack for every exit doesn't seem like it'd
> > be worth the effort, especially since this type of bug should be quite
> > rare, at least in production environments.
> >
> > If we want to pursue the checksum idea I think the easiest approach
> > would be to combine it with an exit_handler and do a simple check on
> > the handler. It'd be minimal overhead in the fast path and would flag
> > cases where invoking exit_handle() would explode, while deferring all
> > other checks to the user.
>
> How about this variant?
>
> #define MAGIC 0xaaaabbbbccccddddul
> #define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC)
>
> void foo(void)
> {
> volatile unsigned long hash = RETADDR_HASH;
>
> /* placeholder for your actual code */
> asm volatile ("nop");
>
> if (hash != RETADDR_HASH)
> asm volatile ("ud2");
> }
>
> But I have a real argument for dropping exit_handler: in this new age
> of Spectre, the indirect call is a retpoline, and it's therefore quite
> slow.

Technically slower, but would the extra CALL+RET pair even be noticeable
in the grand scheme of SGX?

> So I'm not saying NAK, but I do think it's unnecessary.

Ya, definitely not necessary, but it does allow userspace do things that
are otherwise cumbersome or impossible to do with the vanilla vDSO. How
much value that actually adds is another question...

2018-12-11 20:05:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Tue, Dec 11, 2018 at 11:31 AM Sean Christopherson
<[email protected]> wrote:
>
> On Fri, Dec 07, 2018 at 03:33:57PM -0800, Andy Lutomirski wrote:
> > On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson
> > <[email protected]> wrote:
> > >
> > > Running a checksum on the stack for every exit doesn't seem like it'd
> > > be worth the effort, especially since this type of bug should be quite
> > > rare, at least in production environments.
> > >
> > > If we want to pursue the checksum idea I think the easiest approach
> > > would be to combine it with an exit_handler and do a simple check on
> > > the handler. It'd be minimal overhead in the fast path and would flag
> > > cases where invoking exit_handle() would explode, while deferring all
> > > other checks to the user.
> >
> > How about this variant?
> >
> > #define MAGIC 0xaaaabbbbccccddddul
> > #define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC)
> >
> > void foo(void)
> > {
> > volatile unsigned long hash = RETADDR_HASH;
> >
> > /* placeholder for your actual code */
> > asm volatile ("nop");
> >
> > if (hash != RETADDR_HASH)
> > asm volatile ("ud2");
> > }
> >
> > But I have a real argument for dropping exit_handler: in this new age
> > of Spectre, the indirect call is a retpoline, and it's therefore quite
> > slow.
>
> Technically slower, but would the extra CALL+RET pair even be noticeable
> in the grand scheme of SGX?

But it's CALL, CALL, MOV to overwrite return address, intentionally
midpredicted RET, and RET because Spectre. That whole sequence seems
to be several tens of cycles, so it's a lot worse than just CALL+RET.
Whether it's noticeable overall is a fair question, though.

2018-12-11 22:01:26

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Tue, Dec 11, 2018 at 12:04:15PM -0800, Andy Lutomirski wrote:
> On Tue, Dec 11, 2018 at 11:31 AM Sean Christopherson
> <[email protected]> wrote:
> >
> > On Fri, Dec 07, 2018 at 03:33:57PM -0800, Andy Lutomirski wrote:
> > > On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson
> > > <[email protected]> wrote:
> > > >
> > > > Running a checksum on the stack for every exit doesn't seem like it'd
> > > > be worth the effort, especially since this type of bug should be quite
> > > > rare, at least in production environments.
> > > >
> > > > If we want to pursue the checksum idea I think the easiest approach
> > > > would be to combine it with an exit_handler and do a simple check on
> > > > the handler. It'd be minimal overhead in the fast path and would flag
> > > > cases where invoking exit_handle() would explode, while deferring all
> > > > other checks to the user.
> > >
> > > How about this variant?
> > >
> > > #define MAGIC 0xaaaabbbbccccddddul
> > > #define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC)
> > >
> > > void foo(void)
> > > {
> > > volatile unsigned long hash = RETADDR_HASH;
> > >
> > > /* placeholder for your actual code */
> > > asm volatile ("nop");
> > >
> > > if (hash != RETADDR_HASH)
> > > asm volatile ("ud2");
> > > }
> > >
> > > But I have a real argument for dropping exit_handler: in this new age
> > > of Spectre, the indirect call is a retpoline, and it's therefore quite
> > > slow.
> >
> > Technically slower, but would the extra CALL+RET pair even be noticeable
> > in the grand scheme of SGX?
>
> But it's CALL, CALL, MOV to overwrite return address, intentionally
> midpredicted RET, and RET because Spectre. That whole sequence seems
> to be several tens of cycles, so it's a lot worse than just CALL+RET.
> Whether it's noticeable overall is a fair question, though.

I was thinking of the case where the handler re-entered the enclave vs.
leaving and re-calling the vDSO, which would be RET+CALL and some other
stuff.

2018-12-11 23:24:03

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Tue, Dec 11, 2018 at 2:00 PM Sean Christopherson
<[email protected]> wrote:
>
> On Tue, Dec 11, 2018 at 12:04:15PM -0800, Andy Lutomirski wrote:
> > On Tue, Dec 11, 2018 at 11:31 AM Sean Christopherson
> > <[email protected]> wrote:
> > >
> > > On Fri, Dec 07, 2018 at 03:33:57PM -0800, Andy Lutomirski wrote:
> > > > On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson
> > > > <[email protected]> wrote:
> > > > >
> > > > > Running a checksum on the stack for every exit doesn't seem like it'd
> > > > > be worth the effort, especially since this type of bug should be quite
> > > > > rare, at least in production environments.
> > > > >
> > > > > If we want to pursue the checksum idea I think the easiest approach
> > > > > would be to combine it with an exit_handler and do a simple check on
> > > > > the handler. It'd be minimal overhead in the fast path and would flag
> > > > > cases where invoking exit_handle() would explode, while deferring all
> > > > > other checks to the user.
> > > >
> > > > How about this variant?
> > > >
> > > > #define MAGIC 0xaaaabbbbccccddddul
> > > > #define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC)
> > > >
> > > > void foo(void)
> > > > {
> > > > volatile unsigned long hash = RETADDR_HASH;
> > > >
> > > > /* placeholder for your actual code */
> > > > asm volatile ("nop");
> > > >
> > > > if (hash != RETADDR_HASH)
> > > > asm volatile ("ud2");
> > > > }
> > > >
> > > > But I have a real argument for dropping exit_handler: in this new age
> > > > of Spectre, the indirect call is a retpoline, and it's therefore quite
> > > > slow.
> > >
> > > Technically slower, but would the extra CALL+RET pair even be noticeable
> > > in the grand scheme of SGX?
> >
> > But it's CALL, CALL, MOV to overwrite return address, intentionally
> > midpredicted RET, and RET because Spectre. That whole sequence seems
> > to be several tens of cycles, so it's a lot worse than just CALL+RET.
> > Whether it's noticeable overall is a fair question, though.
>
> I was thinking of the case where the handler re-entered the enclave vs.
> leaving and re-calling the vDSO, which would be RET+CALL and some other
> stuff.

Fair enough, although the case where we do an EENTER, an AEP, a kernel
entry, an IRET, and an ERESUME will be so slow that the CALL+RET seems
even less relevant. The EENTER+EEXIT case at least avoids the round
trip through x86's amazingly performant exception handling mechanism
:)

2018-12-14 15:07:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Sat, Dec 08, 2018 at 08:15:38AM +0000, Jethro Beekman wrote:
> On 2018-12-08 00:14, Dave Hansen wrote:
> >On 12/7/18 10:15 AM, Jethro Beekman wrote:
> >>This is not sufficient to support the Fortanix SGX ABI calling
> >>convention, which was designed to be mostly compatible with the SysV
> >>64-bit calling convention. The following registers need to be passed in
> >>to an enclave from userspace: RDI, RSI, RDX, R8, R9, R10. The following
> >>registers need to be passed out from an enclave to userspace: RDI, RSI,
> >>RDX, R8, R9.
> >
> >Are you asking nicely to change the new Linux ABI to be consistent with
> >your existing ABI? Or, are you saying that the new ABI *must* be
> >compatible with this previous out-of-tree implementation?
>
> What's being discussed here is one of the alternatives for SGX fault
> handling, meant to improve the current status quo of having to use a signal
> handler.
>
> I'm merely providing a data point that the currently proposed solution is
> not sufficient to support current use of the (ring 3) ENCLU instruction. You
> might find this useful in determining whether proposed kernel features will
> actually be used by users, and in further developing this solution or other
> solutions to the fault handling issue.
>
> If going with the vDSO solution, I think something with semantics closer to
> the actual instruction would be preferred, like the following:
>
> notrace __attribute__((naked)) long __vdso_sgx_enclu_with_aep()
> {
> asm volatile(
> " lea 2f(%%rip), %%rcx\n"
> "1: enclu\n"
> "2: ret\n"
> ".pushsection .fixup, \"ax\" \n"
> "3: jmp 2b\n"
> ".popsection\n"
> _ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
> :::
> );
> }

Part of me likes this idea, but it's a documentation nightmare since
it's a completely customer register ABI. And the caller's exception
handling gets a bit weird since RAX implicitly defines whether or not
an exception occurred. I also think there's value in making the vDSO
function callable from standard C.